kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

Error during ataqc step "IOError: File [xxx]R1.trim.merged.signal does not exist" #136

Closed Chokaro closed 5 years ago

Chokaro commented 5 years ago

First of all, thanks for the hard work. Deploying your pipeline via docker was rather easy, even for a bioinformatics amateur like me.

Sadly, it didn't go entirely smoothly. I am using cromwell and atac.wdl to run your Docker container, and during the ataqc step I get the following error (I get it for the R1 files of both PE replicates):

Traceback (most recent call last):
  File "/software/atac-seq-pipeline/src/encode_ataqc.py", line 355, in <module>
    ataqc()
  File "/software/atac-seq-pipeline/src/encode_ataqc.py", line 213, in ataqc
    ROADMAP_META, OUTPUT_PREFIX)
  File "/software/atac-seq-pipeline/src/run_ataqc.py", line 948, in compare_to_roadmap
    sample_data = pd.read_table(out_file, header=None)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
IOError: File LRSC1_CD34_50k_R1.trim.merged.signal does not exist

My input JSON looks like this; the fastq.gz files are stored locally on a separate hard drive:

{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/media/chokaro/2TB_Storage_2/genome/local/hg19_local.tsv",
"atac.fastqs" : [
    [
        ["/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_10k_R1.fastq.gz",
         "/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_10k_R2.fastq.gz"]
    ],
    [
        ["/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_50k_R1.fastq.gz",
         "/media/chokaro/2TB_Storage_2/2018_03_24_OmniATAC_CD34/fastq/LRSC1_CD34_50k_R2.fastq.gz"]
    ]
],

"atac.paired_end" : true,
"atac.multimapping" : 4,

"atac.trim_adapter.auto_detect_adapter" : true,

"atac.bowtie2.cpu" : 6,
"atac.bowtie2.mem_mb" : 16000,
"atac.bowtie2.time_hr" : 36,

"atac.filter.cpu" : 2,
"atac.filter.mem_mb" : 12000,
"atac.filter.time_hr" : 23,

"atac.macs2_mem_mb" : 16000,

"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,

"atac.qc_report.name" : "test1",
"atac.qc_report.desc" : "test1 on CD34 omni ATAC"

}
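Since the FASTQ files sit on a separate drive, one quick sanity check is to confirm that every path listed in the input JSON is readable from the machine running cromwell. The following is only a minimal sketch: the function name `missing_fastqs` and the file name `input.json` are hypothetical, and the nesting of `atac.fastqs` (replicates, then groups, then an R1/R2 pair) is assumed from the JSON above rather than taken from the pipeline's documentation.

```python
import json
import os


def missing_fastqs(input_json_path):
    """Return the paths listed under "atac.fastqs" that are not files on disk."""
    with open(input_json_path) as handle:
        config = json.load(handle)

    missing = []
    # Nesting assumed from the JSON above: replicates -> groups -> [R1, R2].
    for replicate in config.get("atac.fastqs", []):
        for pair in replicate:
            for fastq in pair:
                if not os.path.isfile(fastq):
                    missing.append(fastq)
    return missing


if __name__ == "__main__":
    # "input.json" is a placeholder for the actual input file name.
    for path in missing_fastqs("input.json"):
        print("missing:", path)
```

An empty result only shows that the inputs are visible on the host; it says nothing about whether the drive is also mounted inside the Docker container.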

And finally my OS and system config are the following:

OS: Ubuntu Xenial 16.04
cromwell 34
conda 4.5.11
Docker version 18.06.1-ce, build e68fc7a

Hoping you guys have some advice for this... In any case many thanks in advance!

Best, Chris

leepc12 commented 5 years ago

Let's discuss this on the new repo. https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/26 This repo has been deprecated.

leepc12 commented 5 years ago

@Chokaro: I need more info for debugging. Please run the following in your working directory (where you ran the pipeline command). This will create a tarball containing all text/log/qc files in your output directory. Please send it to me (leepc12@gmail.com) or upload it here.

$ find . -type f \( -name 'stdout' -or -name 'stderr' -or -name 'script' -or \
-name '*.qc' -or -name '*.txt' -or -name '*.log' -or -name '*.png' -or -name '*.pdf' \) -print0 \
| xargs -0 tar -zcvf debug_issue_26.tar.gz
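If `find` or `xargs` behaves differently on a given system, roughly the same collection can be done with the Python standard library. This is a sketch rather than part of the pipeline: the function name `collect_debug_files` is hypothetical, and the name patterns and archive name are simply copied from the command above.

```python
import fnmatch
import os
import tarfile

# Same name patterns as the find command above.
PATTERNS = ["stdout", "stderr", "script", "*.qc", "*.txt", "*.log", "*.png", "*.pdf"]


def collect_debug_files(root=".", archive="debug_issue_26.tar.gz"):
    """Archive regular files under root whose names match PATTERNS.

    Returns the list of paths that were added to the archive.
    """
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so only regular files are kept, like `-type f`.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if any(fnmatch.fnmatchcase(name, pattern) for pattern in PATTERNS):
                matched.append(path)

    with tarfile.open(archive, "w:gz") as tar:
        for path in matched:
            tar.add(path)
    return matched


if __name__ == "__main__":
    files = collect_debug_files()
    print("archived %d files" % len(files))
```

Because all matches are gathered before the archive is opened, a large output directory still ends up in a single tarball.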

Let's continue the discussion in the new repo's issue linked above.