open2c / distiller-nf

A modular Hi-C mapping pipeline
MIT License

distiller is failing at bin_zoom_library_pairs with new reference genome #137

Open gibcus opened 5 years ago

gibcus commented 5 years ago

dis.out contents:

N E X T F L O W  ~  version 19.01.0
Launching dekkerlab/distiller-nf [mad_edison] - revision: 5f5b40f0c7 [ghpcc]
[warm up] executor > lsf
[76/2f5f3c] Submitted process > local_truncate_chunk_fastqs (library:ICRF-12min-S2-R2__galGal6 run:lane1)
[99/257402] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:03)
[f0/613fcd] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:01)
[80/0ae835] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:05)
[c1/0329d9] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:04)
[a8/268854] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:08)
[a0/efa5f1] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:02)
[11/20e124] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:09)
[c8/7c0943] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:06)
[16/c87ea8] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:07)
[1d/040136] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:11)
[e5/a8a883] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:12)
[e6/7dc0ec] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:10)
[17/3f266e] Submitted process > merge_dedup_splitbam (library:ICRF-12min-S2-R2__galGal6)
[96/a57503] Submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
[d8/9d8b0c] Submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:mapq_30)
[e0/acfb4b] Submitted process > merge_stats_libraries_into_groups (library_group:ICRF-12m-R2)
[0c/a48dde] Submitted process > merge_stats_libraries_into_groups (library_group:all)
[96/a57503] NOTE: Process `bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)` terminated with an error exit status (1) -- Execution is retried (1)
[42/b06c9f] Re-submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
[42/b06c9f] NOTE: Process `bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)` terminated with an error exit status (1) -- Execution is retried (2)
[56/efd5ad] Re-submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
ERROR ~ Error executing process > 'bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)'

Caused by:
  Process `bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)` terminated with an error exit status (1)

Command executed:

bgzip -cd -@ 3 ICRF-12min-S2-R2__galGal6.galGal6.nodups.pairs.gz | cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 --assembly galGal6 galGal6.reduced.chrom.sizes:10000 - ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool

cooler zoomify --nproc 12 --out ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.mcool --resolutions 1000000,500000,250000,100000,50000,25000,10000 --balance ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool

Command exit status: 1

Command output: (empty)

Command error:
  INFO:cooler.create:Writing bins
  INFO:cooler.create:Writing pixels
  INFO:cooler.create:Writing indexes
  INFO:cooler.create:Writing info
  INFO:cooler.create:Done
  INFO:cooler.create:Writing chunk 8: /tmp/tmp6loc8rar.multi.cool::8
  INFO:cooler.create:Creating cooler at "/tmp/tmp6loc8rar.multi.cool::/8"
  INFO:cooler.create:Writing chroms
  INFO:cooler.create:Writing bins
  INFO:cooler.create:Writing pixels
  INFO:cooler.create:Writing indexes
  INFO:cooler.create:Writing info
  INFO:cooler.create:Done
  INFO:cooler.create:Merging into ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool
  INFO:cooler.create:Creating cooler at "ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool::/"
  INFO:cooler.create:Writing chroms
  INFO:cooler.create:Writing bins
  INFO:cooler.create:Writing pixels
  INFO:cooler.reduce:nnzs: [0, 0, 0, 0, 0, 0, 0, 0, 0]
  INFO:cooler.reduce:current: [0, 0, 0, 0, 0, 0, 0, 0, 0]
  Traceback (most recent call last):
    File "/miniconda3/bin/cooler", line 11, in <module>
      sys.exit(cli())
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
      return self.main(*args, **kwargs)
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 717, in main
      rv = self.invoke(ctx)
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
      return callback(*args, **kwargs)
    File "/miniconda3/lib/python3.6/site-packages/cooler/cli/cload.py", line 476, in pairs
      h5opts=h5opts,
    File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 670, in create_from_unordered
      **kwargs)
    File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 565, in create
      file_path, target, meta.columns, iterable, h5opts, lock)
    File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 204, in write_pixels
      for i, chunk in enumerate(iterable):
    File "/miniconda3/lib/python3.6/site-packages/cooler/reduce.py", line 162, in __iter__
      ignore_index=True)
    File "/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 225, in concat
      copy=copy, sort=sort)
    File "/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 259, in __init__
      raise ValueError('No objects to concatenate')
  ValueError: No objects to concatenate

Work dir: /nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6/work/56/efd5ad506debb426aa33922a1b1abe

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (1)


Sender: LSF System lsfadmin@c04b04
Subject: Job 2694432: <~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config> in cluster Exited

Job <~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config> was submitted from host by user in cluster at Thu Mar 28 19:49:37 2019.
Job was executed on host(s) <2*c04b04>, in queue , as user in cluster at Thu Mar 28 19:49:37 2019.
</home/jg14w> was used as the home directory.
</nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6> was used as the working directory.
Started at Thu Mar 28 19:49:37 2019.
Terminated at Fri Mar 29 03:55:45 2019.
Results reported at Fri Mar 29 03:55:45 2019.

Your job looked like:


LSBATCH: User input
~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config

Exited with exit code 1.

Resource usage summary:

CPU time :                                   380.92 sec.
Max Memory :                                 1287 MB
Average Memory :                             1045.12 MB
Total Requested Memory :                     16000.00 MB
Delta Memory :                               14713.00 MB
Max Swap :                                   -
Max Processes :                              3
Max Threads :                                91
Run time :                                   29168 sec.
Turnaround time :                            29168 sec.

The output (if any) is above this job summary.

PS:

Read file for stderr output of this job.

Contents of: /nl/umw_job_dekker/cshare/reference/sorted_chromsizes/galGal6.reduced.chrom.sizes:

chr1	197608386
chr2	149682049
chr3	110838418
chr4	91315245
chr5	59809098
chr6	36374701
chr7	36742308
chr8	30219446
chr9	24153086
chr10	21119840
chr11	20200042
chr12	20387278
chr13	19166714
chr14	16219308
chr15	13062184
chr16	2844601
chr17	10762512
chr18	11373140
chr19	10323212
chr20	13897287
chr21	6844979
chr22	5459462
chr23	6149580
chr24	6491222
chr25	3980610
chr26	6055710
chr27	8080432
chr28	5116882
chr30	1818525
chr31	6153034
chr32	725831
chr33	7821666
chrM	16784
chrW	6813114
chrZ	82529921

Pairs file:

/nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6/work/17/3f266ec32d5602fe6f19069856e46b/ICRF-12min-S2-R2__galGal6.galGal6.nodups.pairs.gz

pairs format v1.0.0
sorted: chr1-chr2-pos1-pos2
shape: upper triangle
genome_assembly: unknown
chromsize: ref|NC_001323.1| 16775
chromsize: ref|NC_006088.5| 197608386
chromsize: ref|NC_006089.5| 149682049
chromsize: ref|NC_006090.5| 110838418
chromsize: ref|NC_006091.5| 91315245
chromsize: ref|NC_006092.5| 59809098
chromsize: ref|NC_006093.5| 36374701
chromsize: ref|NC_006094.5| 36742308
chromsize: ref|NC_006095.5| 30219446
chromsize: ref|NC_006096.5| 24153086
chromsize: ref|NC_006097.5| 21119840
chromsize: ref|NC_006098.5| 20200042
chromsize: ref|NC_006099.5| 20387278
chromsize: ref|NC_006100.5| 19166714
chromsize: ref|NC_006101.5| 16219308
chromsize: ref|NC_006102.5| 13062184
chromsize: ref|NC_006103.5| 2844601
chromsize: ref|NC_006104.5| 10762512
chromsize: ref|NC_006105.5| 11373140
chromsize: ref|NC_006106.5| 10323212
chromsize: ref|NC_006107.5| 13897287
chromsize: ref|NC_006108.5| 6844979
chromsize: ref|NC_006109.5| 5459462
chromsize: ref|NC_006110.5| 6149580
chromsize: ref|NC_006111.5| 6491222
chromsize: ref|NC_006112.4| 3980610
chromsize: ref|NC_006113.5| 6055710
chromsize: ref|NC_006114.5| 8080432
chromsize: ref|NC_006115.5| 5116882
chromsize: ref|NC_006119.4| 725831
chromsize: ref|NC_006126.5| 6813114
chromsize: ref|NC_006127.5| 82529921
chromsize: ref|NC_008465.4| 7821666
chromsize: ref|NC_028739.2| 1818525
chromsize: ref|NC_028740.2| 6153034
samheader: @SQ SN:ref|NC_006088.5| LN:197608386
samheader: @SQ SN:ref|NC_006089.5| LN:149682049
samheader: @SQ SN:ref|NC_006090.5| LN:110838418
samheader: @SQ SN:ref|NC_006091.5| LN:91315245
samheader: @SQ SN:ref|NC_006092.5| LN:59809098
samheader: @SQ SN:ref|NC_006093.5| LN:36374701
samheader: @SQ SN:ref|NC_006094.5| LN:36742308
samheader: @SQ SN:ref|NC_006095.5| LN:30219446
samheader: @SQ SN:ref|NC_006096.5| LN:24153086
samheader: @SQ SN:ref|NC_006097.5| LN:21119840
samheader: @SQ SN:ref|NC_006098.5| LN:20200042

mimakaev commented 5 years ago

This is strange. Is there something after the header in the pairs file?

sergpolly commented 5 years ago
  1. It seems your bwa index refers to chromosomes by names like ref|NC_001323.1|, ref|NC_006088.5|, etc.
  2. Your reduced.chrom.sizes file, however, refers to "human-readable" names: chr1, chr2, chr3, etc.
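The mismatch in points 1 and 2 can be checked directly. Below is a minimal Python sketch, assuming the pairs header declares its chromosomes on lines of the form "#chromsize: <name> <length>" (the "#" prefix follows the pairs format; it does not show in the paste above) and that the chrom sizes file holds whitespace-separated name/length pairs. The helpers header_chroms, chrom_sizes_chroms and shared_chroms are hypothetical, not part of distiller or cooler. An empty overlap would be consistent with the all-zero nnzs lines and the "No objects to concatenate" error in the log.

```python
def header_chroms(lines):
    """Return names declared on '#chromsize:' lines of a pairs header.

    Stops at the first line not starting with '#', i.e. the first pair record.
    """
    names = []
    for line in lines:
        if not line.startswith("#"):
            break
        fields = line.lstrip("#").split()
        if len(fields) >= 2 and fields[0] == "chromsize:":
            names.append(fields[1])
    return names


def chrom_sizes_chroms(lines):
    """Return the first-column names of a chrom sizes file, skipping blank lines."""
    return [line.split()[0] for line in lines if line.strip()]


def shared_chroms(pairs_names, sizes_names):
    """Names present in both lists, sorted; empty means no record can be binned."""
    return sorted(set(pairs_names) & set(sizes_names))
```

Both parsers accept any iterable of lines, so a real check could pass gzip.open(pairs_path, "rt") and open(sizes_path) straight in.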

Can you trace back how you created the bwa index and reduced.chrom.sizes? Did you use the same FASTA as input? The chromosome names in the index and in reduced.chrom.sizes can differ, but there must be some overlap! Example:

Another unrelated problem in your distiller run is --resolutions 1000000,500000,250000,100000,50000,25000,10000: you're asking cooler to build 25 kb heatmaps from 10 kb ones. That is probably not going to "fly", even after you fix your reference genome: resolutions in the "ladder" must be multiples of the highest one (the smallest bin size), because all lower-resolution heatmaps are built from the highest one by consecutive coarsening.
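The divisibility rule above can be checked before launching a run. A small Python sketch that encodes only the rule as stated here (every resolution must be a whole multiple of the finest one); check_ladder is a hypothetical helper and does not call cooler itself.

```python
def check_ladder(resolutions):
    """Return the resolutions that are not whole multiples of the finest one.

    Coarser levels are built from the finest level by coarsening, so each
    bin size must divide evenly by the smallest bin size. An empty list
    means the ladder passes this check.
    """
    base = min(resolutions)
    return sorted(r for r in resolutions if r % base != 0)


run_ladder = [1000000, 500000, 250000, 100000, 50000, 25000, 10000]
print(check_ladder(run_ladder))  # [25000]
```

Dropping 25000, or moving the base down to 5 kb or 1 kb, makes this ladder pass.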

nvictus commented 5 years ago

From the header, it looks like your chromosomes in the pairs file use ref|NC_xxx names instead of UCSC names (chr...). That must have been how they were encoded in the FASTA file.

You can confirm by checking after the header, as Max suggested.
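Checking "after the header" means looking at the pair records themselves. A minimal Python sketch that counts chromosome names among the first records of a gzipped .pairs file; it assumes tab-separated records and takes the chromosome columns from the -c1 2 -c2 4 options (1-based) passed to cooler cload pairs in the command above. The function names and the 100,000-record default are illustrative choices.

```python
import gzip
from collections import Counter
from itertools import islice


def body_chrom_counts(lines, max_records=100_000, c1=1, c2=3):
    """Count chromosome names in the first max_records pair records.

    c1 and c2 are 0-based column indices; the defaults correspond to the
    1-based -c1 2 -c2 4 options given to 'cooler cload pairs' in this issue.
    Header lines (starting with '#') are skipped.
    """
    records = (line for line in lines if not line.startswith("#"))
    counts = Counter()
    for line in islice(records, max_records):
        fields = line.rstrip("\n").split("\t")
        counts[fields[c1]] += 1
        counts[fields[c2]] += 1
    return counts


def scan_pairs_file(path, max_records=100_000):
    """Open a gzipped .pairs file and count chromosome names in its first records."""
    with gzip.open(path, "rt") as f:
        return body_chrom_counts(f, max_records)
```

If scan_pairs_file(...).most_common(5) lists only ref|NC_... names, none of those records can match the chr-named entries of the chrom sizes file.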

If that's the case, your options are:

Options that involve manual intervention (for a one-off case), or modifying the pipeline:

gibcus commented 5 years ago

Yup, I used the NCBI FASTA to generate the reduced.chrom.sizes file. I guess I'll generate a new one from UCSC and try @nvictus's suggestion to re-index and re-distill. Alternatively, I'll remap the whole d@rn thing.
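Regenerating the chrom sizes from the same FASTA that gets indexed keeps the names consistent by construction. A minimal Python sketch, assuming the reference name is the first whitespace-delimited word of each ">" header line (which is how the names appear in the samheader lines above); fasta_lengths and write_chrom_sizes are hypothetical helpers, and the optional keep filter for a "reduced" file is an assumption, not distiller's definition. If samtools is available, the first two columns of the .fai file written by samtools faidx carry the same name and length information.

```python
import gzip


def fasta_lengths(lines):
    """Yield (name, length) for each FASTA record.

    The name is the first whitespace-delimited word after '>', which is
    typically what aligners such as bwa keep as the reference name.
    """
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = (line[1:].split() or [""])[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def write_chrom_sizes(fasta_path, out_path, keep=None):
    """Write 'name<TAB>length' lines for a plain or .gz FASTA.

    keep: optional collection of names to retain (a hypothetical filter for
    a "reduced" file, e.g. dropping unplaced contigs).
    """
    opener = gzip.open if fasta_path.endswith(".gz") else open
    with opener(fasta_path, "rt") as fin, open(out_path, "w") as fout:
        for name, length in fasta_lengths(fin):
            if keep is None or name in keep:
                fout.write(f"{name}\t{length}\n")
```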

gibcus commented 5 years ago

"... Another unrelated problem in your distiller run is this: --resolutions 1000000,500000,250000,100000,50000,25000,10000 You're asking cooler to build 25kb heatmaps based on 10kb ones - that is probably not going to "fly" , even after you fix your reference genome: resolutions in the "ladder" must be multiples of the highest-one (smallest bin size-one) - because all lower resolution "heatmaps" are build upon the highest one by consecutive coarsening."

Another rookie mistake...

nvictus commented 5 years ago

I recommend downloading the 2bit file from UCSC goldenpath. The twoBitInfo command will dump the chromosomes in a sensible order (not sorted by size), and twoBitToFa will generate the fasta.

EDIT: Just tested. Scratch the sensible-order statement... maybe it was just a fluke with the last couple of genomes I tried it on.
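Since twoBitInfo's order turned out not to be reliable, the chrom sizes can be sorted explicitly. A minimal Python sketch of a "natural" sort that compares digit runs as numbers, so chr2 precedes chr10 and chr1 to chr33 come before chrM, chrW and chrZ, which is the order of the reduced chrom.sizes file shown earlier in this issue. The helper names are hypothetical. Order is worth controlling because cooler lays out bins following the chromosome order of the chrom sizes file.

```python
import re


def natural_key(name):
    """Split a name into text and integer runs so chr2 sorts before chr10."""
    # re.split with a capturing group alternates text, digits, text, ...,
    # so ints and strs always sit at the same positions when keys are compared.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def sort_chrom_sizes(lines):
    """Return 'name<TAB>length' lines ordered by natural_key on the name."""
    rows = [line.split()[:2] for line in lines if line.strip()]
    rows.sort(key=lambda row: natural_key(row[0]))
    return [f"{name}\t{length}" for name, length in rows]
```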

gibcus commented 5 years ago

"I recommend downloading the 2bit file from UCSC goldenpath. The twoBitInfo command will dump the chromosomes in a sensible order (not sorted by size), and twoBitToFa will generate the fasta."

I considered the "soft-masked" galGal6.fa.gz, but I'll check out twoBitToFa.

mimakaev commented 5 years ago

Also, I generally start with 1kb resolution, not 10kb. It does not generate that much extra space, but may end up being useful for averages/pileups even in low-coverage datasets.
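A rough way to see why a 1 kb base costs less space than one might expect: the number of bins grows about tenfold from 10 kb to 1 kb, but cooler stores only the non-zero pixels of the upper triangle, so the pixel table is bounded by the number of distinct contacts rather than by the bin count squared. A minimal Python sketch using the galGal6 lengths from the reduced chrom.sizes file above; n_bins and max_pixels are hypothetical helpers, and real file sizes also depend on compression and sequencing depth.

```python
import math

# Chromosome lengths from galGal6.reduced.chrom.sizes, as pasted in this issue.
GALGAL6 = [
    197608386, 149682049, 110838418, 91315245, 59809098, 36374701, 36742308,
    30219446, 24153086, 21119840, 20200042, 20387278, 19166714, 16219308,
    13062184, 2844601, 10762512, 11373140, 10323212, 13897287, 6844979,
    5459462, 6149580, 6491222, 3980610, 6055710, 8080432, 5116882, 1818525,
    6153034, 725831, 7821666, 16784, 6813114, 82529921,
]


def n_bins(lengths, binsize):
    """Total bins across chromosomes; each chromosome's last bin may be partial."""
    return sum(math.ceil(length / binsize) for length in lengths)


def max_pixels(nbins):
    """Upper bound on upper-triangle pixels (diagonal included) for nbins bins."""
    return nbins * (nbins + 1) // 2


for binsize in (10000, 1000):
    n = n_bins(GALGAL6, binsize)
    print(f"{binsize:>6} bp: {n:,} bins, at most {max_pixels(n):,} upper-triangle pixels")
```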


gibcus commented 5 years ago

"Also, I generally start with 1kb resolution, not 10kb. It does not generate that much extra space, but may end up being useful for averages/pileups even in low-coverage datasets."

Indeed, that was a space consideration, as the libraries did not have the depth for 1 kb. I'll take your advice!