hasindu2008 / f5c

Ultra-fast methylation calling and event alignment tool for nanopore sequencing data (supports CUDA acceleration)
https://hasindu2008.github.io/f5c/docs/overview
MIT License

Empty bam and tsv file with f5c methylation calling #116

Closed: PanZiwei closed this issue 1 year ago

PanZiwei commented 2 years ago

Hi, I am trying to use f5c, following the "Resource efficient methylation calling workflow for a dataset with many ultra-long reads". However, the methylation call failed, and I think something might be wrong with the samtools step that produces the BAM file - I got an empty BAM file and empty TSV files. I have attached the error output for your reference.

Would really appreciate it if you can help to solve the issue! Thanks!


```
[M::mm_idx_gen::104.657*0.99] collected minimizers
[M::mm_idx_gen::151.886*1.00] sorted minimizers
[M::main::151.886*1.00] loaded/built the index for 25 target sequence(s)
[M::mm_mapopt_update::154.168*1.00] mid_occ = 892
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 25
[M::mm_idx_stat::155.147*1.00] distinct minimizers: 100277218 (38.70% are singletons); average occurrences: 5.856; average spacing: 5.308; total length: 3117292070
[M::worker_pipeline::1700.969*1.00] mapped 31053 sequences
[M::worker_pipeline::2978.382*1.00] mapped 27179 sequences
[M::worker_pipeline::4503.583*1.00] mapped 27496 sequences
[M::worker_pipeline::5921.870*1.00] mapped 26935 sequences
[M::worker_pipeline::7378.081*1.00] mapped 26405 sequences
[M::worker_pipeline::8825.560*1.00] mapped 26779 sequences
[M::worker_pipeline::10403.014*1.00] mapped 26387 sequences
[M::worker_pipeline::11846.442*1.00] mapped 27982 sequences
[M::worker_pipeline::13383.000*1.00] mapped 26939 sequences
[M::worker_pipeline::14902.949*1.00] mapped 28122 sequences
[M::worker_pipeline::16351.550*1.00] mapped 27010 sequences
[M::worker_pipeline::17843.300*1.00] mapped 25898 sequences
[M::worker_pipeline::19364.163*1.00] mapped 25301 sequences
[M::worker_pipeline::20808.294*1.00] mapped 25726 sequences
[M::worker_pipeline::22241.301*1.00] mapped 25931 sequences
[ERROR] failed to write the results: Disk quota exceeded
[W::sam_read1] parse error at line 430685
[bam_sort_core] truncated file. Aborting.
samtools index: "sample_r9_guppy_6.1.5.bam" is in a format that cannot be usefully indexed
[parse_index_options::INFO] Consider using --slow5 option for fast indexing, methylation calling and eventalignment. See f5c section under https://hasindu2008.github.io/slow5tools/workflows.html for an example.
[find_all_fast5] Looking for fast5 in /fastscratch/c-panz/raw/sample_r9/
[f5c_index_iop] 1481 fast5 files found - took 0.297s
[f5c_index_iop] Spawning 64 I/O processes to circumvent HDF hell
[f5c_index_iop] Parallel indexing done - took 18860.476s
[f5c_index_merge] Indexing merging done - took 8.393s.
[readdb] num reads: 5922364, num reads with path to fast5: 2326610
[index_main::WARNING] fast5 files could not be located for 3595754 reads
[main] Version: 1.1
[main] CMD: f5c index --iop 64 -t 32 -d /fastscratch/c-panz/raw/sample_r9/ /fastscratch/c-panz/raw_fastq/sample_r9_guppy_6.1.5.fq
[main] Real time: 40027.256 sec; CPU time: 20916.302 sec; Peak RAM: 1.554 GB

call-methylation: unrecognized option '---ultra-thresh'
call-methylation: invalid option -- 'd'
call-methylation: invalid option -- 'i'
call-methylation: invalid option -- 's'
call-methylation: invalid option -- 'a'
[meth_main::INFO] Default methylation tsv output format is changed from f5c v0.7 onwards to match latest nanopolish output. Set --meth-out-version=1 to fall back to the old format.
[E::hts_open_format] Failed to open file le-cuda=yes
[init_core::ERROR] No such file or directory.
[init_core::DEBUG] Error occured at src/f5c.c:102.

Bad file format with no header?
Bad file format with no header?
Bad file format with no header?
```
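Aside from the disk quota error, the `unrecognized option '---ultra-thresh'` line and the run of `invalid option -- 'd' / 'i' / 's' / 'a'` messages (plus `Failed to open file le-cuda=yes`) suggest the call-methylation command used a triple-dash `---ultra-thresh` and a single-dash `-disable-cuda=yes`, which getopt then split into single-letter options. A minimal sketch of what a corrected invocation might look like - the file paths and the threshold value here are placeholders, not taken from the thread:

```shell
#!/bin/sh
# Hypothetical corrected invocation: long options need exactly two dashes.
# reads.fq, reads.sorted.bam, genome.fa and the 300000 threshold are
# placeholder values for illustration only.
if command -v f5c >/dev/null 2>&1; then
    f5c call-methylation \
        --disable-cuda=yes \
        --ultra-thresh 300000 \
        -b reads.sorted.bam \
        -g genome.fa \
        -r reads.fq > meth.tsv
else
    # Guard so the sketch is a no-op on machines without f5c installed.
    echo "f5c not found; skipping" >&2
fi
```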
hasindu2008 commented 2 years ago

Hi

Your minimap2 execution has failed. If you look at the logs you attached, they say `[ERROR] failed to write the results: Disk quota exceeded`. You will have to free up some space on the disk.
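A quick way to check whether the output filesystem is the problem - a sketch assuming POSIX `df`/`du`; the `quota` tool may not exist on every cluster, so it is guarded:

```shell
#!/bin/sh
# Free space on the filesystem holding the output directory
# (replace . with the directory the BAM/TSV files are written to).
df -h .

# Per-user quota, if the quota tool is available on this cluster.
if command -v quota >/dev/null 2>&1; then
    quota -s || true
fi

# Largest entries under the current tree, to see what to clean up first.
du -sh ./* 2>/dev/null | sort -rh | head -n 5
```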

PanZiwei commented 2 years ago

Hi, I believe I have enough space on the disk - I am running on the HPC instead of the local node. Do you think I should increase the job memory to finish the job? Thanks!

hasindu2008 commented 2 years ago

It is always recommended to submit these as a job. But the problem here seems to be your storage quota. Are you perhaps writing files to your home directory instead of the scratch storage?
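One way to verify which filesystem a path actually lives on - a sketch where `OUTDIR` is a placeholder for the job's output directory; output landing under `$HOME` is typically subject to a much smaller quota than scratch storage:

```shell
#!/bin/sh
# Print the mount point backing the home directory and the intended
# output directory; if they match, output is going to the home quota.
df -P "${HOME:-/}" | awk 'NR==2{print "home    ->", $6}'

outdir="${OUTDIR:-.}"   # placeholder: set OUTDIR to the real output path
df -P "$outdir" | awk 'NR==2{print "output  ->", $6}'
```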

hasindu2008 commented 1 year ago

@PanZiwei Has this issue been fixed?

PanZiwei commented 1 year ago

Thanks for checking. It was solved and I will close the issue.