wwood / finishm

genome improvement and finishing without further sequencing effort
MIT License

Crash on finishm gapfill when applied to entire sample #25

Closed donovan-h-parks closed 9 years ago

donovan-h-parks commented 9 years ago

FinishM failed with the following when trying to gap fill. Note that I am trying to do this across all contigs (~100,000) in my sample.

finishm gapfill --contigs cck10.scaffolds_500bp.fa --output-fasta cck10.scaffolds_500bp.filled.fna --fastq-gz cck10.1.fq.gz,cck10.2.fq.gz --overhang 20 --leash-length 5000 --velvet-directory ./velvet

INFO finishm 27/02 10:57:42: Detected 118379 scaffolds, containing 279248 different contigs
WARN finishm 27/02 10:57:42: Removed 1 contigs from within scaffolds as they were too short
INFO finishm 27/02 10:58:32: Detected 118379 scaffolds, containing 279248 different contigs
INFO finishm 27/02 10:58:34: Detected 160869 gap(s) from 118379 different contig(s). 27206 contig(s) were gap-free.
INFO bio-velvet 27/02 10:59:47: Running velveth: /srv/sw/finishm/0.0.0.dev/lib/assembly/../../ext/src/velveth
INFO finishm 27/02 10:59:47: Assembling sampled reads with velvet
INFO bio-velvet 27/02 10:59:47: Running velveth: /srv/sw/finishm/0.0.0.dev/lib/assembly/../../ext/src/velveth ./velvet 51 -fasta -short /tmp/probes.fa20150227-64962-10c8kpn -fastq.gz -short cck10.1.fq.gz cck10.2.fq.gz -create_binary
INFO bio-velvet 27/02 13:40:19: Running velvetg: /srv/sw/finishm/0.0.0.dev/lib/assembly/../../ext/src/velvetg ./velvet -read_trkg yes -cov_cutoff 3.5 -tour_bus no -read_to_node_binary yes
INFO finishm 27/02 18:03:13: Finished running assembly
INFO finishm 27/02 18:03:13: Reading in the actual sequences of all reads from ./velvet/CnyUnifiedSeq
INFO finishm 27/02 18:03:53: Read in 105372648 sequences
INFO finishm 27/02 18:03:56: Parsing the graph output from velvet
INFO finishm 27/02 18:09:18: Finished parsing graph: found 3600360 nodes and 1547704 arcs
INFO finishm 27/02 18:09:18: Beginning parse of graph using velvet's parsing C code..
INFO finishm 27/02 18:10:51: Completed velvet code parsing velvet graph
INFO finishm 27/02 18:10:51: Reading ReadToNode.bin file..
INFO finishm 27/02 18:10:52: Finding probe nodes in the assembly
/srv/sw/finishm/0.0.0.dev/lib/assembly/node_finder.rb:67:in `pick_best_node_for_read_id': undefined method `direction' for nil:NilClass (NoMethodError)
    from /srv/sw/finishm/0.0.0.dev/lib/assembly/node_finder.rb:80:in `block in find_probes_from_read_to_node'
    from /srv/sw/finishm/0.0.0.dev/lib/assembly/node_finder.rb:78:in `collect'
    from /srv/sw/finishm/0.0.0.dev/lib/assembly/node_finder.rb:78:in `find_probes_from_read_to_node'
    from /srv/sw/finishm/0.0.0.dev/lib/assembly/graph_generator.rb:196:in `generate_graph'
    from /srv/sw/finishm/0.0.0.dev/lib/finishm/gapfiller.rb:153:in `run'
    from /srv/sw/finishm/0.0.0.dev//bin/finishm:124:in `<main>'
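For anyone reading the trace: the NoMethodError means `direction` was called on nil inside pick_best_node_for_read_id, so at least one probe read apparently had no usable node entry at that point. The sketch below only illustrates that class of failure and one way to guard against it. It is not finishm's code: `Hit`, `pick_hit` and `find_probes` are hypothetical names, and the real cause in node_finder.rb may well be different.

```ruby
# Hypothetical stand-in for a probe-to-node lookup; NOT finishm's real code.
Hit = Struct.new(:node_id, :direction)

# Returns the first hit recorded for the read, or nil when the read
# maps to no node at all (the case that would trigger a nil error).
def pick_hit(read_to_node, read_id)
  read_to_node.fetch(read_id, []).first
end

# Collects node/direction pairs for each probe read and reports the
# unmapped probes separately instead of calling a method on nil.
def find_probes(read_to_node, probe_read_ids)
  found = {}
  missing = []
  probe_read_ids.each do |read_id|
    hit = pick_hit(read_to_node, read_id)
    if hit.nil?
      missing << read_id
    else
      found[read_id] = [hit.node_id, hit.direction]
    end
  end
  [found, missing]
end

# Made-up data: read 1 maps to node 42, read 2 maps nowhere.
read_to_node = { 1 => [Hit.new(42, true)] }
found, missing = find_probes(read_to_node, [1, 2])
puts "found: #{found.inspect}"
puts "missing: #{missing.inspect}"
```

Reporting the unmapped probe IDs rather than raising would at least show which contig ends could not be placed in the graph.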

wwood commented 9 years ago

Try roundup --gapfill-only instead; it's better supported.

donovan-h-parks commented 9 years ago

Cool. I'll give it a go. Any recommendations on how to set the parameters if one is aiming just to do gap filling? I've also limited things to scaffolds >= 2k.

wwood commented 9 years ago

I think the defaults should be fine. Maybe limit the leash length to double your insert size or something?


Ben Woodcroft http://ecogenomic.org/users/ben-woodcroft http://www.ecogenomic.org/
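The rule of thumb above (leash length of roughly twice the insert size) can be sketched as a small helper. This is only an illustration under assumptions: the 500 bp insert size is made up, `leash_length_for` and `gapfill_only_args` are hypothetical names, and the sketch assumes `roundup` is run as a `finishm` subcommand that accepts `--leash-length` the same way `gapfill` does in the command earlier in this thread. Input and output options are left out on purpose because they are not given here.

```ruby
require 'shellwords'

# Rule of thumb suggested in this thread: leash length ~ 2 x insert size.
def leash_length_for(insert_size)
  raise ArgumentError, 'insert size must be positive' unless insert_size.positive?
  2 * insert_size
end

# Builds only the option fragment discussed in the thread. Assumption:
# `roundup` accepts --leash-length; input/output options are omitted.
def gapfill_only_args(insert_size)
  ['finishm', 'roundup', '--gapfill-only',
   '--leash-length', leash_length_for(insert_size).to_s]
end

insert_size = 500 # hypothetical library insert size in bp
puts Shellwords.join(gapfill_only_args(insert_size))
```

Check the tool's own help output for the exact option names before running anything built this way.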

donovan-h-parks commented 9 years ago

Fills gaps. No crashing!