yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

segmentation fault when writing updated pb file when all sequences are already in the file #51

Closed rmcolq closed 3 years ago

rmcolq commented 3 years ago

I've been running usher in a Nextflow pipeline, creating an initial start tree and then updating it sequentially with chunks of a large FASTA. I get a segfault only when I restart the Nextflow pipeline. The end of the log reads:

  Completed in 6 msec 

  Condensing identical sequences. 
  WARNING: tree contains condensed nodes. It may be condensed already!
  Writing condensed input tree to file trees/condensed-tree.nh
  Completed in 35 msec 

  Found 0 missing samples.

  Writing uncondensed final tree to file trees/uncondensed-final-tree.nh 
  The parsimony score for this tree is: 81342 
  Completed in 21 msec 

  Saving mutation-annotated tree object to file (after condensing identical sequences) cog_gisaid_B.dequote.false.false.pb
  .command.sh: line 3: 18339 Segmentation fault: 11  usher -i cog_gisaid_B.dequote.false.pb --vcf lineage_Bplus.new.5.with_reference.vcf --threads 8 --save-mutation-annotated-tree cog_gisaid_B.dequote.false.false.pb --collapse-tree --write-uncondensed-final-tree --outdir trees

Could this be caused by there being no new sequences to place?

yatisht commented 3 years ago

I have never encountered an error like this, not even when there are no new sequences to be placed. Is it possible for you to send the files across? You can also email them to me at yturakhi@ucsc.edu.

-Yatish

rmcolq commented 3 years ago

Thank you! As you suggested, cleaning the tree and FASTA to remove unexpected symbols seems to have fixed this. Would it be worth having something that catches unexpected symbols in the input earlier on? The cause surprised me, given that I only got the segfault when the protobuf was written out at the end.
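
For anyone wanting a stopgap, a pre-check along these lines could be run on the FASTA headers before calling usher. This is only a rough sketch: the thread does not say which symbols caused the crash, so the allowed character set below is an assumption, and `ALLOWED`, `fasta_names` and `find_suspect_names` are hypothetical helper names, not part of usher. The set deliberately excludes characters that have special meaning in Newick (whitespace, quotes, parentheses, brackets, colons, semicolons, commas).

```python
import re

# ASSUMPTION: sample names may only contain letters, digits and a few
# separators. The thread does not specify the offending symbols, so adjust
# this set to whatever your own naming scheme needs.
ALLOWED = re.compile(r"[A-Za-z0-9_.|/-]+")


def fasta_names(lines):
    """Yield the full header text (without '>') of each FASTA record."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def find_suspect_names(names, allowed=ALLOWED):
    """Return the names that are empty or contain characters outside `allowed`."""
    return [name for name in names if not allowed.fullmatch(name)]


if __name__ == "__main__":
    # Tiny made-up example input; real use would iterate over an open file.
    example = [
        ">England/ABC-123/2020",
        "ACGT",
        ">'Wales/XYZ 1/2020'",
        "ACGT",
        ">sample:with;odd(chars)",
        "ACGT",
    ]
    for name in find_suspect_names(fasta_names(example)):
        print("suspect sample name:", repr(name))
```

Failing early on the reported names, rather than letting the run reach the final write step, would at least point at the input instead of at the protobuf output.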

yatisht commented 3 years ago

Oh, wonderful. I'll close this issue in that case. I'll also try and fix the parsing issue you suggested.