Closed martinghunt closed 1 year ago
@yceh can you take a look at this when you get a moment?
Could you please tell us more on how the tree.pb is generated? It seems to be corrupted, as one leaf node name is an empty string. condensed_nodes { node_name: "node_1_condensed_2_leaves" condensed_leaves: "" condensed_leaves: "SRR18554366" }
tree.pb
was made with Usher. The whole pipeline I'm running is:
mafft --keeplength
on each sequence vs reffaToVcf -includeNoAltN <MSA fasta from 1> <out.vcf>
usher --sort-before-placement-3 --vcf <VCF from 2> --tree <empty tree file> -T 20 --save-mutation-annotated-tree tree.pb
(empty tree file just contains ()
)matOptimize -T 20 -r 8 -M 2 -i <tree.pb from 3> -o tree.optimized.pb
usher_to_taxonium --input <tree.optimized.pb from 4> …
I've attached the MSA fasta here used in step 3. Have double-checked that none of the sequence names are empty. There's the ref first NC_045512 and then the rest are all called SRR*. msa.fa.gz
Could you please use usher-sampled instead of usher (command line arguments are the same)? This is a fix in empty tree handling that is not back-ported to usher. By the way, it is better to run matOptimize with argument -m 0.000000001 for small trees, so it won't stop too early when there are still optimization opportunities. Thanks!
Thank you! It's all working perfectly for me now.
I'm getting this error on 3 of my 8 data sets (5 run perfectly) when running
matOptimize
:I guess is hitting this: https://github.com/yatisht/usher/blob/b53483e1b96447ae3846688f812b16a8859368fa/src/matOptimize/condense.cpp#L137-L139
I've attached example input files. Command I'm running is:
...which throws that error. It runs fine without
--vcf fatoVcf.vcf.gz
. I think this is a bug?usher_files.zip