yatisht / usher

Ultrafast Sample Placement on Existing Trees

matOptimize crashes with small vcf input #217

Closed willdumm closed 1 year ago

willdumm commented 2 years ago

@yceh Thanks for fixing a previous issue I was having with this setup! I finally got around to trying out your fix, but I'm now having the following issue. Any ideas?

Setup:

(on Mac, because the build failed in a fresh Ubuntu 20.04 Docker image)

git clone https://github.com/yatisht/usher.git
cd usher
# Checkout Cheng's PR related to an earlier issue I had (I also tried with HEAD)
git checkout 3943afdfb534184f08d656bcea7d0385e0c6abf4
./install/installMacOS.sh

The Issue

If I run

faToVcf deduplicated.fasta out.vcf -ref=GL
usher -t starttree.nh -v out.vcf -o mat.pb -d ushertree/
matOptimize -v out.vcf -i mat.pb -o opt_mat.pb

with the attached files deduplicated.fasta and starttree.nh as inputs, I get the following output from matOptimize:

Running with 1 processes
Summary:
Extract starting tree from mat.pb last modified: Thu Mar  3 11:36:59 2022
Load sample variant from out.vcf last modified: Thu Mar  3 11:36:32 2022
Will output final protobuf to opt_mat.pb .
Will output intermediate protobuf to opt_mat.pbintermediate0Uf9Eo.pb.
Will double radius after each iteration
Run kill -s SIGUSR2 87198 to apply all the move found immediately, then output and exit.
Using 8 threads.
Will drift for 0 iterations
Loading input tree
Finished loading input tree, start reading VCF and assigning states
[1]    87198 trace trap  ../usher/build/matOptimize -v out.vcf -i mat.pb -o opt_mat.pb

Both opt_mat.pb and an intermediate protobuf file are created, but they are empty.
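
For anyone triaging a similar crash, the two symptoms above (a process killed by a signal, and output files that exist but are empty) can be checked from a script. The following is a minimal sketch, not part of UShER: `describe_status` and `check_outputs` are hypothetical helper names, and the sketch assumes the usual shell convention that a process killed by signal N is reported as exit status 128+N.

```shell
# Hypothetical helpers for illustration; they are not part of UShER.

# Print a short description of an exit status. Shells conventionally report
# a process killed by signal N as status 128+N (an assumption of this sketch).
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error $status"
    fi
}

# Report whether each named output file exists and is non-empty.
# Returns 1 if any file is empty or missing, 0 otherwise.
check_outputs() {
    result=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "empty or missing: $f"
            result=1
        fi
    done
    return "$result"
}
```

Running `describe_status $?` immediately after the matOptimize command, followed by `check_outputs opt_mat.pb`, would flag both symptoms. A trace trap corresponds to SIGTRAP, which is signal 5 on Linux and macOS, so the status would typically be 133.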

inputs.zip

yceh commented 2 years ago

Sorry, I'm having some difficulty reproducing it. I tried building it from source on the master branch and running it via the public container, and it seems to be working.

$ docker run -v `pwd`:/data -ti yatisht/usher
root@5b8466a3dfe2:/HOME/usher/test_input# faToVcf deduplicated.fasta out.vcf -ref=GL
root@5b8466a3dfe2:/HOME/usher/test_input# usher -t starttree.nh -v out.vcf -o mat.pb -d ushertree/
Initializing 12 worker threads.
...........
WARNING: Following samples had multiple possibilities of parsimony-optimal placements:
seq22
seq28
seq4
seq19
seq3
seq30
Saving mutation-annotated tree object to file (after condensing identical sequences) mat.pb
Completed in 0 msec 

root@5b8466a3dfe2:/HOME/usher/test_input# matOptimize -v out.vcf -i mat.pb -o opt_mat.pb
Running with 1 processes
Summary:
Extract starting tree from mat.pb last modified: Thu Mar  3 21:35:30 2022
Load sample variant from out.vcf last modified: Thu Mar  3 21:35:30 2022
Will output final protobuf to opt_mat.pb .
Will output intermediate protobuf to opt_mat.pbintermediateXJv29d.pb. 
Will double radius after each iteration
Run kill -s SIGUSR2 27 to apply all the move found immediately, then output and exit.
Using 12 threads. 
Will drift for 0 iterations 
Loading input tree
Finished loading input tree, start reading VCF and assigning states 
Total mutation size 126 
Finished loading from VCF and state assignment

load vcf took 0.000008 minutes
Before condensing 78
1 nodes cleaned
0 condensed nodes
0 nodes cleaned
tree post processing took 0.000002 minutes
populated ignored range
Checkpoint initial tree.
Success
Finished checkpointing initial tree.
after state reassignment:78
Height:6
Start Finding nodes to move 
find max_level
Max level 6
Upward: 0,downward 0, both 0 
Search all nodes
Will search 1.000000 of nodes
Took 0.000009 s to find nodes to move
55 nodes to search
start
Total 168,max 8 
Updating sensitive alleles take 0 sec55 nodes to search 
Node size: 55
Will stop in 153722751 min
+++++++++Move receiver exit
 55 nodes left, 0.0 min left
requesting 1200 nodes from 0 after 0 seconds, got 55 nodes 
......
Success
Took 0second to save intermediate protobuf
Last round improvement 0.000000
Less than minimium improvement,stalled for -1 iterations
Will drift for 0 iterations 
Final Parsimony score 76
0 condensed_nodes
Maximum memory usage from 0: 21276 kb 
root@5b8466a3dfe2:/HOME/usher/test_input# matUtils summary -i opt_mat.pb
Loading input MAT file opt_mat.pb.
Completed in 0 msec 

No arguments set; getting basic statistics...
Total Nodes in Tree: 55
Total Samples in Tree: 43
Total Condensed Nodes in Tree: 0
Total Samples in Condensed Nodes: 0
Total Tree Parsimony: 76
Number of Clade Annotations: 0
Max Tree Depth: 6
Mean Tree Depth: 4.581395
Completed in 0 msec 

willdumm commented 2 years ago

Hmm, thanks for looking into it! I must be doing something wrong, since I'm getting the error both on my Mac and in a local Conda build on our Ubuntu server.

The public container works for me, of course. I hadn't thought to try that; I assumed that, since there had been no tagged version since your changes, it wouldn't work.

russcd commented 1 year ago

I believe this one has been resolved, but please reopen if there is more to do.