Open wjwei-handsome opened 3 months ago
This sounds like an issue with the graph or the distance index. What kind of data do you have, how did you obtain/build the graph, and how did you obtain/build the distance index?
The graph is clip.gfa from the Minigraph-Cactus pipeline, and the distance index was built using the command vg index -t 32 -j test.clip.dist test.clip.gbz.
Additionally, I ran vg haplotypes on the graph for each chromosome separately and found that only chr12 failed.
Can you share the graph? I don't think we can figure this out without it.
Of course, thank you very much for your assistance, which will greatly help us!
I sent a Google Drive link to your work mailbox (ucsc.edu). Please let me know if you need any other data.
@xchang1 There seems to be something wrong with the distance index. I'm iterating over the only top-level chain. The last net handles that look correct correspond to nodes (22649686, reverse) and (22649687, reverse). The next node on the haplotypes is (23053794, reverse), which has a self-loop on the right side and a simple snarl on the left side. Instead, we arrive at (22649676, reverse), which has been visited a bit earlier. Then the error message comes from trying to get its parent with SnarlDistanceIndex::get_parent().
Here is the subgraph: subgraph.pdf
You can find the graph and the distance index at /private/groups/cgl/jlsiren/issue_4381.
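The symptom described above, a walk along the top-level chain that lands on a node it has already visited, can be illustrated with a toy model. This is only a sketch: the `Handle` tuple, the `walk_chain` helper, and the successor table are hypothetical stand-ins rather than vg's net handles or the `SnarlDistanceIndex` API, and the nodes between the revisited node and the last correct ones are omitted.

```python
from typing import Dict, List, Optional, Tuple

# (node id, is_reverse): a hypothetical stand-in, not vg's handle type.
Handle = Tuple[int, bool]


def walk_chain(
    start: Handle, next_of: Dict[Handle, Handle]
) -> Tuple[List[Handle], Optional[Handle]]:
    """Follow next_of from start.

    Returns the handles in visiting order and the first handle that was
    reached a second time, or None if the walk ended without a revisit.
    """
    seen = set()
    order: List[Handle] = []
    current: Optional[Handle] = start
    while current is not None:
        if current in seen:
            # A chain should never return to an earlier child.
            return order, current
        seen.add(current)
        order.append(current)
        current = next_of.get(current)
    return order, None


# Invented successor table that mimics the report: after (22649687, reverse)
# the walk should reach (23053794, reverse) but returns to 22649676 instead.
broken = {
    (22649676, True): (22649686, True),
    (22649686, True): (22649687, True),
    (22649687, True): (22649676, True),
}

order, revisited = walk_chain((22649676, True), broken)
print("visited:", order)
print("revisited:", revisited)
```

A check of this shape only shows where the walk goes wrong. It says nothing about why the index stores the wrong successor.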
I just made a PR (#4395) that should fix this. I haven't tested it on the full graph yet, though.
Shocked by your speed and efficiency!
Thanks again for your help! @jltsiren @xchang1
I will try it on the full graph. If there are any follow-up questions, I will keep in touch with you.
BTW, compiling the source code is still a struggle for me. I would be grateful if you could provide a compiled binary of the latest fixed version. @xchang1
Haha, more like I wrote a lot of dumb bugs that are fast to fix once someone points them out, but thanks! I hope it works.
Here's a gzipped binary. It's for commit c5ff42, which is the current master branch plus my changes.
Hi @xchang1
Unfortunately, when I tried the new version you provided on the full graph, the same error occurred.
Using contig name GRch38.chr12 for chain 0
Partitioned 1 components into 1 jobs in 1.26687 seconds
Running 32 jobs in parallel
error: [job 0]: error: trying to access a snarl tree node of the wrong type
The version:
version v1.59.0-26-gc5ff4208e "Casatico"
Thank you very much for your earlier help. I would be grateful if you could keep working on this stubborn error!
Ah sorry, I forgot to say, you have to rebuild the distance index. I'm still running it on the chr12 graph but it's getting farther than before at least
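A minimal sketch of the two steps implied here: rebuild the distance index with the patched binary, then rerun haplotype sampling. It only assembles the command lines already quoted in this thread and prints them as a dry run. The helper name `rebuild_and_sample_commands` is hypothetical. How vg haplotypes locates the rebuilt distance index is not shown in the thread, so the file naming used here is an assumption.

```python
import shlex
from typing import List


def rebuild_and_sample_commands(
    gbz: str, dist: str, hapl: str, threads: int = 32
) -> List[List[str]]:
    """Return the argv lists for rebuilding the index and sampling haplotypes.

    Both commands mirror the ones quoted in the thread; no flags are added.
    """
    # Step 1: rebuild the distance index with the patched vg binary.
    index_cmd = ["vg", "index", "-t", str(threads), "-j", dist, gbz]
    # Step 2: rerun the preprocessing step of haplotype sampling.
    haplotypes_cmd = ["vg", "haplotypes", "-v3", f"-t{threads}", "-H", hapl, gbz]
    return [index_cmd, haplotypes_cmd]


# Dry run: print the commands instead of executing them.
for cmd in rebuild_and_sample_commands("test.clip.gbz", "test.clip.dist", "test.hapl"):
    print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)`, so that a failure while rebuilding the index stops the script before sampling starts.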
Oh, I should have thought of that! Sorry, I'll keep trying.
1. What were you trying to do?
Haplotype sampling
First step is preprocessing the graph
2. What did you want to happen?
Successfully generate sample.hapl file
3. What actually happened?
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
5. What data and command can the vg dev team use to make the problem happen?
vg haplotypes -v3 -t16 -H test.hapl test.gbz
6. What does running vg version say?
Interestingly, I encountered this problem when building dist index before: https://github.com/vgteam/vg/issues/3884
So I wonder: is there a problem with the distance index that shows up during haplotype sampling?
Looking forward to your reply :)