Open jeizenga opened 6 years ago
It looks like the distance index has 20 as the cap for the maximum distance so it thinks that the maximum distance is 20. This used to be the default but I changed it so if you re-make the distance index it should work
On Tue, Oct 23, 2018 at 1:57 PM Jordan Eizenga notifications@github.com wrote:
Assigned #1953 https://github.com/vgteam/vg/issues/1953 to @xchang1 https://github.com/xchang1.
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/1953#event-1921716063, or mute the thread https://github.com/notifications/unsubscribe-auth/AgxATO875VNcXONXXOcQGzHa86zawKWrks5un4LJgaJpZM4X2pcU .
Shouldn't the maximum distance cap only come into play if there are cycles in the graph? This is a snp1kg graph, so it should be a DAG
Also, it looks to be hard-coded in the master branch: https://github.com/vgteam/vg/blob/master/src/subcommand/index_main.cpp#L1137 Did you change the interface in an unmerged branch?
Also, I manually changed it to 100, but something is still going on. It gets an estimate of 119, but there's a path of exactly length 64, so shouldn't it just get 64? Moreover, there doesn't seem to be any path of length 119, and the tolerance is 37 so it should only report distances up to 111.
Update: the incorrect length is my own bug, my bad.
Shoot, sorry, you're right. I fixed that particular problem. There's still something wrong with the maximum distance in cyclic graphs but it should be fine for DAGs
jk, it should work for cyclic graphs too
Wait, I got another one. This time it gets a sdsl out-of-bounds error somewhere in the TVS algorithm. Reproducing files attached, the command is:
vg mpmap -x snp1kg-BRCA1.xg -g snp1kg-BRCA1.gcsa -d snp1kg-BRCA1.dist -s snp1kg-BRCA1.snarls -f read.fq -SBv -t 1 > /dev/null
[Uploading reproducing_example.zip…]()
I don't think the upload worked reproducing_example.zip
I'm having problems with git right now but its a one line change. Line 2817 is currently if ( next_best.first <= tolerance) { but it should be if (next_best.first != -1 && next_best.first <= tolerance) {
On Wed, Oct 24, 2018 at 8:07 PM Jordan Eizenga notifications@github.com wrote:
I don't think the upload worked reproducing_example.zip https://github.com/vgteam/vg/files/2513115/reproducing_example.zip
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/1953#issuecomment-432898690, or mute the thread https://github.com/notifications/unsubscribe-auth/AgxATNZdeGp8f-f6bUmZ8WgUjWIkJvQdks5uoSsCgaJpZM4X2pcU .
Okie, I'll throw it into my branch. Thanks!
New fun. I have it working now with the edit you sent, but I'm having performance issues using TVS for paired end reads.
The paired end resolution code requires much higher values for the tolerance. It seems like the combinatorial difficulty of the problem is really biting us in the ass with these parameters. I have a few instances of TVS that take on the order of 10 seconds to execute (already too long), and one that has been going for so long that it's also possible that it hit an infinite loop (but I expect it's just NP-Completeness rearing its head).
Anyway, I'm going to re-read the paper and see if I can spot any optimizations. Tomorrow we should try to go over the implementation if you have time.
Hooray. I have an optimization implemented that should speed things up a bit. My git repo is all tangled up though; I can untangle and PR it once I finish my homework
@xchang1
I'm trying to integrate TVS into my clusterer, but it's not working for me. I have a small-ish reproducing example, which you'll need to run from my
optimize-tvs-clustering
branch. The command is this (files attached):It will eventually try to measure the distance between 2256+13 and 2261+17 with target value 64. The path-based distance function estimates the distance at 63, but the TVS says it can't find a path.
reproducing_data.zip