Closed by fredericraymond 11 years ago.
See #141.
/home/boisver1/issue-153
Use BLAT 34 to align the contigs against themselves:
/rap/nne-790-ab/software/blatAligner/LastBuild/blat Contigs.fasta Contigs.fasta self.psl -fastMap
Then filter the PSL output with:
~/git-clones/Ray-TestSuite/scripts/dumpPsl.py
Filtered PSL:
[boisver1@cp2567 Sample_CQDM2-3-1]$ ~/git-clones/Ray-TestSuite/scripts/dumpPsl.py
{{{
33838 0 0 0 0 0 0 0 - contig-1000013 58026 0 33838 contig-4 125302 91454 125292 1 33838, 24188, 91454,
132248 2 0 0 8 8 8 8 + contig-2000013 207507 19862 152120 contig-19 139462 0 132258 9 275,38,923,36,181,43,130591,26,137, 19862,20138,20177,21101,21138,21320,21364,151956,151983, 0,276,315,1239,1276,1458,1502,132094,132121,
132248 2 0 0 8 8 8 8 + contig-19 139462 0 132258 contig-2000013 207507 19862 152120 9 275,38,923,36,181,43,130591,26,137, 0,276,315,1239,1276,1458,1502,132094,132121, 19862,20138,20177,21101,21138,21320,21364,151956,151983,
75123 29 0 0 6 16 6 16 - contig-1000020 106983 31815 106983 contig-21 106317 0 75168 7 74274,26,365,251,50,25,161, 0,74275,74302,74668,74920,74971,75007, 0,74275,74302,74668,74920,74971,75007,
75123 29 0 0 6 16 6 16 - contig-21 106317 0 75168 contig-1000020 106983 31815 106983 7 161,25,50,251,365,26,74274, 31149,31321,31347,31398,31650,32016,32043, 31815,31987,32013,32064,32316,32682,32709,
}}}
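A minimal sketch for reading these records, assuming the standard 21-column PSL layout (block lists end with a trailing comma). This is not dumpPsl.py, whose filtering rules are not shown in this thread; the helper names are hypothetical. It checks that the block sizes add up to the aligned bases and reports how much of the whole query matches.

```python
from typing import List, NamedTuple


class PslHit(NamedTuple):
    matches: int
    mismatches: int
    rep_matches: int
    n_count: int
    strand: str
    q_name: str
    q_size: int
    q_start: int  # 0-based, end-exclusive
    q_end: int
    t_name: str
    t_size: int
    t_start: int  # 0-based, end-exclusive
    t_end: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(field: str) -> List[int]:
    # PSL block lists carry a trailing comma, e.g. "275,38,923,"
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> PslHit:
    f = line.split()
    if len(f) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    return PslHit(
        matches=int(f[0]), mismatches=int(f[1]),
        rep_matches=int(f[2]), n_count=int(f[3]),
        strand=f[8],
        q_name=f[9], q_size=int(f[10]), q_start=int(f[11]), q_end=int(f[12]),
        t_name=f[13], t_size=int(f[14]), t_start=int(f[15]), t_end=int(f[16]),
        block_sizes=_int_list(f[18]),
        q_starts=_int_list(f[19]),
        t_starts=_int_list(f[20]),
    )


def blocks_consistent(hit: PslHit) -> bool:
    """Block sizes should add up to matches + mismatches + repMatches + nCount."""
    aligned = hit.matches + hit.mismatches + hit.rep_matches + hit.n_count
    return sum(hit.block_sizes) == aligned and len(hit.block_sizes) == len(hit.q_starts)


def query_match_fraction(hit: PslHit) -> float:
    """Matching bases divided by the full query length (not only the aligned span)."""
    return hit.matches / hit.q_size


record = ("132248 2 0 0 8 8 8 8 + contig-19 139462 0 132258 contig-2000013 207507 "
          "19862 152120 9 275,38,923,36,181,43,130591,26,137, "
          "0,276,315,1239,1276,1458,1502,132094,132121, "
          "19862,20138,20177,21101,21138,21320,21364,151956,151983,")
hit = parse_psl_line(record)
print(blocks_consistent(hit), round(query_match_fraction(hit), 3))  # True 0.948
```

So about 95% of contig-19 matches contig-2000013 even though the hit is split into 9 blocks.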
Probably a duplicate of #62.
Four odd relations:
contig-1000013 (58026) & contig-4 (125302) 33838--1 & 91455--125292 33838/33838
contig-1000013 (58026) & contig-2 (52572) 33140--58026 & 27686--52572 24887/24887
contig-2000013 (207507) & contig-19 (139462) 21365--151955 & 1503--132093 130590/130591
contig-1000020 (106983) & contig-21 () 106983--32710 & 1--74274 74270/74274
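The relations above appear to restate the largest PSL block of each hit in 1-based inclusive coordinates, with '-' strand query ranges written high-to-low (e.g. 33838--1 for contig-1000013). The sketch below reproduces those numbers from the PSL fields; the convention is inferred from the data, and the function name is hypothetical.

```python
def block_to_1based(strand: str, q_size: int, q_start: int, t_start: int, size: int):
    """Turn one PSL block (0-based starts) into 1-based inclusive 'a--b' ranges."""
    t_range = f"{t_start + 1}--{t_start + size}"
    if strand == "+":
        q_range = f"{q_start + 1}--{q_start + size}"
    else:
        # For '-' hits, PSL qStarts count on the reverse complement of the query;
        # convert back to forward coordinates and print them high-to-low.
        fwd_end = q_size - q_start
        fwd_begin = fwd_end - size + 1
        q_range = f"{fwd_end}--{fwd_begin}"
    return q_range, t_range


# contig-1000013 vs contig-4: strand '-', qSize 58026, qStart 24188, tStart 91454, block 33838
print(block_to_1based("-", 58026, 24188, 91454, 33838))  # ('33838--1', '91455--125292')
```

The same function gives 21365--151955 & 1503--132093 for the 130591-base block of contig-2000013 vs contig-19.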
Ray Cloud Browser needs two features to make this faster to understand:
1. links to a location
2. multi-path
/home/boisver1/issue-153
/rap/nne-790-ab/projects/Ray-Cloud-Browser/issue-153
+++ /rap/nne-790-ab/projects/Ray-Cloud-Browser/issue-153/Sample_CQDM2-3-2013-02-19-1
http://browser.cloud.raytrek.com/client/
$ python2.7 ~/git-clones/Ray-TestSuite/scripts/dumpPsl.py
{{{
132248 2 0 0 8 8 8 8 + contig-5 207507 19862 152120 contig-51 139462 0 132258 9 275,38,923,36,181,43,130591,26,137, 19862,20138,20177,21101,21138,21320,21364,151956,151983, 0,276,315,1239,1276,1458,1502,132094,132121,
28384 0 0 0 0 0 0 0 - contig-43 52572 0 28384 contig-1000096 123145 94761 123145 1 28384, 24188, 94761,
132248 2 0 0 8 8 8 8 + contig-51 139462 0 132258 contig-5 207507 19862 152120 9 275,38,923,36,181,43,130591,26,137, 0,276,315,1239,1276,1458,1502,132094,132121, 19862,20138,20177,21101,21138,21320,21364,151956,151983,
33848 0 0 0 0 0 0 0 - contig-54 58036 24188 58036 contig-105 125302 0 33848 1 33848, 0, 0,
92022 0 0 0 0 0 0 0 - contig-1000096 123145 3438 95460 contig-105 125302 33149 125171 1 92022, 27685, 33149,
92022 0 0 0 0 0 0 0 - contig-105 125302 33149 125171 contig-1000096 123145 3438 95460 1 92022, 131, 3438,
}}}
Lengths are in k-mers; positions are 1-based.
contig-51: 139402, contig-5: 207447
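With k = 61, the value used for these runs, a contig of L nucleotides has L - k + 1 k-mers, which is how 139462 nt becomes 139402 k-mers for contig-51 and 207507 nt becomes 207447 for contig-5. A small sketch of that conversion (function names are hypothetical):

```python
def kmer_count(length_nt: int, k: int) -> int:
    """Number of k-mers in a sequence of length_nt nucleotides."""
    return max(0, length_nt - k + 1)


def kmer_span(position: int, k: int):
    """Nucleotides (1-based, inclusive) covered by the k-mer at a 1-based k-mer position."""
    return position, position + k - 1


K = 61
print(kmer_count(139462, K), kmer_count(207507, K))  # 139402 207447
print(kmer_span(139402, K))  # (139402, 139462): the last k-mer ends on the last nucleotide
```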
contig-5:19863 ....... contig-51:1 http://browser.cloud.raytrek.com/client/?map=3&section=0&region=9&location=0&zoom=1.2255452109421872
contig-5:151294 .......... contig-51:139402 http://browser.cloud.raytrek.com/client/?map=3&section=0&region=9&location=139401&zoom=1.2255452109421872
contig-51 is redundant because contig-5 includes it.
However, blat says 132248 / 139462 nucleotides match.
New link: http://genome.ulaval.ca:10111/client
So the obvious question is: where are the differences?
This bug is not really a bug: it happens because the sample is highly polymorphic.
However, it will be fixed anyway by lowering the required number of matches.
Auto-play link:
http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=16&play=forward&speed=8
Answer:
On http://genome.ulaval.ca:10111/client/, contig-5 is 207k and contig-51 is 139k.
MUMmer analyses (see http://mummer.sourceforge.net/manual/#snpdetection):
/home/boiseb01/Hathor/data/Sample_CQDM2-3-2013-02-19-1
{{{
/home/boiseb01/Hathor/data/Sample_CQDM2-3-2013-02-19-1/5.fasta /home/boiseb01/Hathor/data/Sample_CQDM2-3-2013-02-19-1/51.fasta
NUCMER
[P1] [SUB] [P2] | [BUFF] [DIST] | [LEN R] [LEN Q] | [FRM] [TAGS]
20138 A T 276 | 39 276 | 207507 139462 | 1 1 contig-5 contig-51
20177 T G 315 | 39 315 | 207507 139462 | 1 1 contig-5 contig-51
21009 T C 1147 | 92 1147 | 207507 139462 | 1 1 contig-5 contig-51
21101 A G 1239 | 37 1239 | 207507 139462 | 1 1 contig-5 contig-51
21138 T C 1276 | 37 1276 | 207507 139462 | 1 1 contig-5 contig-51
21320 T C 1458 | 44 1458 | 207507 139462 | 1 1 contig-5 contig-51
21364 C T 1502 | 44 1502 | 207507 139462 | 1 1 contig-5 contig-51
21462 T G 1600 | 98 1600 | 207507 139462 | 1 1 contig-5 contig-51
151956 A G 132094 | 27 7369 | 207507 139462 | 1 1 contig-5 contig-51
151983 G A 132121 | 27 7342 | 207507 139462 | 1 1 contig-5 contig-51
}}}
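One way to relate these SNPs to the BLAT hit: compare the 1-based SNP positions on contig-5 with the 0-based qStarts of the contig-5 vs contig-51 PSL record. A SNP at 1-based position p with p equal to a 0-based block start sits just before that block begins. This reading is an inference from the numbers, not something stated in the thread; the names are hypothetical.

```python
# SNP positions on contig-5 (reference, 1-based), copied from the show-snps rows.
SNP_REF_POSITIONS = [20138, 20177, 21009, 21101, 21138, 21320,
                     21364, 21462, 151956, 151983]
# qStarts of the "contig-5 vs contig-51" PSL record (0-based block starts on contig-5).
PSL_QSTARTS = [19862, 20138, 20177, 21101, 21138, 21320, 21364, 151956, 151983]


def split_snps(snp_positions, block_starts):
    """Split 1-based SNP positions into those lying on the base just before a
    PSL block (1-based p == 0-based block start) and all the others."""
    starts = set(block_starts)
    before_block = [p for p in snp_positions if p in starts]
    elsewhere = [p for p in snp_positions if p not in starts]
    return before_block, elsewhere


before_block, elsewhere = split_snps(SNP_REF_POSITIONS, PSL_QSTARTS)
print(len(before_block), elsewhere)  # 8 [21009, 21462]
```

Eight SNPs fall just before a block, matching the 8 one-base inserts on each side of that PSL record, and the remaining two match its misMatches count of 2.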
SNP at contig-5:20138 and contig-51:276: http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=214&zoom=0.7833022538927872
Run a job with -debug-fusions 10244720 10244722.
The 139k contig belongs to Rank 51.
Sample_CQDM2-3-2013-03-14-2.1.051:FusionWorker worker 0 path 51 strand= 0 is Done, analyzed 139402 position length is 139402
FusionWorker path 1000005 matches= 132171 length= 207447
In code/plugin_FusionTaskCreator/FusionWorker.cpp, a maximum of 1024 k-mers can be lost.
Here, about 7k are not matching (139402 - 132171 = 7231).
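If "lost" means the path length minus its matching k-mers (an assumption; this is not the code in FusionWorker.cpp, and the names below are hypothetical), the numbers from the log exceed the 1024 limit by a wide margin:

```python
MAX_LOST_KMERS = 1024  # limit quoted above for FusionWorker.cpp


def lost_kmers(path_length: int, matches: int) -> int:
    """k-mers of the path that did not match the other path (assumed definition)."""
    return path_length - matches


def is_redundant(path_length: int, matches: int, max_lost: int = MAX_LOST_KMERS) -> bool:
    """Hypothetical rule: a path is contained in another one when at most
    max_lost of its k-mers fail to match."""
    return lost_kmers(path_length, matches) <= max_lost


# Path 51 (139402 k-mers) against path 1000005 (132171 matching k-mers).
print(lost_kmers(139402, 132171), is_redundant(139402, 132171))  # 7231 False
```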
But the matches start at contig-51:1: http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=0
They end at contig-51: http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=139401
So the ~7k k-mers that Ray reports as not matching lie between the start and the end.
MUMmer analysis with nucmer:
/home/boiseb01/Hathor/data/Sample_CQDM2-3-2013-02-19-1/5.fasta /home/boiseb01/Hathor/data/Sample_CQDM2-3-2013-02-19-1/51.fasta
NUCMER
{{{
19863 152120 | 1 132258 | 132258 132258 | 99.99 | 207507 139462 | 63.74 94.83 | contig-5 contig-51
150618 151354 | 138726 139462 | 737 737 | 98.78 | 207507 139462 | 0.36 0.53 | contig-5 contig-51
}}}
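[COV R] and [COV Q] are the aligned length divided by the reference and query lengths. Recomputing them, and the stretch of contig-51 between the two alignments, is a quick sanity check; the helper names are hypothetical.

```python
def coverage_pct(aligned_len: int, seq_len: int) -> float:
    """Percent of a sequence covered by one alignment, as in [COV R] / [COV Q]."""
    return 100.0 * aligned_len / seq_len


def unaligned_between(prev_end: int, next_start: int) -> int:
    """Bases strictly between two 1-based, inclusive alignment intervals."""
    return next_start - prev_end - 1


# First row: 132258 aligned bases; contig-5 is 207507 nt, contig-51 is 139462 nt.
print(round(coverage_pct(132258, 207507), 2), round(coverage_pct(132258, 139462), 2))  # 63.74 94.83
# On contig-51 the first alignment ends at 132258 and the second starts at 138726.
print(unaligned_between(132258, 138726))  # 6467
```

So 6467 nt of contig-51 (positions 132259 to 138725) are covered by neither nucmer alignment; that is the region the two links below point at.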
At contig-51:132258: http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=132207&zoom=0.5347674032378408
At contig-51:138726: http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=138829&zoom=0.7486437028696127
How is that even possible?
It is probably due to seeds that should not exist.
Will fix #136, then check if this issue gets fixed.
From the first message:
{{{
contig-1332000049 contig-1285000051 100.00 130591 1 0 116521 247111 137960 7370 0.0e+00 255155.0
contig-1332000049 contig-1285000051 100.00 115293 1 0 1 115293 254480 139188 0.0e+00 225326.0
}}}
/rap/nne-790-ab/projects/Project_CQDM2/CQDM_Run1/Sample_CQDM2-3-61-SilverRay-2013-02-04
contig-1332000049 has 302666 nucleotides; contig-1285000051 has 254480 nucleotides.
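The two rows look like BLAST tabular output (query, subject, % identity, length, mismatches, gap opens, qstart, qend, sstart, send, e-value, bit score); that column order is an assumption the coordinates are consistent with (qend - qstart + 1 equals the length in both rows). Under it, a short sketch with hypothetical names shows both hits on the minus strand, separated by the same gap on both contigs:

```python
def parse_m8(line: str) -> dict:
    """Parse one row assumed to be in BLAST tabular (-m 8 / -outfmt 6) column order."""
    f = line.split()
    return {
        "query": f[0], "subject": f[1], "pident": float(f[2]),
        "length": int(f[3]), "mismatch": int(f[4]), "gapopen": int(f[5]),
        "qstart": int(f[6]), "qend": int(f[7]),
        "sstart": int(f[8]), "send": int(f[9]),
    }


def subject_strand(hit: dict) -> str:
    # Tabular BLAST writes minus-strand subject coordinates high-to-low.
    return "-" if hit["send"] < hit["sstart"] else "+"


def gaps_between(hits):
    """(query gap, subject gap) between consecutive minus-strand hits, ordered by qstart."""
    ordered = sorted(hits, key=lambda h: h["qstart"])
    return [
        (b["qstart"] - a["qend"] - 1, a["send"] - b["sstart"] - 1)
        for a, b in zip(ordered, ordered[1:])
    ]


rows = [
    "contig-1332000049 contig-1285000051 100.00 130591 1 0 116521 247111 137960 7370 0.0e+00 255155.0",
    "contig-1332000049 contig-1285000051 100.00 115293 1 0 1 115293 254480 139188 0.0e+00 225326.0",
]
hits = [parse_m8(r) for r in rows]
print([subject_strand(h) for h in hits], gaps_between(hits))  # ['-', '-'] [(1227, 1227)]
```

A 1227-nt gap on both sides would be consistent with these two hits being pieces of the single long nucmer alignment that follows.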
Following http://mummer.sourceforge.net/manual/#aligningdraft:
/home/boiseb01/issue-153/contig-1285000051.fasta /home/boiseb01/issue-153/contig-1332000049.fasta
NUCMER
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [LEN R] [LEN Q] | [COV R] [COV Q] | [TAGS]
===============================================================================================================================
7205 254480 | 247276 1 | 247276 247276 | 99.99 | 254480 302666 | 97.17 81.70 | contig-1285000051 contig-1332000049
Trying to reproduce the problem: mp2 /netmount/ip03_home/boisver1/issue-153
Ray fbc3a9c859c72340df3d061de5f8e553326597c9 RayPlatform 7eece8a3cb2eb4e132f76f854f00d3f9640f12da
This duplication occurs during the extension of seeds, or before.
contig-2000013 207507 => 207447 kmers
Rank 13 reached 207447 vertices from seed 91, flow 2
Spawned from:
Rank 13 starts on seed 91, length is 105, flow 0 [91/639]
contig-19 139462 => 139402 kmers
Rank 19 reached 139402 vertices from seed 1, flow 2
Spawned from:
Rank 19 starts on seed 1, length is 6007, flow 0 [1/676]
-k 61
The question is: what is the seed mode coverage for these extensions?
ls30 /home/boiseb01/issue-153
/home/boisver1/issue-153/Sample_CQDM2-3-10/logs
contig-2000013 207507 19862 152120 contig-19 139462
=> 207447 objects
Rank 13 reached 207447 vertices from seed 91, flow 2
Rank 13 starts on seed 91, length is 105, flow 0 [91/639]
rank = 13 id = 91
Seed # 91000013
=> 139402 objects
Rank 19 starts on seed 1, length is 6007, flow 0 [1/676]
Rank 19 reached 139402 vertices from seed 1, flow 2
-k 61
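The identifiers seem to carry the rank in their low digits: seed 91 on rank 13 is printed as 91000013, contig-2000013 was built by Rank 13, and contig-19 by Rank 19. A sketch of that scheme, inferred only from these examples and not taken from the Ray source; RANK_MULTIPLIER and the function names are hypothetical.

```python
RANK_MULTIPLIER = 1_000_000  # assumption inferred from "rank = 13 id = 91" -> "Seed # 91000013"


def encode_id(index: int, rank: int) -> int:
    """Combine a per-rank index and the rank into one identifier."""
    return index * RANK_MULTIPLIER + rank


def decode_id(identifier: int):
    """Return (index, rank) for an identifier such as a seed or contig number."""
    return divmod(identifier, RANK_MULTIPLIER)


print(encode_id(91, 13))   # 91000013 (seed 91 on rank 13)
print(decode_id(2000013))  # (2, 13) for contig-2000013
print(decode_id(19))       # (0, 19) for contig-19
```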
Simply abort the whole thing if the resolution does not allow this to be done.
{{{
33838 0 0 0 0 0 0 0 - contig-1000013 58026 0 33838 contig-4 125302 91454 125292 1 33838, 24188, 91454,
75123 29 0 0 6 16 6 16 - contig-1000020 106983 31815 106983 contig-21 106317 0 75168 7 74274,26,365,251,50,25,161, 0,74275,74302,74668,74920,74971,75007, 0,74275,74302,74668,74920,74971,75007,
75123 29 0 0 6 16 6 16 - contig-21 106317 0 75168 contig-1000020 106983 31815 106983 7 161,25,50,251,365,26,74274, 31149,31321,31347,31398,31650,32016,32043, 31815,31987,32013,32064,32316,32682,32709,
}}}
Fixed.
7598d4b2998067aed82e30cc61ad7b530c928d0d
I ran SilverRay with k=61 and obtained one big contig that was duplicated. I don't want duplicated contigs.
For example:
{{{
contig-1332000049 contig-1285000051 100.00 130591 1 0 116521 247111 137960 7370 0.0e+00 255155.0
contig-1332000049 contig-1285000051 100.00 115293 1 0 1 115293 254480 139188 0.0e+00 225326.0
}}}
This run is here: /rap/nne-790-ab/projects/Project_CQDM2/CQDM_Run1/Sample_CQDM2-3-61-SilverRay-2013-02-04