Closed sebhtml closed 11 years ago
With some sequencing technologies, seeds are separated by vertices that have a large family (many parents, many children).
Any two seeds A and B where B is the only child seed of A and A is the only parent seed of B have to be merged.
This will be fast as it does not require any read information. It is only topological.
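The merge rule above can be sketched in a few lines. This is only an illustration, not Ray's implementation: the edge-list input and the function name `find_mergeable_pairs` are assumptions, since the thread does not say how seed-to-seed relations are stored.

```python
from collections import defaultdict


def find_mergeable_pairs(seed_edges):
    """Return (A, B) pairs where B is the only child seed of A
    and A is the only parent seed of B.

    seed_edges is a hypothetical list of (parent_seed, child_seed) tuples.
    """
    children = defaultdict(set)
    parents = defaultdict(set)
    for parent, child in seed_edges:
        children[parent].add(child)
        parents[child].add(parent)

    pairs = []
    for a, kids in children.items():
        if len(kids) != 1:
            continue  # A has several child seeds: not a candidate
        (b,) = kids
        # B must have A as its single parent; skip self-loops
        if a != b and parents[b] == {a}:
            pairs.append((a, b))
    return sorted(pairs)
```

Only the seed-level topology is consulted here, which is why no read information is needed.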
Do we also see this when the k-mer length is only 31?
For -k 71
/mnt/lustre03/corbeil/corbeil_group/projects/ray-assembler/tickets/171/ray-71-ticket-171-2013-06-10-9/
For -k 31
/mnt/lustre03/corbeil/corbeil_group/projects/ray-assembler/tickets/171/ray-31-ticket-171-2013-06-10-9/
see /home/boiseb01/T-132
for this one, I could take the shortest path:
http://genome.ulaval.ca:10241/client/?map=1&section=0&region=3&location=0&depth=10
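A minimal sketch of the shortest-path idea, assuming the region between two seeds is available as an adjacency dictionary. The name `shortest_path`, the `max_depth` bound, and the vertex labels are hypothetical; the thread does not describe the actual search.

```python
from collections import deque


def shortest_path(adjacency, start, goal, max_depth=10):
    """Breadth-first search from start to goal.

    adjacency maps a vertex to an iterable of its children.
    Returns the list of vertices on a shortest path, or None if the
    goal is not reachable within max_depth edges.
    """
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for neighbor in adjacency.get(vertex, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = vertex
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the path
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append((neighbor, depth + 1))
    return None
```

Bounding the depth keeps the search local to the region between the two seeds.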
Another example:
/mnt/lustre03/corbeil/corbeil_group/projects/ray-assembler/tickets/188
For MiSeq sample:
http://genome.ulaval.ca:10241/client/?map=2&section=0&region=0&location=0&depth=10
RaySeed-0 and RaySeed-3000013 are both forward and 0 is left of 3000013.
My new C++ code should detect that. There are 2 paths between these two. The code should select the one with the most similar coverage.
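One way to read "most similar coverage" is sketched below. It assumes similarity means the smallest absolute difference between the mean coverage of a candidate path and the mean coverage of the two seeds; the thread does not define the measure, and the names and numbers here are hypothetical.

```python
from statistics import mean


def pick_path_by_coverage(seed_coverages, candidate_paths):
    """Pick the candidate path whose mean coverage is closest to the
    mean coverage of the flanking seeds.

    candidate_paths is a list of dicts with a "coverages" list
    (one value per vertex on the path).
    """
    target = mean(seed_coverages)
    return min(
        candidate_paths,
        key=lambda path: abs(mean(path["coverages"]) - target),
    )


# Hypothetical data: two paths between the same pair of seeds
paths = [
    {"name": "short", "coverages": [20, 25]},
    {"name": "long", "coverages": [98, 101, 105]},
]
best = pick_path_by_coverage([100, 104], paths)
```

With these made-up values the longer path is kept, because its coverage matches the seeds even though it is not the shortest.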
it seems that data is not being registered with the API...
Don't do anything with these for now:
Side quests:
Some first results:
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019
http://genome.ulaval.ca:10241/client/?map=3&section=0&region=200&location=0&depth=10
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 17000022 and 24000010
http://genome.ulaval.ca:10241/client/?map=3&section=0&region=409&location=168&depth=10
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018
http://genome.ulaval.ca:10241/client/?map=3&section=0&region=200&location=0&depth=10
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 13000006 and 15000003
http://genome.ulaval.ca:10241/client/?map=3&section=0&region=254&location=1734&depth=10
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000006 and 4000002
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 17000022 and 19000020
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 16000007 and 16000019
Debugging symmetry of computation:
[boisver1@cp1154-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symm|sort |uniq -c| sort -n
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000015 and 6000003
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000021 and 10000012
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 12000007 and 26000010
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000005 and 19000006
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 2000005 and 13000022
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000006 and 4000002
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019
1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000012 and 19000003
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 10000009 and 24000003
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000003 and 14000020
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 9000003 and 13000000
Now it is deterministic, thanks to the arbiter.
[boisver1@cp1154-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symm|sort |uniq -c| sort -n > run2
[boisver1@cp1154-mp2 Ray-Technology-Research]$ sha1sum run1 run2
(Those are sha1, not commits (stupid auto-formatting LOL))
f8073baab21dede4f1a64e7d320092dc91731a42 run1
f8073baab21dede4f1a64e7d320092dc91731a42 run2
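The same determinism check can be scripted. This is a sketch using Python's standard `hashlib`: it hashes the sorted and counted relation lines of each run and compares the digests. The helper names are hypothetical.

```python
import hashlib


def sha1_of_lines(lines):
    """Hash a sequence of text lines, one newline-terminated line at a time."""
    digest = hashlib.sha1()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def runs_match(run1_lines, run2_lines):
    """Two runs are considered identical when their digests are equal."""
    return sha1_of_lines(run1_lines) == sha1_of_lines(run2_lines)
```

Equal digests only show that the two summaries are byte-identical; they say nothing about whether the relations themselves are correct.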
see commit edf2e2ba71adcea1291174463176d25d8fadfb6c
left to do: inspect symmetric relations in Ray Cloud Browser.
After that, it is just a matter of sending all the metadata to the arbiter so that it can tell everyone how to merge seeds.
Symmetry: OK
[boisver1@cp2557-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symmetri|sort |uniq -c
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 10000009 and 24000003
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000003 and 14000020
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000015 and 6000003
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000021 and 10000012
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 12000007 and 26000010
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000005 and 19000006
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 2000005 and 13000022
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000006 and 4000002
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000012 and 19000003
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 9000003 and 13000000
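The `grep | sort | uniq -c` check can also be done programmatically. The sketch below assumes that a relation counts as consistent when it is logged exactly twice, which is how the counts of 2 in the listing above are read here; that interpretation and the function names are assumptions.

```python
import re
from collections import Counter

# Matches the tail of the MODE_CHECK_RESULTS debug line shown in the logs
RELATION = re.compile(r"symmetric relation between (\d+) and (\d+)")


def count_relations(log_lines):
    """Count how many times each unordered seed pair is reported."""
    counts = Counter()
    for line in log_lines:
        match = RELATION.search(line)
        if match:
            first, second = sorted(int(x) for x in match.groups())
            counts[(first, second)] += 1
    return counts


def asymmetric_pairs(counts):
    """Return the pairs that were not reported exactly twice."""
    return sorted(pair for pair, n in counts.items() if n != 2)
```

An empty result from `asymmetric_pairs` corresponds to the "Symmetry: OK" state.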
With self detection:
[boisver1@cp2557-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symmetri|sort |uniq -c
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 10000009 and 24000003
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000003 and 14000020
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 12000007 and 26000010
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 2000005 and 13000022
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019
After filtering out false positives:
[boisver1@cp2557-mp2 Ray-Technology-Research]$ grep symmetric TestX.1.*
TestX.1.04:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
TestX.1.11:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
I need to visualize these:
Rank 0
[DEBUG] gossip:414 [74000017:1] 1 [19000004:0]
[DEBUG] gossip:801 [67000009:0] 1 [74000017:0]
[DEBUG] 74000017 is in 324 586
[DEBUG] @324 2 ... [74000017:1] 1 [19000004:0]
[DEBUG] @586 2 ... [67000009:0] 1 [74000017:0]
remote probing
[boiseb01@ls30 Ray-Technology-Research]$ grep symmetric TestX.1.*|grep 90000020
TestX.1.06:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 23000006 and 90000020
TestX.1.10:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 49000010 and 90000020
TestX.1.20:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 23000006 and 90000020
TestX.1.20:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 49000010 and 90000020
synchronization
[DEBUG] Rank rank:0 received gossip gossip:49000010-90000020 from rank rank:10
[DEBUG] Rank rank:0 received gossip gossip:23000006-90000020 from rank rank:6
[DEBUG] Rank rank:0 sent gossip gossip:23000006-90000020 from rank rank:1 ...
[DEBUG] Rank rank:0 sent gossip gossip:49000010-90000020 from rank rank:23 ...
gossips
[DEBUG] gossip:438 [90000020:0] 1 [23000006:0]
[DEBUG] gossip:682 [49000010:0] 1 [90000020:1]
solutions
[DEBUG] @171 3 ... [8000005:0] 1 [49000010:0] 1 [90000020:1]
[DEBUG] @336 3 ... [90000020:0] 1 [23000006:0] 1 [49000017:0]
analysis
[DEBUG] 90000020 is in 171 336
===> it is probably the way in which object records are prepared that is wrong
It works!
[boiseb01@ls30 Ray-Technology-Research]$ grep "solution has" TestX.1.*
TestX.1.00:[DEBUG] solution has 489 entries !
TestX.1.01:[DEBUG] solution has 489 entries !
TestX.1.02:[DEBUG] solution has 489 entries !
TestX.1.03:[DEBUG] solution has 489 entries !
TestX.1.04:[DEBUG] solution has 489 entries !
TestX.1.05:[DEBUG] solution has 489 entries !
TestX.1.06:[DEBUG] solution has 489 entries !
TestX.1.07:[DEBUG] solution has 489 entries !
TestX.1.08:[DEBUG] solution has 489 entries !
TestX.1.09:[DEBUG] solution has 489 entries !
TestX.1.10:[DEBUG] solution has 489 entries !
TestX.1.11:[DEBUG] solution has 489 entries !
TestX.1.12:[DEBUG] solution has 489 entries !
TestX.1.13:[DEBUG] solution has 489 entries !
TestX.1.14:[DEBUG] solution has 489 entries !
TestX.1.15:[DEBUG] solution has 489 entries !
TestX.1.16:[DEBUG] solution has 489 entries !
TestX.1.17:[DEBUG] solution has 489 entries !
TestX.1.18:[DEBUG] solution has 489 entries !
TestX.1.19:[DEBUG] solution has 489 entries !
TestX.1.20:[DEBUG] solution has 489 entries !
TestX.1.21:[DEBUG] solution has 489 entries !
TestX.1.22:[DEBUG] solution has 489 entries !
TestX.1.23:[DEBUG] solution has 489 entries !
see http://genome.ulaval.ca:10090/client/?map=1&section=0&region=3158&location=0&depth=2