sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

implement a seed merger strategy. #188

Closed sebhtml closed 11 years ago

sebhtml commented 11 years ago

see http://genome.ulaval.ca:10090/client/?map=1&section=0&region=3158&location=0&depth=2

sebhtml commented 11 years ago

With some sequencing technologies, seeds are separated by vertices that have a large family (many parents, many children).

Any two seeds A and B where B is the only child seed of A and A is the only parent seed of B have to be merged.

This will be fast as it does not require any read information. It is only topological.
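The topological rule above can be sketched in a few lines of C++17. This is only an illustration, not the Ray implementation: the `SeedLinks` map (each seed mapped to the set of seeds found downstream of it) and the function name `findMergeablePairs` are hypothetical, and the sketch assumes the seed-to-seed links have already been collected.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

using SeedId = std::uint64_t;
// Hypothetical input: for each seed, the set of its child seeds.
using SeedLinks = std::map<SeedId, std::set<SeedId>>;

// Returns every pair (A, B) where B is the only child seed of A
// and A is the only parent seed of B. No read information is used.
std::vector<std::pair<SeedId, SeedId>> findMergeablePairs(const SeedLinks& children) {
    // Build the parent relation by inverting the child relation.
    SeedLinks parents;
    for (const auto& [seed, kids] : children) {
        for (SeedId kid : kids) {
            parents[kid].insert(seed);
        }
    }

    std::vector<std::pair<SeedId, SeedId>> pairs;
    for (const auto& [a, kids] : children) {
        if (kids.size() != 1) {
            continue;  // A must have exactly one child seed
        }
        const SeedId b = *kids.begin();
        if (b == a) {
            continue;  // a seed linked to itself is not a merge candidate
        }
        if (parents[b].size() == 1) {
            pairs.emplace_back(a, b);  // A is also the only parent seed of B
        }
    }
    return pairs;
}
```

The cost is linear in the number of seed links (up to the logarithmic factor of the ordered containers), which is consistent with the expectation that the step is fast.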

Do we also see this when the k-mer length is only 31?

For -k 71

/mnt/lustre03/corbeil/corbeil_group/projects/ray-assembler/tickets/171/ray-71-ticket-171-2013-06-10-9/

For -k 31

/mnt/lustre03/corbeil/corbeil_group/projects/ray-assembler/tickets/171/ray-31-ticket-171-2013-06-10-9/

sebhtml commented 11 years ago

see /home/boiseb01/T-132

sebhtml commented 11 years ago

It also occurs for -k 31

http://genome.ulaval.ca:10241/client/?map=0&section=1&region=17&location=31844&depth=8&zoom=1.0372300553911538

sebhtml commented 11 years ago

For this one, I could take the shortest path:

http://genome.ulaval.ca:10241/client/?map=1&section=0&region=3&location=0&depth=10

Another example:

http://genome.ulaval.ca:10241/client/

sebhtml commented 11 years ago

/mnt/lustre03/corbeil/corbeil_group/projects/ray-assembler/tickets/188

sebhtml commented 11 years ago

For MiSeq sample:

http://genome.ulaval.ca:10241/client/?map=2&section=0&region=0&location=0&depth=10

RaySeed-0 and RaySeed-3000013 are both forward and 0 is left of 3000013.

My new C++ code should detect that. There are 2 paths between these two. The code should select the one with the most similar coverage.
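One way to read "most similar coverage" is sketched below. The comment does not say how similarity is measured, so everything here is an assumption: each candidate path is represented only by the coverage depths of its vertices (`PathCoverage`), similarity is the absolute difference between the mean coverage of the path and a reference coverage taken from the two seeds, and the name `selectMostSimilarPath` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical representation: the coverage depth of each vertex on a path.
using PathCoverage = std::vector<double>;

double meanCoverage(const PathCoverage& path) {
    if (path.empty()) {
        return 0.0;
    }
    double sum = 0.0;
    for (double depth : path) {
        sum += depth;
    }
    return sum / static_cast<double>(path.size());
}

// Returns the index of the candidate path whose mean coverage is closest
// to seedCoverage, or -1 when there is no usable candidate.
// Assumption: "similar" means smallest absolute difference of means.
int selectMostSimilarPath(const std::vector<PathCoverage>& candidates,
                          double seedCoverage) {
    int best = -1;
    double bestDistance = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (candidates[i].empty()) {
            continue;  // an empty path carries no coverage information
        }
        const double distance =
            std::fabs(meanCoverage(candidates[i]) - seedCoverage);
        if (distance < bestDistance) {
            bestDistance = distance;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With two candidate paths between a pair of seeds, the caller would pass both coverage vectors and keep the path at the returned index. Ties go to the first candidate.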

sebhtml commented 11 years ago

it seems that data is not being registered with the API...

sebhtml commented 11 years ago

Don't do anything with these for now:

http://genome.ulaval.ca:10241/client/?map=2&section=0&region=466&location=3297&depth=10&zoom=0.8615388061448417

sebhtml commented 11 years ago

Side quests:

sebhtml commented 11 years ago

Some first results:

  2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019

http://genome.ulaval.ca:10241/client/?map=3&section=0&region=200&location=0&depth=10

  2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 17000022 and 24000010

http://genome.ulaval.ca:10241/client/?map=3&section=0&region=409&location=168&depth=10

  2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018

http://genome.ulaval.ca:10241/client/?map=3&section=0&region=200&location=0&depth=10

  2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 13000006 and 15000003

http://genome.ulaval.ca:10241/client/?map=3&section=0&region=254&location=1734&depth=10

  1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000006 and 4000002
  1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
  1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 17000022 and 19000020
  1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 16000007 and 16000019

sebhtml commented 11 years ago

Debugging symmetry of computation:

    [boisver1@cp1154-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symm|sort |uniq -c| sort -n
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000015 and 6000003
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000021 and 10000012
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 12000007 and 26000010
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000005 and 19000006
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 2000005 and 13000022
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000006 and 4000002
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019
      1 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000012 and 19000003
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 10000009 and 24000003
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000003 and 14000020
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 9000003 and 13000000

sebhtml commented 11 years ago

Now it is deterministic, thanks to the arbiter.

    [boisver1@cp1154-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symm|sort |uniq -c| sort -n > run2
    [boisver1@cp1154-mp2 Ray-Technology-Research]$ sha1sum run1 run2

(Those are sha1, not commits (stupid auto-formatting LOL))

      f8073baab21dede4f1a64e7d320092dc91731a42  run1
      f8073baab21dede4f1a64e7d320092dc91731a42  run2

see commit edf2e2ba71adcea1291174463176d25d8fadfb6c

sebhtml commented 11 years ago

Left to do: inspect symmetric relations in Ray Cloud Browser.

After that, it is just a matter of sending all the metadata to the arbiter so that it tells everyone how to merge seeds.

sebhtml commented 11 years ago

Symmetry: OK

    [boisver1@cp2557-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symmetri|sort |uniq -c
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 10000009 and 24000003
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000003 and 14000020
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000015 and 6000003
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000021 and 10000012
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 12000007 and 26000010
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000005 and 19000006
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 2000005 and 13000022
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000006 and 4000002
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000012 and 19000003
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 9000003 and 13000000
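The `uniq -c` check used here can also be expressed in code. The sketch below assumes, based only on the counts shown in this thread, that a relation is healthy when it is reported exactly twice (once for each of the two seeds), and it flags everything else. The types and the function name `findAsymmetricRelations` are hypothetical and are not part of Ray.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using SeedId = std::uint64_t;
using Relation = std::pair<SeedId, SeedId>;

// Orders the two seed identifiers so that (a, b) and (b, a) share one key.
Relation normalize(SeedId a, SeedId b) {
    return {std::min(a, b), std::max(a, b)};
}

// Counts how many times each relation was reported and returns the ones
// that were not reported exactly twice.
// Assumption: two reports per relation is the expected, symmetric case.
std::vector<Relation> findAsymmetricRelations(const std::vector<Relation>& reports) {
    std::map<Relation, int> counts;
    for (const Relation& report : reports) {
        ++counts[normalize(report.first, report.second)];
    }

    std::vector<Relation> asymmetric;
    for (const auto& [relation, count] : counts) {
        if (count != 2) {
            asymmetric.push_back(relation);
        }
    }
    return asymmetric;
}
```

An empty result corresponds to the "Symmetry: OK" state, where every relation appears with a count of 2.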

sebhtml commented 11 years ago

With self detection:

    [boisver1@cp2557-mp2 Ray-Technology-Research]$ cat TestX.1.*|grep symmetri|sort |uniq -c
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 10000009 and 24000003
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 1000003 and 14000020
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 12000007 and 26000010
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 15000021 and 22000018
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 19000000 and 24000023
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 2000005 and 13000022
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
      2 [DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 5000010 and 23000019

sebhtml commented 11 years ago

After filtering out false positives:

    [boisver1@cp2557-mp2 Ray-Technology-Research]$ grep symmetric TestX.1.*
    TestX.1.04:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011
    TestX.1.11:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 3000004 and 16000011

sebhtml commented 11 years ago

I need to visualize these:

Rank 0

[DEBUG] gossip:414 [74000017:1] 1 [19000004:0]

[DEBUG] gossip:801 [67000009:0] 1 [74000017:0]

[DEBUG] 74000017 is in 324 586

[DEBUG] @324 2 ... [74000017:1] 1 [19000004:0]

[DEBUG] @586 2 ... [67000009:0] 1 [74000017:0]

sebhtml commented 11 years ago

Remote probing

    [boiseb01@ls30 Ray-Technology-Research]$ grep symmetric TestX.1.*|grep 90000020
    TestX.1.06:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 23000006 and 90000020
    TestX.1.10:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 49000010 and 90000020
    TestX.1.20:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 23000006 and 90000020
    TestX.1.20:[DEBUG] MODE_CHECK_RESULTS got a symmetric relation between 49000010 and 90000020

Synchronization

[DEBUG] Rank rank:0 received gossip gossip:49000010-90000020 from rank rank:10

[DEBUG] Rank rank:0 received gossip gossip:23000006-90000020 from rank rank:6

[DEBUG] Rank rank:0 sent gossip gossip:23000006-90000020 from rank rank:1 ...

[DEBUG] Rank rank:0 sent gossip gossip:49000010-90000020 from rank rank:23 ...

Gossips

[DEBUG] gossip:438 [90000020:0] 1 [23000006:0]

[DEBUG] gossip:682 [49000010:0] 1 [90000020:1]

Solutions

[DEBUG] @171 3 ... [8000005:0] 1 [49000010:0] 1 [90000020:1]

[DEBUG] @336 3 ... [90000020:0] 1 [23000006:0] 1 [49000017:0]

Analysis

[DEBUG] 90000020 is in 171 336

===> it is probably the way in which object records are prepared that is wrong

sebhtml commented 11 years ago

It works!

    [boiseb01@ls30 Ray-Technology-Research]$ grep "solution has" TestX.1.*
    TestX.1.00:[DEBUG] solution has 489 entries !
    TestX.1.01:[DEBUG] solution has 489 entries !
    TestX.1.02:[DEBUG] solution has 489 entries !
    TestX.1.03:[DEBUG] solution has 489 entries !
    TestX.1.04:[DEBUG] solution has 489 entries !
    TestX.1.05:[DEBUG] solution has 489 entries !
    TestX.1.06:[DEBUG] solution has 489 entries !
    TestX.1.07:[DEBUG] solution has 489 entries !
    TestX.1.08:[DEBUG] solution has 489 entries !
    TestX.1.09:[DEBUG] solution has 489 entries !
    TestX.1.10:[DEBUG] solution has 489 entries !
    TestX.1.11:[DEBUG] solution has 489 entries !
    TestX.1.12:[DEBUG] solution has 489 entries !
    TestX.1.13:[DEBUG] solution has 489 entries !
    TestX.1.14:[DEBUG] solution has 489 entries !
    TestX.1.15:[DEBUG] solution has 489 entries !
    TestX.1.16:[DEBUG] solution has 489 entries !
    TestX.1.17:[DEBUG] solution has 489 entries !
    TestX.1.18:[DEBUG] solution has 489 entries !
    TestX.1.19:[DEBUG] solution has 489 entries !
    TestX.1.20:[DEBUG] solution has 489 entries !
    TestX.1.21:[DEBUG] solution has 489 entries !
    TestX.1.22:[DEBUG] solution has 489 entries !
    TestX.1.23:[DEBUG] solution has 489 entries !