Open ekg opened 1 year ago
Are we ready to merge? I guess we should check larger scale tests.
In terms of performance, it is good. I tried it on multiple chrXY graphs of primates. The slowest run took ~6 hours.
octopus11
[odgi::untangle] warning: no step index specified. Building one with a sample rate of 8. This may take additional time. A step index can be provided via -a, --step-index. A step index can be built using odgi stepindex.
[odgi::algorithms::stepindex] Collecting Steps Progress: 100.00% @ 3.97e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00
[odgi::algorithms::stepindex] Building Progress: 100.00% @ 2.89e+00/s elapsed: 00:00:00:04 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries with 1 targets
[odgi::algorithms::untangle] establishing initial cuts for 14 paths
[odgi::algorithms::untangle] set target nodes 100.00% @ 5.29e-01/s elapsed: 00:00:00:01 remain: 00:00:00:00
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 1.42e-03/s elapsed: 00:00:11:44 remain: 00:00:00:00
[odgi::algorithms::untangle] mark nodes every 50000 bp 100.00% @ 3.17e+07/s elapsed: 00:00:00:01 remain: 00:00:00:00
[odgi::algorithms::untangle] add new cut points 100.00% @ 4.08e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00
[odgi::algorithms::untangle] building target segment index
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 6.20e-04/s elapsed: 00:00:26:53 remain: 00:00:00:00
[odgi::algorithms::untangle] prepare segment cuts 100.00% @ 4.38e-01/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] make the mapping 100.00% @ 3.73e+07/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries 100.00% @ 7.26e-04/s elapsed: 00:05:21:32 remain: 00:00:00:00
Command being timed: "/home/guarracino/tools/odgi/bin/odgi-547e4d76f340dc0da34bae1522308e9b97efd0c9 untangle -i /lizardfs/guarracino/tree_of_life_alignment/graphs/primates14.chrXY.p70.s5000/primates14.chrXY.fa.gz.174e029.417fcdf.a903a91.smooth.final.og -r mGorGor1#chrY -e 50000 -m 1000 -j 0 -n 28 -t 14 -P"
User time (seconds): 75629.28
System time (seconds): 140.30
Percent of CPU this job got: 349%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:01:10
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 25976056
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 42828892
Voluntary context switches: 41562533
Involuntary context switches: 82776
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I wonder if we'd go faster without the sampled step index.
On Thu, Feb 9, 2023 at 2:08 PM Andrea Guarracino @.***> wrote:
In terms of performance, it is good. I tried it on multiple chrXY graphs of primates. The slowest run took ~6 hours.
octopus11 [odgi::untangle] warning: no step index specified. Building one with a sample rate of 8. This may take additional time. A step index can be provided via -a, --step-index. A step index can be built using odgi stepindex. [odgi::algorithms::stepindex] Collecting Steps Progress: 100.00% @ 3.97e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00 [odgi::algorithms::stepindex] Building Progress: 100.00% @ 2.89e+00/s elapsed: 00:00:00:04 remain: 00:00:00:00 [odgi::algorithms::untangle] untangling 14 queries with 1 targets [odgi::algorithms::untangle] establishing initial cuts for 14 paths [odgi::algorithms::untangle] set target nodes 100.00% @ 5.29e-01/s elapsed: 00:00:00:01 remain: 00:00:00:00 [odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 1.42e-03/s elapsed: 00:00:11:44 remain: 00:00:00:00 [odgi::algorithms::untangle] mark nodes every 50000 bp 100.00% @ 3.17e+07/s elapsed: 00:00:00:01 remain: 00:00:00:00 [odgi::algorithms::untangle] add new cut points 100.00% @ 4.08e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00 [odgi::algorithms::untangle] building target segment index [odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 6.20e-04/s elapsed: 00:00:26:53 remain: 00:00:00:00 [odgi::algorithms::untangle] prepare segment cuts 100.00% @ 4.38e-01/s elapsed: 00:00:00:02 remain: 00:00:00:00 [odgi::algorithms::untangle] make the mapping 100.00% @ 3.73e+07/s elapsed: 00:00:00:00 remain: 00:00:00:00 [odgi::algorithms::untangle] untangling 14 queries 100.00% @ 7.26e-04/s elapsed: 00:05:21:32 remain: 00:00:00:00 Command being timed: "/home/guarracino/tools/odgi/bin/odgi-547e4d76f340dc0da34bae1522308e9b97efd0c9 untangle -i /lizardfs/guarracino/tree_of_life_alignment/graphs/primates14.chrXY.p70.s5000/primates14.chrXY.fa.gz.174e029.417fcdf.a903a91.smooth.final.og -r mGorGor1#chrY -e 50000 -m 1000 -j 0 -n 28 -t 14 -P" User time (seconds): 75629.28 System time (seconds): 140.30 Percent of CPU this job got: 349% Elapsed (wall clock) time (h:mm:ss or m:ss): 6:01:10 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 25976056 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 42828892 Voluntary context switches: 41562533 Involuntary context switches: 82776 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0
— Reply to this email directly, view it on GitHub https://github.com/pangenome/odgi/pull/474#issuecomment-1424746622, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDQEOC2XUCQ72DFCKGVGDWWVFE7ANCNFSM6AAAAAAUTG4ITM . You are receiving this because you authored the thread.Message ID: @.***>
There is still a problem with the reverse orientation. This is from the injection tutorial, but without the odgi flip
step:
master on the left, this branch on the right
The tracks in reverse are split into more arrows. Applying odgi flip
, the output becomes identical.
Actually, there are also a few problems in forward.
A bit better, but still more arrows than the master (on the left):
And also different best targets (different colors).
Almost 20 hours. Not 10 minutes as before the last fixing commit, but neither days of waiting as with the master.
octopus03
[odgi::untangle] warning: no step index specified. Building one with a sample rate of 8. This may take additional time. A step index can be provided via -a, --step-index. A step index can be built using odgi stepindex.
[odgi::algorithms::stepindex] Collecting Steps Progress: 100.00% @ 4.59e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00
[odgi::algorithms::stepindex] Building Progress: 100.00% @ 3.41e+00/s elapsed: 00:00:00:04 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries with 1 targets
[odgi::algorithms::untangle] establishing initial cuts for 14 paths
[odgi::algorithms::untangle] set target nodes 100.00% @ 4.79e-01/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 2.75e-05/s elapsed: 00:10:06:48 remain: 00:00:00:00
[odgi::algorithms::untangle] mark nodes every 50000 bp 100.00% @ 2.58e+07/s elapsed: 00:00:00:01 remain: 00:00:00:00
[odgi::algorithms::untangle] add new cut points 100.00% @ 6.22e+00/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] building target segment index
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 1.02e-03/s elapsed: 00:00:16:16 remain: 00:00:00:00
[odgi::algorithms::untangle] prepare segment cuts 100.00% @ 4.79e-01/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] make the mapping 100.00% @ 2.44e+07/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries 100.00% @ 4.42e-04/s elapsed: 00:08:48:01 remain: 00:00:00:00
Command being timed: "/home/guarracino/tools/odgi/bin/odgi-2c78159a1b4bf122493075e436ea9c53033f430f untangle -i /lizardfs/guarracino/tree_of_life_alignment/graphs/primates14.chrXY.p90.s20000/primates14.chrXY.fa.gz.e4728fd.417fcdf.8abe853.smooth.final.og -r chm13v2#chrX -e 50000 -m 1000 -j 0 -n 28 -t 14 -P"
User time (seconds): 349928.09
System time (seconds): 282.15
Percent of CPU this job got: 506%
Elapsed (wall clock) time (h:mm:ss or m:ss): 19:11:58
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 22090900
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 328758411
Voluntary context switches: 82374953
Involuntary context switches: 544237
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Yup this is still not done.
This is looking more correct. I'm not seeing the same problems. Looking to see if there are further wins to be made.
That's the premise of this PR. It seems that untangling isn't behaving as expected in deep loops, and this branch will have hacks required to figure out why. It may be due to a quadratic expansion of the number of possible cut points.
The current patch makes a change that does affect the untangle cut points. We need to validate that the output makes sense.