Open marcelm opened 1 month ago
I’ve now started to run the same optimization as above on the new "Sim5" dataset (the above was done on the "Sim3" dataset, which has less variation).
Here are the results for the read lengths that have finished.
Read length | Before | Optimized | maponly SE | maponly PE | extalign SE | extalign PE |
---|---|---|---|---|---|---|
50 | (16, 12, -2, 0) | (16, 12, -2, -1) | +0.2479 | +0.2580 | +0.1281 | +0.1493 |
75 | (20, 16, -3, -1) | (19, 15, -1, -1) | +0.2116 | +0.0701 | +0.0817 | +0.0682 |
100 | (16, 12, 1, 3) | (17, 13, 1, 4) | +0.0891 | +0.0328 | -0.0007 | +0.0070 |
150 | (20, 16, 2, 5) | (20, 16, 2, 6) | +0.0781 | +0.0005 | +0.0301 | +0.0040 |
200 | (24, 20, 4, 12) | (23, 19, 3, 13) | +0.0085 | +0.004 | +0.0351 | +0.0049 |
300 | (24, 20, 5, 13) | (25, 21, 3, 14) | +0.1113 | +0.0232 | +0.0224 | +0.0204 |
500 | (25, 19, 7, 13) | (27, 21, 5, 14) | +0.1121 | +0.0441 | +0.0500 | +0.0065 |
It’s good to have this data now, but the picture becomes less clear. Except for read length 50, the best settings for Sim5 are quite different from the ones for Sim3.
Edit: Table completed for read lengths 200-500.
Yes, we haven't defined a precise objective yet - only that we want 'as good as possible in both scenarios'.
Before that: I am surprised by the relatively small average accuracy gain for `maponly SE` and `maponly PE` in both the tables above. I have attached accuracy plots for the genomes, SE (hg38) and PE (hg38) data from my recent benchmark, done after fixing the mcs implementation.
My benchmark was done with these parameters for mcs, and `strobealign_v012_opt` in the plots uses the same parameters you use as baseline for our current seeds (https://github.com/ksahlin/strobealign/commit/4c10938ac1e88f90f1585eb10c75d33109a2bd64). Do the parameters I used for mcs show up in your optimisation script? And if so, what are the average gains?
`maponly PE` for 50nt read length, and maize and rye around 0.2-0.5pp. Based on this I would estimate an average of about 0.7pp in my experiments.
`maponly PE` for 50nt read length on SIM4, which is close to SIM5. Sure, it is a different genome from CHM13, but still noteworthy.
Oh, is the `Before` column referring to strobealign with mcs but with parameters from (https://github.com/ksahlin/strobealign/commit/4c10938ac1e88f90f1585eb10c75d33109a2bd64)? Then I misunderstood.
Regardless, it would still be interesting to know whether these parameters are ever visited.
> Oh, is the `Before` column referring to strobealign with mcs but with parameters from (https://github.com/ksahlin/strobealign/commit/4c10938ac1e88f90f1585eb10c75d33109a2bd64)? Then I misunderstood.
Yes, I essentially re-did the optimization for mcs using SIM5 as if we had never done an optimization using SIM3. But I guess both ways are valid? Anyway, I’ll do it the way you thought so you can compare.
> Do the parameters I used for mcs show up in your optimisation script? And if so, what are the average gains?
I’ll report the numbers relative to https://github.com/ksahlin/strobealign/commit/d9d5aafc05828150bbb327c5aaaba5df3c136ea8 as soon as I have them.
> Anyway, I’ll do it the way you thought so you can compare.
For some reason I thought the `Before` column was `strobealign-v0.12.0-opt`. I don't know why I misunderstood that, though, since comparing the relative improvement to mcs (not to v0.12.0), which you are doing, is more relevant in this issue. I will get the comparison to v0.12.0 in my benchmarks anyway.
> I’ll report the numbers relative to https://github.com/ksahlin/strobealign/commit/d9d5aafc05828150bbb327c5aaaba5df3c136ea8 as soon as I have them.
Okay, nice. I guess this is mostly for my own curiosity, as it will indicate how much better or worse the black lines in the plots I attached above will get.
I have filled in the table above. Here is the same table but with numbers relative to d9d5aaf.
Read length | Before | Optimized | maponly SE | maponly PE | extalign SE | extalign PE |
---|---|---|---|---|---|---|
50 | (17, 13, -2, 0) | (16, 12, -2, -1) | +0.537 | +0.527 | +0.2882 | +0.2225 |
75 | (20, 16, -3, -1) | (19, 15, -1, -1) | +0.2116 | +0.0701 | +0.0817 | +0.0682 |
100 | (18, 14, 1, 3) | (17, 13, 1, 4) | +0.1218 | +0.0639 | +0.0868 | +0.0332 |
150 | (22, 18, 3, 5) | (20, 16, 2, 6) | +0.2318 | +0.1071 | +0.1695 | +0.0003 |
200 | (24, 20, 4, 12) | (23, 19, 3, 13) | +0.0085 | +0.0042 | +0.0351 | +0.0049 |
300 | (24, 20, 5, 13) | (25, 21, 3, 14) | +0.1113 | +0.0232 | +0.0224 | +0.0204 |
500 | (25, 19, 7, 13) | (27, 21, 5, 14) | +0.1121 | +0.0441 | +0.0500 | +0.0065 |
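Since the question of average gains came up, here is a minimal sketch that computes the unweighted mean of each gain column in the table above (numbers relative to d9d5aaf). The names `GAINS`, `COLUMNS` and `mean_gains` are made up for this illustration and are not part of the optimisation script; it also assumes that every read length should count equally, which may not be the right weighting.

```python
from statistics import mean

# Column order as in the table above.
COLUMNS = ("maponly SE", "maponly PE", "extalign SE", "extalign PE")

# Gains copied from the table relative to d9d5aaf, keyed by read length.
GAINS = {
    50: (0.537, 0.527, 0.2882, 0.2225),
    75: (0.2116, 0.0701, 0.0817, 0.0682),
    100: (0.1218, 0.0639, 0.0868, 0.0332),
    150: (0.2318, 0.1071, 0.1695, 0.0003),
    200: (0.0085, 0.0042, 0.0351, 0.0049),
    300: (0.1113, 0.0232, 0.0224, 0.0204),
    500: (0.1121, 0.0441, 0.0500, 0.0065),
}


def mean_gains(gains):
    """Return the unweighted mean gain per column across all read lengths."""
    per_column = zip(*gains.values())  # transpose rows into columns
    return {name: mean(values) for name, values in zip(COLUMNS, per_column)}


means = mean_gains(GAINS)
for name, value in means.items():
    print(f"{name}: {value:+.4f}")
```

A weighted mean (for example, emphasising common read lengths) would only need a different aggregation function in place of `mean`.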
Branch: mcs-optimized-parameters (commit 7fe07b498e274567050cfe888bb91d1752e961c9).
The same command as in #407 was used:
Suggested changes
Using parameters from commit 4c10938ac1e88f90f1585eb10c75d33109a2bd64 as baseline.
Old table using parameters from v0.12 as baseline
Read length | Before | Optimized | maponly SE | maponly PE | extalign SE | extalign PE |
---|---|---|---|---|---|---|
50 | (18, 14, -2, 1) | (16, 12, -2, -1) | +0.1477 | +0.3609 | +0.2171 | +0.1236 |
75 | (20, 16, -3, 2) | (22, 18, -2, -1) | +0.3006 | +0.2672 | +0.0232 | +0.1401 |
100 | (20, 16, -2, 2) | (20, 16, 0, 3) | +0.6108 | +0.3681 | +0.1630 | +0.1153 |
100 | | (23, 19, 0, 1) | +0.4422 | +0.3745 | +0.0729 | +0.1442 |
150 | (20, 16, 1, 7) | (23, 19, 4, 7) | +0.1915 | +0.2323 | +0.0257 | +0.0748 |
150 | | (22, 18, 3, 7) | +0.2030 | +0.1779 | +0.0536 | +0.0679 |
200 | (22, 18, 2, 12) | (24, 20, 4, 12) | +0.1054 | +0.1024 | +0.0550 | +0.0386 |
300 | (22, 18, 2, 12) | (24, 20, 6, 13) | +0.2251 | +0.1039 | +0.0769 | +0.0368 |
500 | (23, 17, 2, 12) | (25, 19, 7, 13) | +0.2818 | +0.1646 | +0.0596 | +0.0119 |
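The old table lists two candidate settings each for read lengths 100 and 150, which makes it a handy illustration of why the choice of objective matters. The sketch below compares two possible objectives on those rows: the mean gain over the four columns and the worst-case (minimum) gain. Both objectives and the names `CANDIDATES` and `best` are assumptions made for this example; no objective has been agreed on in this thread.

```python
from statistics import mean

# Candidate settings and their gains (maponly SE, maponly PE, extalign SE,
# extalign PE), copied from the read length 100 and 150 rows of the old table.
CANDIDATES = {
    100: {
        (20, 16, 0, 3): (0.6108, 0.3681, 0.1630, 0.1153),
        (23, 19, 0, 1): (0.4422, 0.3745, 0.0729, 0.1442),
    },
    150: {
        (23, 19, 4, 7): (0.1915, 0.2323, 0.0257, 0.0748),
        (22, 18, 3, 7): (0.2030, 0.1779, 0.0536, 0.0679),
    },
}


def best(candidates, objective):
    """Return the parameter tuple whose gains maximise the given objective."""
    return max(candidates, key=lambda params: objective(candidates[params]))


for read_length, candidates in CANDIDATES.items():
    by_mean = best(candidates, mean)  # hypothetical objective: average gain
    by_min = best(candidates, min)    # hypothetical objective: worst-case gain
    print(f"{read_length}: mean picks {by_mean}, min picks {by_min}")
```

For read length 100 both objectives agree, while for 150 they pick different settings, so the ranking of candidates depends on which objective is eventually chosen.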