ksahlin / isONcorrect

Error correction of ONT transcript reads
GNU General Public License v3.0

Wrong assignment of Intervals to Reads #5

Closed aljpetri closed 3 years ago

aljpetri commented 3 years ago

When running isONcorrect on the RBMY dataset, I printed the data given in _intervals_tocorrect . For read_id=2 the data structure has this interval: (869, 894, 60, array('I', [2, 860, 894, 1, 860, 894, 3, 860, 894, 4, 860, 894, 5, 860, 894, 6, 859, 893, 7, 860, 894, 8, 860, 894, 9, 682, 716, 10, 749, 783, 11, 860, 894, 13, 860, 894, 14, 860, 894, 15, 860, 894, 16, 860, 894, 17, 682, 716, 18, 860, 894, 19, 682, 716, 20, 682, 716, 21, 860, 894, 22, 860, 894, 23, 860, 894, 24, 682, 716, 25, 860, 894, 26, 860, 894, 27, 860, 894, 28, 860, 894, 29, 860, 894, 30, 860, 894, 31, 860, 894, 32, 682, 716, 33, 859, 893, 34, 860, 894, 35, 860, 894, 36, 860, 894, 37, 860, 894, 38, 860, 894, 39, 860, 894, 40, 860, 894, 41, 749, 783, 42, 860, 894, 43, 846, 880, 44, 682, 716, 45, 860, 894, 46, 860, 894, 47, 860, 894, 48, 860, 894, 49, 682, 716, 50, 682, 716, 51, 846, 880, 52, 860, 894, 53, 860, 894, 54, 682, 716, 55, 682, 716, 56, 860, 894, 57, 860, 894, 58, 682, 716, 59, 682, 716, 60, 860, 894, 61, 860, 894])). From my understanding, the entry 1, 860, 894 in this output (bold in the original post) indicates that read 1 contains the interval from position 860 to 894. This, however, is not correct, as read_id 1 has intervals from 860 to 873 and from 873 to 894. It therefore seems that the interval (860, 894) is incorrectly assigned to read 1.
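To make the reading of the array easier to check, here is a minimal sketch that groups the flat values by read id. It assumes the layout is repeated (read_id, start, stop) triples, which is how the output is interpreted in this post; the helper name `decode_instances` is hypothetical and not part of isONcorrect.

```python
from array import array
from collections import defaultdict


def decode_instances(flat):
    """Group a flat sequence of (read_id, start, stop) triples by read id.

    Assumption: the array stores three unsigned ints per instance.
    """
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of three values")
    by_read = defaultdict(list)
    for i in range(0, len(flat), 3):
        read_id, start, stop = flat[i], flat[i + 1], flat[i + 2]
        by_read[read_id].append((start, stop))
    return dict(by_read)


# Shortened excerpt of the array printed above.
instances = array("I", [2, 860, 894, 1, 860, 894, 9, 682, 716, 43, 846, 880])
by_read = decode_instances(instances)
print(by_read[1])  # [(860, 894)]
```

With the full array, the third field of the tuple (60) equals the number of triples, one for each read id from 1 to 61 except 12.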

ksahlin commented 3 years ago

I am going to assume that you mean that read 1 and read 2 come from the same transcript.

In this case, read 2 may end up with different intervals in its weighted interval scheduling (WIS) solution than read 1. For example, read 2 may have had an error/mutation in the minimizer at position 873. However, the interval (860, 894) may still exist in the set of all candidate intervals of both reads (i.e., before the WIS solution for each read is found). Reads 1 and 2 simply obtained different WIS solutions, which happens, e.g., due to errors in the reads. So this is not an error in isONcorrect.
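As an illustration of how two reads can share a candidate interval yet end up with different solutions, here is a minimal sketch of the textbook weighted interval scheduling dynamic program. It is not isONcorrect's actual code: the function name, the weights, and the rule that intervals may share an endpoint are all assumptions made for the example.

```python
from bisect import bisect_right


def weighted_interval_scheduling(intervals):
    """Return (best_weight, chosen) over compatible (start, stop, weight) intervals.

    Assumption: two intervals are compatible if one stops at or before
    the position where the other starts.
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])
    stops = [iv[1] for iv in ivs]
    best = [0] * (len(ivs) + 1)  # best[i]: optimum over the first i intervals
    prev = []                    # prev[i]: count of earlier compatible intervals
    for i, (start, _stop, weight) in enumerate(ivs):
        p = bisect_right(stops, start, 0, i)
        prev.append(p)
        best[i + 1] = max(best[i], best[p] + weight)

    # Backtrack to recover which intervals were selected.
    chosen = []
    i = len(ivs)
    while i > 0:
        start, stop, weight = ivs[i - 1]
        if best[prev[i - 1]] + weight >= best[i - 1]:
            chosen.append((start, stop))
            i = prev[i - 1]
        else:
            i -= 1
    return best[-1], chosen[::-1]


# Hypothetical weights. Read 1 has a minimizer at 873, so it has three candidates.
read1_candidates = [(860, 873, 40), (873, 894, 40), (860, 894, 60)]
# Read 2 lost the minimizer at 873 (error/mutation), so only the long interval remains.
read2_candidates = [(860, 894, 60)]

print(weighted_interval_scheduling(read1_candidates))  # (80, [(860, 873), (873, 894)])
print(weighted_interval_scheduling(read2_candidates))  # (60, [(860, 894)])
```

In this toy setup the interval (860, 894) is a candidate in both reads, but only read 2 selects it; read 1 gets more total weight from the two shorter intervals.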