tomazc / iCount

iCount, protein-RNA interaction analytics
http://icount.biolab.si

Could iCount xlsites support soft-clipping? #215

Open TomSmithCGAT opened 6 months ago

TomSmithCGAT commented 6 months ago

When using an aligner that is allowed to soft-clip reads, iCount xlsites can fail with an IndexError at poss[idx]: idx is derived from the read's query length (read.query_length), which includes soft-clipped bases, so it can exceed len(read.get_reference_positions()), which covers only the aligned bases. Would it be possible for iCount to support aligners that soft-clip?

https://github.com/tomazc/iCount/blob/4260bae82ed495445b1ef6461137566ceab6e238/iCount/mapping/xlsites.py#L471-L492

As an example, take this BAM read, whose CIGAR is 46S30M:

NS500105:608:HHF3NBGXN:4:22601:5900:10521:N:0:GTCCGCrbc:TTAAGAC 147 chr1 189874 0 46S30M = 189693 -211 CACGACATCCTCCTCCCAGTCGCCCCTGTAGCTCCCCTACCTCCAAGAGGGTGTGGGATGGTGGAGGGGTTTGAGA EEEAEA6EAAAAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEE/EEEAEEAEEEEEEEEEAEEEEEEEEAAA/A AS:i:60 XS:i:98 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:30 YS:i:96 YT:Z:CP NH:i:2

poss: [189873, 189874, 189875, 189876, 189877, 189878, 189879, 189880, 189881, 189882, 189883, 189884, 189885, 189886, 189887, 189888, 189889, 189890, 189891, 189892, 189893, 189894, 189895, 189896, 189897, 189898, 189899, 189900, 189901, 189902] (length=30) idx:37
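To make the mismatch concrete, here is a minimal pure-Python sketch that expands the CIGAR of the read above into reference positions. It is not iCount's code: reference_positions is a hypothetical helper, written to mirror what I understand pysam's get_reference_positions() to return with and without full_length=True. The 0-based start (189873) and the CIGAR (46S30M) come from the read above.

```python
import re

# One CIGAR operation: a length followed by an op code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_positions(pos0, cigar, full_length=False):
    """Hypothetical helper: list the 0-based reference position of each read base.

    With full_length=False only aligned bases (M, =, X) are listed.
    With full_length=True, soft-clipped and inserted bases are kept as None,
    so the list has one entry per base in the read.
    """
    positions = []
    ref = pos0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes both read and reference
            positions.extend(range(ref, ref + n))
            ref += n
        elif op in "IS":     # consumes read only
            if full_length:
                positions.extend([None] * n)
        elif op in "DN":     # consumes reference only
            ref += n
        # H and P consume neither read nor reference
    return positions


aligned = reference_positions(189873, "46S30M")
padded = reference_positions(189873, "46S30M", full_length=True)

print(len(aligned), len(padded))  # 30 76

idx = 37
try:
    aligned[idx]
except IndexError:
    print("idx", idx, "is out of range for the aligned-only list")

print(padded[idx])  # None: read base 37 lies in the soft-clipped part
```

Indexing the aligned-only list with a read-based offset fails as soon as the offset exceeds the number of aligned bases. One possible direction, stated as an assumption rather than a tested fix, would be to index a full-length list and handle None explicitly, or to derive the offset from the aligned portion of the read instead of the full query length.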