natir / yacrd

Yet Another Chimeric Read Detector
MIT License

Choice of minimap version when running yacrd for chimeric read removal #57

Open Rohit-Satyam opened 1 week ago

Rohit-Satyam commented 1 week ago

Dear Developers

I see that you recommend using an older version of minimap2 with Yacrd because of some changes in minimap2 that don't sit well with Yacrd. I did a small exploration using the prescribed version 2.18 and the newest version 2.28-r1209 on dengue samples sequenced on a MinION, as follows:

minimap2 -x ava-ont -t 120 "$p" "$p" > overlap.paf
n=$(basename "$p" | cut -f 1 -d '.')
yacrd -t 120 -i overlap.paf -o "${n}_oldminimap.yacrd"

And I see the following:

Screenshot from 2024-11-05 16-11-49

Test_case.xlsx

Using the old minimap2 version (2.18), Yacrd detects marginally more chimeric reads than with the new one (see figure above). To check whether the chimeric reads detected by minimap2_old+Yacrd and minimap2_new+Yacrd are totally different or only marginally different, I checked the overlap of the read headers that Yacrd reports as chimeric.

When I compare the headers of the reads detected as chimeric in the two runs, most of the chimeric read headers from the new minimap2 also appear in the results from the old minimap2. This suggests that the chimeric reads detected by the newer minimap2+Yacrd are not that different, apart from a few reads. We chose the 3 samples with the highest number of chimeric reads to see which reads were not being tagged by Yacrd when used with the new minimap2 (see the attached file Reads_unique.txt).
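A minimal sketch of the header comparison described above, for anyone who wants to repeat it. It assumes the Yacrd report is tab-separated with the read type (for example Chimeric) in column 1 and the read ID in column 2. The function names `chimeric_ids` and `compare_chimeric` are hypothetical helpers, not part of Yacrd.

```shell
# Print the sorted, unique IDs of reads labelled "Chimeric" in a yacrd report.
# Assumption: tab-separated report, read type in column 1, read ID in column 2.
chimeric_ids() {
    awk -F '\t' '$1 == "Chimeric" { print $2 }' "$1" | LC_ALL=C sort -u
}

# Count chimeric read IDs shared by two reports and unique to each one.
# $1: report from the old minimap2 run, $2: report from the new minimap2 run.
compare_chimeric() {
    old_ids=$(mktemp)
    new_ids=$(mktemp)
    chimeric_ids "$1" > "$old_ids"
    chimeric_ids "$2" > "$new_ids"
    # comm needs both inputs sorted with the same collation as used above.
    printf 'shared\t%s\n'   "$(LC_ALL=C comm -12 "$old_ids" "$new_ids" | wc -l | tr -d ' ')"
    printf 'old_only\t%s\n' "$(LC_ALL=C comm -23 "$old_ids" "$new_ids" | wc -l | tr -d ' ')"
    printf 'new_only\t%s\n' "$(LC_ALL=C comm -13 "$old_ids" "$new_ids" | wc -l | tr -d ' ')"
    rm -f "$old_ids" "$new_ids"
}
```

It could be called as compare_chimeric sample_oldminimap.yacrd sample_newminimap.yacrd, where both file names are placeholders for the two reports of one sample.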

To see whether the reads detected as chimeric in the two cases can be distinguished on some grounds, such as MAPQ, we viewed their alignments to DENV2 in IGV. Reads from both cases have a portion (~500 bp) that maps to DENV2 while the rest is soft clipped, and the MAPQ ranges from 13 to 60. Soft clipping of a major portion of a read (~50% or more) tempts us to believe these reads are chimeric. But this still doesn't explain why some reads escape chimera detection with the new minimap2+Yacrd while other reads are tagged as chimeric only with the old minimap2+Yacrd.
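The clipping observation above could also be quantified outside IGV. The following is a rough sketch, assuming SAM-formatted alignments on standard input (for example, from a BAM file converted to SAM text). It reports, per alignment record, the read name, the MAPQ, and the fraction of the read that is clipped according to the CIGAR string. The function name `clipped_fraction` is hypothetical, and counting hard clips together with soft clips is a choice made here, not something taken from the analysis above.

```shell
# Read SAM records on stdin and print: read name, MAPQ, clipped fraction.
# Clipped fraction = (S + H bases) / (M + I + S + H + = + X bases) from the CIGAR.
clipped_fraction() {
    awk -F '\t' '
    /^@/      { next }   # skip SAM header lines
    $6 == "*" { next }   # skip records without a CIGAR (e.g. unmapped reads)
    {
        cigar = $6; total = 0; clipped = 0
        # Consume the CIGAR one "<length><operation>" token at a time.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            n  = substr(cigar, 1, RLENGTH - 1) + 0
            op = substr(cigar, RLENGTH, 1)
            if (op ~ /[MISHX=]/) total += n     # operations that consume read bases
            if (op ~ /[SH]/)     clipped += n   # soft- or hard-clipped bases
            cigar = substr(cigar, RLENGTH + 1)
        }
        if (total > 0) printf "%s\t%s\t%.2f\n", $1, $5, clipped / total
    }'
}
```

A read with CIGAR 500M500S would be reported with a clipped fraction of 0.50, which matches the roughly 50% threshold mentioned above. Each alignment record is reported separately, so a read with supplementary alignments appears on more than one line.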

Did you test Yacrd with the new version to see whether the seed-related issue has been resolved or still persists? Also, should I use the -c option in minimap2, as suggested by one user here?

natir commented 6 days ago

Hi, thanks again for your interest.

After rereading my Readme, I think I was too dramatic.

Given your results, it seems to me that you can use the latest version of minimap2 without any problem.


About the -c parameter for minimap2: it does improve the mapping quality, but it also drastically increases the computation time. To generate the CIGAR string, exact alignment (dynamic programming) must be performed; without it, there is only seed search and chaining.

I don't think the benefit is worth it. It should only slightly shift the boundaries of the poor-quality regions.