niu-lab / msisensor2

Microsatellite instability (MSI) detection for tumor only data.
GNU General Public License v3.0
94 stars 21 forks source link

get location for analysed sites #16

Open ZKai0801 opened 4 years ago

ZKai0801 commented 4 years ago

Hi @Beifang and @owehann ,

Thank you for developing such a wonderful tool!

I would like to get all locations of analysed sites and I found someone asked the same question here: https://github.com/ding-lab/msisensor/issues/49

I tried to retrieve all counted locations in your mentioned manner (by default i.e. sites where reads depth >= 20), however, it didn't generate correct output as it does not match the number shown on .prefix file.

So I assume MSIsensor and MSIsensor2 used different criteria for selecting readily analysed sites. Is there a way I could retrieve those sites?

Thank you very much!

owehann commented 4 years ago

Thanks for your attention on MSIsensor2 Actually, MSIsensor and MSIsensor2 used the same criteria for selecting microsatellites. You can easily get all analysed sites with their distributions in prefix_dis file. Samtools can also used to get the detail information of a microsatellite in bam files, and the premise is to pay attention that left and right flank must match with the reference.

ZKai0801 commented 4 years ago

Hi @owehann ,

Thank you so much for your reply!

That seems quite odds, because what I did is simply scanning through the prefix_dis file. And whenever I encounter a location contain a number that >= 20, then its cooresponding location will be directed to a output file.

Here is a Python script I used to retrieve sites:

import sys

if __name__ == "__main__":
    with open(sys.argv[1], "r") as fh:
        for line in fh:
            if line.startswith("chr"):
                coordinate_line = line
                continue
            depths = [int(i) for i in line.strip().lstrip("T:").split() if int(i) >= 20]
            if not depths:
                continue
            print(coordinate_line, end="")

To be specific, with the use of this script, sometimes I get less analysed locations than the number shown on prefix file. Is there something wrong with my script? Or would you mind if I sent you an example BAM file?

Thanks for your helping!