xjtu-omics / msisensor-pro

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

more documentation on what the outputs mean? #14

Open boyangzhao opened 3 years ago

boyangzhao commented 3 years ago

Hello,

Could you provide a bit more information on what the outputs of msisensor-pro msi mean? I see the prefix, _dis, _germline, and _somatic files. It would be helpful to have some explanation in the documentation; the File Formats page has only a few lines describing the outputs. I'd like to know whether my tumor samples can be classified as MSI-H or MSS. Is the MSI score the third column (%) in the prefix file? And what is the number of somatic sites: is it the number of detected unstable sites (somatic, and not found in germline)?

Also, do you need multiple normal samples as a baseline, or is that only for tumor-only analysis, so that a single matched tumor-normal pair is enough?

Bo

PengJia6 commented 3 years ago

Hi Bo @boyangzhao ,

I will add more information about the outputs to the Wiki. The MSI score is the third column (%) in the prefix file. Number_of_Somatic_Sites is the number of detected unstable sites.
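To make the summary file concrete, here is a minimal Python sketch that reads it and pulls out the MSI score. It assumes the prefix file is tab-separated, with one header line (Total_Number_of_Sites, Number_of_Somatic_Sites, %) and one value line; the numbers in the example are made up for illustration. The recomputed percentage assumes the score is simply somatic sites divided by total sites, which you should check against your own output. The thread gives no MSI-H cutoff, so `classify` is a hypothetical helper that takes the cutoff as a required argument.

```python
import csv
import io


def read_msi_summary(text):
    """Parse the summary (prefix) file content into a dict.

    Assumed layout: a tab-separated header line followed by one line of values,
    in the order total sites, somatic sites, percentage.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    values = rows[1]  # rows[0] is the header line
    total = int(values[0])
    somatic = int(values[1])
    reported = float(values[2])
    # Assumption: the reported % is somatic / total * 100.
    recomputed = 100.0 * somatic / total if total else 0.0
    return {
        "total_sites": total,
        "somatic_sites": somatic,
        "msi_score_percent": reported,
        "recomputed_percent": recomputed,
    }


def classify(msi_score_percent, cutoff_percent):
    """Hypothetical helper: label a sample given a cutoff you choose yourself."""
    return "MSI-H" if msi_score_percent >= cutoff_percent else "MSS"


# Illustrative values only, not real results.
example = "Total_Number_of_Sites\tNumber_of_Somatic_Sites\t%\n640\t75\t11.72\n"
summary = read_msi_summary(example)
print(summary["msi_score_percent"], summary["recomputed_percent"])
```

In practice you would pass `open(prefix_path).read()` instead of the inline string, and pick the cutoff from the tool's documentation or your own validation data.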

If you have a tumor sample and its matched normal, I suggest the msi command (the original msisensor method). If you have only a tumor sample, you need to build a baseline and run the pro command.
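The choice between the two commands can be sketched as a small Python helper that builds the argument list. The subcommand names msi and pro come from this thread; the flags (-d, -n, -t, -o) are written from memory of the README and should be treated as assumptions to verify with `msisensor-pro msi --help` and `msisensor-pro pro --help`. `build_command` and all file paths are hypothetical.

```python
def build_command(tumor_bam, output_prefix, sites_file, normal_bam=None):
    """Return an argv list for a paired (msi) or tumor-only (pro) run.

    sites_file is the microsatellite list for `msi`, or the baseline file
    built beforehand for `pro`. Flags are assumptions; check them with --help.
    """
    if normal_bam is not None:
        # Matched tumor-normal pair: use the msi command.
        return ["msisensor-pro", "msi", "-d", sites_file,
                "-n", normal_bam, "-t", tumor_bam, "-o", output_prefix]
    # Tumor only: use the pro command with a previously built baseline.
    return ["msisensor-pro", "pro", "-d", sites_file,
            "-t", tumor_bam, "-o", output_prefix]


# Placeholder paths for illustration.
paired = build_command("tumor.bam", "out/case1", "reference.list", normal_bam="normal.bam")
tumor_only = build_command("tumor.bam", "out/case1", "reference.list_baseline")
print(" ".join(paired))
print(" ".join(tumor_only))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed for your installed version.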

Peng

boyangzhao commented 3 years ago

Ah great! Thanks!!

anoronh4 commented 3 years ago

Looking forward to this as well. I am interested in the definition of each column in the _germline and _somatic files (is "difference" just the difference in the mean number of repeats?), as well as how cutoffs are applied when including sites in the analysis. For example, I see more sites in the _dis file with coverage >= 20 in both tumor and normal than the total number of sites reported in the summary file. The number in the summary file also doesn't match the number of sites with >= 20X coverage in either tumor or normal alone.
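For anyone who wants to reproduce the count described above, here is a rough Python sketch. It assumes each site in the _dis file is one descriptor line followed by an "N:" line and a "T:" line of read counts per repeat length, and it takes coverage to be the sum of those counts. Both points are assumptions about the format, not something confirmed in this thread, and the example records are invented. It does not reproduce whatever filtering the tool applies before reporting the total.

```python
def site_coverages(text):
    """Collect per-site normal and tumor coverage from _dis file content.

    Assumed layout per site: a descriptor line, then 'N:' and 'T:' lines
    holding whitespace-separated read counts per repeat length.
    """
    sites = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("N:") or line.startswith("T:"):
            if current is None:
                continue  # count line without a preceding site line
            counts = [int(x) for x in line[2:].split()]
            key = "normal" if line[0] == "N" else "tumor"
            current[key] = sum(counts)  # coverage taken as total reads
        else:
            current = {"site": line, "normal": None, "tumor": None}
            sites.append(current)
    return sites


def count_covered(sites, min_cov):
    """Count sites with at least min_cov reads in both normal and tumor."""
    return sum(
        1 for s in sites
        if s["normal"] is not None and s["tumor"] is not None
        and s["normal"] >= min_cov and s["tumor"] >= min_cov
    )


# Invented records for illustration.
example = (
    "chr1 100 ACCTC 18[T] GGCTC\nN: 0 1 5 20 3\nT: 1 4 10 12 2\n"
    "chr1 200 TTGCA 12[A] CCGTA\nN: 0 2 3\nT: 10 15\n"
)
print(count_covered(site_coverages(example), 20))
```

Comparing this count with Total_Number_of_Sites in the summary file shows how many sites are dropped by criteria other than a plain coverage threshold.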

Huanan2018 commented 2 years ago

Hello, I am also interested in the definition of each column in the _germline and _somatic files. Has this part of the documentation been added to the wiki? Another question: why is the total number of microsatellite loci in the report not equal to the number of microsatellites in the _germline file? Are sites filtered during this step?