xjtu-omics / msisensor-pro

Microsatellite Instability (MSI) detection using high-throughput sequencing data.

What kind of data is needed to build the baseline, and how to deal with panel data? #15

Closed eilenilec closed 3 years ago

eilenilec commented 3 years ago

Hello,

Thanks for your tool, which looks great!

I'm wondering what kind of data is needed to build a good baseline for tumor-only analyses. Your wiki says that 20 normal samples are needed for this baseline. Is it better to build the baseline with normal samples from patients with one particular tumor type, or does it not matter? For example, if we have 20 normal samples from patients with CRC and 20 normal samples from patients with prostate cancer, is it better to build two different baselines (one for MSI detection in CRC tumor samples and one for prostate tumor samples), or should we build a single baseline with all 40 normal samples from patients with both cancer types?

Another question: how should we deal with cancer panel data? Is the only difference at the last step, where we can provide a BED file indicating the targeted genes? Should we provide a file with microsatellite positions?

Thanks a lot!

PengJia6 commented 3 years ago

Hi,

Thanks for your question!

In MSIsensor-pro, to minimize the influence of genetic background and sequencing batch, we build the baseline from DNA sequencing data with the same background and the same sequencing technology as the samples in the subsequent analysis. More specifically, you need some aligned DNA sequencing files to do this.

As for baseline building, I think both approaches are fine. In our tests, however, we found that it is slightly better to establish a separate baseline for each tumor type.
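
For illustration only, here is a minimal Python sketch of how the normal samples could be split by tumor type before building one baseline per type. It only writes one sample list per tumor type and does not call MSIsensor-pro. The function names, the `configure_<tumor_type>.txt` file names, and the tab-separated "sample name, BAM path" layout are all assumptions made for this example, so please check the expected input format in the wiki before using the files.

```python
from collections import defaultdict
from pathlib import Path


def group_by_tumor_type(samples):
    """Group (sample_name, tumor_type, bam_path) records by tumor type."""
    groups = defaultdict(list)
    for name, tumor_type, bam_path in samples:
        groups[tumor_type].append((name, bam_path))
    return dict(groups)


def write_baseline_configs(samples, out_dir):
    """Write one sample list per tumor type and return {tumor_type: path}.

    Assumed layout (hypothetical, verify against the wiki):
    one line per normal sample, "sample_name<TAB>bam_path".
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for tumor_type, members in sorted(group_by_tumor_type(samples).items()):
        config_path = out_dir / f"configure_{tumor_type}.txt"
        lines = [f"{name}\t{bam_path}" for name, bam_path in members]
        config_path.write_text("\n".join(lines) + "\n")
        written[tumor_type] = config_path
    return written
```

With 20 CRC normals and 20 prostate normals, this would give two lists of 20 samples each. For a single pooled baseline, pass the same tumor type label for all 40 samples.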

As for panel data analysis, you don't need to do anything extra.

If you have any questions, please feel free to open an issue or contact me (pengjia@stu.xjtu.edu.cn)!

eilenilec commented 3 years ago

Sounds good, thanks for your very quick reply!