xushiabbvie / TDtool

Inclusion criteria for genes in the modeling #5

Closed: yichao-cai closed this issue 1 week ago

yichao-cai commented 1 week ago

Hi, thank you for bringing this very interesting tool/framework for bridging TCGA and DepMap data to the community!

Could you help me better understand the rationale for including or excluding certain genes in your modeling?

According to the Methods section of your paper:

Because many genes do not impact cell viability (CERES < −0.5), elastic-net models were attempted only for genes with at least five dependent and nondependent cell lines, which included 7,260 out of 18,119 genes (40%) with effects scores in the DEPMAP (1Q21 release)
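The gene filter in the quoted passage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TDtool's actual code: it assumes each gene's effect scores are available as a list of per-cell-line values, treats a cell line as dependent when its score is below the cutoff (-0.5, per the quote), and reads "at least five dependent and nondependent cell lines" as at least five of each. The function names, category labels, and gene names are hypothetical.

```python
from typing import Dict, List

# Thresholds taken from the quoted Methods text (assumed: >= 5 of EACH kind).
DEPENDENCY_CUTOFF = -0.5  # a cell line is "dependent" if its score is below this
MIN_LINES = 5             # minimum number of dependent AND non-dependent lines


def classify_gene(scores: List[float],
                  cutoff: float = DEPENDENCY_CUTOFF,
                  min_lines: int = MIN_LINES) -> str:
    """Label a gene by how its cell lines split around the dependency cutoff."""
    n_dependent = sum(1 for s in scores if s < cutoff)
    n_nondependent = len(scores) - n_dependent
    enough_dep = n_dependent >= min_lines
    enough_nondep = n_nondependent >= min_lines
    if enough_dep and enough_nondep:
        return "modeled"              # selective dependency: enough lines on both sides
    if enough_dep:
        return "mostly_dependent"     # e.g. common essential genes
    if enough_nondep:
        return "mostly_nondependent"  # e.g. genes with little effect on viability
    return "too_few_lines"


def genes_to_model(effects: Dict[str, List[float]], **kwargs) -> List[str]:
    """Return, sorted, the genes with enough dependent and non-dependent lines."""
    return sorted(g for g, s in effects.items()
                  if classify_gene(s, **kwargs) == "modeled")


# Hypothetical toy scores (placeholder gene names, not DepMap data).
toy_effects = {
    "GENE_A": [-1.2] * 10,             # dependent in every line
    "GENE_B": [0.1] * 10,              # dependent in no line
    "GENE_C": [-0.9] * 6 + [0.0] * 6,  # 6 dependent, 6 non-dependent
}
print(genes_to_model(toy_effects))               # ['GENE_C']
print(genes_to_model(toy_effects, cutoff=-1.0))  # []
```

With a stricter cutoff of -1.0, GENE_C drops out because its -0.9 scores no longer count as dependent, which shows how sensitive the set of modeled genes is to the threshold choice.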

My questions would be:

  1. Is there a particular reason CERES scores were used instead of DepMap's Chronos scores?
  2. Why was -0.5 used as the cutoff to determine whether a gene impacts cell viability? According to DepMap, 0 and -1 are the recommended cutoffs for non-essential and essential, respectively.
  3. Is there a specific reason you only include genes with at least five dependent and at least five non-dependent cell lines? In the final results, elastic-net models cover only 40% of the coding genes, so many important cancer genes are excluded from this and all downstream analyses.
    • Is it because you are looking for genes with larger effect sizes (i.e., genes that have both dependent and non-dependent cell lines)?
    • If you applied elastic-net models to genes with only dependent or only non-dependent cell lines, how would the results change? Or are there concerns/assumptions that I am missing here?

Your insight and input are highly valued and appreciated! Thanks a lot!

xushiabbvie commented 1 week ago

Hi, thank you for trying our method! For your questions:

  1. We started generating TCGADEPMAP when the 21Q1 data had just been released. At that time, CERES was the only score available from DepMap. For future releases, we can use Chronos instead of CERES for prediction.
  2. This is related to the first question: -0.5 was the cutoff suggested by DepMap when we built TCGADEPMAP.
  3. The goal of building TCGADEPMAP is to prioritize potential therapeutic targets. Common essential and non-essential genes are not suitable targets, since they are either toxic or have no effect at all, so we filtered those genes out before training the models. Also, as you mentioned, genes with larger effect sizes are better suited for building models. We haven't tested the performance of models built with only dependent or only non-dependent cell lines, because one of our goals is to identify dependent samples among all samples. For example, patients with high ERBB2 expression are better candidates for HER2-targeted treatment. If you build a model with only ERBB2-dependent cell lines, ERBB2 expression will be uniformly high across them, so it can barely explain differences in dependency and is likely to be left out of the model. Such a model would be poorly suited to explaining the mechanism of action (MOA) of the treatment and to identifying potential targets for patients.
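The ERBB2 argument in point 3 can be illustrated with a toy elastic-net fit. The sketch below is hypothetical and self-contained, not the TCGADEPMAP pipeline: it implements elastic net by cyclic coordinate descent in plain Python (using the same objective and `alpha`/`l1_ratio` parameterization as scikit-learn's `ElasticNet`) and fits a single made-up "ERBB2 expression" feature against made-up dependency scores. The penalty values and all data are assumptions chosen for illustration. Across all cell lines, expression tracks dependency and gets a clearly negative coefficient; restricted to the dependent lines (score < -0.5), expression spans only a narrow high range, its covariance with the remaining variation in dependency is small, and the L1 penalty shrinks its coefficient to exactly zero.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                alpha: float = 0.1, l1_ratio: float = 0.5,
                n_iter: int = 200) -> List[float]:
    """Fit elastic-net coefficients by cyclic coordinate descent.

    Minimizes (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 on centered data (intercept implied).
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            denom = sum(Xc[i][j] ** 2 for i in range(n)) / n + alpha * (1 - l1_ratio)
            w[j] = soft_threshold(rho, alpha * l1_ratio) / denom if denom > 0 else 0.0
    return w


# Hypothetical toy data (not DepMap/TCGA values): one feature, "ERBB2 expression".
# Non-dependent lines: low expression, dependency near 0.
# Dependent lines: high expression, dependency near -1.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 8.2, 8.4, 8.6, 8.8, 9.0]
dependency = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0, -0.9, -1.1, -0.9, -1.1, -1.0, -1.0]
dependent = [i for i, s in enumerate(dependency) if s < -0.5]

X_all = [[e] for e in expression]
X_dep = [[expression[i]] for i in dependent]
y_dep = [dependency[i] for i in dependent]

print(elastic_net(X_all, dependency))  # roughly [-0.156]: expression is kept
print(elastic_net(X_dep, y_dep))       # [0.0]: expression is dropped
```

The exact zero comes from the soft-thresholding step: a feature whose covariance with the (partial) residual stays within `alpha * l1_ratio` is removed from the model, which is what happens to a feature that barely varies across the training samples.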

Hope this helps. Thanks.