wguo-research / scCancer

A package for automated processing of single cell RNA-seq data in cancer

How are the gene signatures for TME cell types curated? #5

Closed: Puriney closed this issue 4 years ago

Puriney commented 4 years ago

I was wondering how you decided on the gene signatures for the TME cell types. For example, which paper(s) and which dataset(s) did you use?

I did not find a clear answer in the manuscript, which says:

We curated a high-quality dataset by combining multiple cancer scRNA-seq data and trained one-class logistic regression (OCLR) machine learning models for different cell types (Sokolov et al., 2016).

The cited paper describes the regression model but does not identify the multiple datasets that were used.

Related code: https://github.com/wguo-research/scCancer/blob/c6078eacc2bf1ad958886c39b15e50fee92df7e6/R/scAnnotation.R#L598
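For readers unfamiliar with OCLR, here is our hedged reading of the one-class logistic regression objective from Sokolov et al. (2016), as implemented in their gelnet package. It assumes default (uniform) sample and feature weights. The model is fit on cells of the target type only and has no bias term. The elastic-net penalty keeps the weights finite, because without it the loss could be pushed toward zero by scaling w up whenever some w scores every training cell positively.

```latex
\min_{\mathbf{w}} \; -\frac{1}{n}\sum_{i=1}^{n} \log \sigma\!\left(\mathbf{w}^{\top}\mathbf{x}_i\right)
  + \lambda_1 \lVert \mathbf{w} \rVert_1 + \frac{\lambda_2}{2} \lVert \mathbf{w} \rVert_2^2,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Here x_i is the expression profile of the i-th training cell and w is the learned template (one weight per gene). Our assumption is that the call quoted at the end of this thread, gelnet(train.data, NULL, 0, 1), corresponds to no labels (one-class mode), λ1 = 0 and λ2 = 1.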

wguo-research commented 4 years ago

The default cell type templates were trained on our own data. You can also train templates for the cell types you are interested in with OCLR models and pass them in through the argument "ct.templates".

Puriney commented 4 years ago

Did you use scRNA-seq of purified cells to train the model? Did you use any public data? I could not find which data the cell type templates are based on; that is why I opened this issue.

wguo-research commented 4 years ago

The OCLR model is supervised, so if you want to train a new template, you need to prepare additional training data. The expression data we used to train the default cell type templates haven't been published. Only the trained cell type templates are included in the package.

Puriney commented 4 years ago

Thank you.

Puriney commented 4 years ago

May I suggest that your team document how the models were trained, e.g., which scRNA-seq or bulk RNA-seq data for T cells were used to train the OCLR model? scCancer's cell annotation depends on this 'prior knowledge'. The pipeline itself can be a 'black box', but the prior biological knowledge had better not be.

wguo-research commented 4 years ago

You can use your own data to train the templates. For the training script, you can refer to this call: gelnet(train.data, NULL, 0, 1)
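To make that call less of a black box, here is a minimal from-scratch sketch in Python of what one-class training does. It is not the gelnet package and not scCancer's training code. It assumes the objective is the average of -log(sigmoid(w . x)) over target cells plus (l2 / 2) * ||w||^2, with no L1 term and no bias. This matches our reading of the arguments 0 and 1, but that reading is an assumption. The names train_oclr, score and column_means, and the toy matrices, are hypothetical.

The sketch also assumes a common recipe: centre each gene on target plus background cells, then train on the target cells only. Centring on the target cells alone would make the mean centred target profile zero, so w = 0 would already be the optimum.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def column_means(rows):
    # Per-gene mean over cells (rows are cells, columns are genes).
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(rows, means):
    # Subtract the per-gene means from every cell.
    return [[v - m for v, m in zip(row, means)] for row in rows]


def train_oclr(X, l2=1.0, lr=0.1, iters=2000):
    """Hypothetical one-class logistic regression: no labels, no bias.

    Minimizes (1/n) * sum_i -log(sigmoid(w . x_i)) + (l2 / 2) * ||w||^2
    by plain gradient descent (assumed objective; L1 is fixed at 0 here).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        grad = [l2 * wj for wj in w]  # gradient of the L2 penalty
        for x in X:
            # d/dw of -log(sigmoid(w . x)) is -(1 - sigmoid(w . x)) * x
            coef = (1.0 - sigmoid(dot(w, x))) / n
            for j in range(p):
                grad[j] -= coef * x[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def score(w, x):
    # Higher scores mean a cell looks more like the target (training) cells.
    return dot(w, x)


if __name__ == "__main__":
    # Made-up toy data: rows are cells, columns are genes G1, G2, G3.
    target = [[5.0, 1.0, 2.0], [6.0, 0.5, 2.2], [5.5, 1.2, 1.8]]
    background = [[1.0, 5.0, 2.1], [0.8, 6.0, 1.9], [1.2, 5.5, 2.0]]
    means = column_means(target + background)  # centre on all cells
    w = train_oclr(center(target, means))      # train on target cells only
    print("template weights:", [round(v, 3) for v in w])
    for name, cells in (("target", target), ("background", background)):
        print(name, [round(score(w, x), 2) for x in center(cells, means)])
```

In this toy example the learned template puts positive weight on G1, which is high in the target cells, and negative weight on G2, which is high in the background cells. As a result, target cells score above background cells. The weight vector w plays the role of a "cell type template". How scCancer turns such templates into annotations is not described in this thread.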