Closed bitcometz closed 3 years ago
Thank you for your interest in ItClust. Regarding your questions:
ItClust has its own method for determining how many highly variable genes (HVGs) to use. Briefly, the more cells in the data, the more HVGs are used. The Methods section of the paper describes this in detail.
I think this depends on your goal. ItClust learns how many cell types are present in the training dataset and then tries to identify those cell types in the target data. Of course, you can modify the cell type labels in the training data (for example, split T cells into CD8+ T, CD4+ T, and T helper cells) to make ItClust learn different information.
You can combine multiple datasets into one training set to cover a comprehensive range of cell types. In our paper, we tested ItClust in this scenario and the performance was pretty good. One thing to note: for cell types present in multiple datasets, I would prefer to take them from only one dataset to avoid batch effects. For example, if dataset1 has cell types A, B, and C, and dataset2 has cell types C and D, I would exclude cell type C from dataset2 before combining.
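As a rough illustration of how label granularity changes the number of cell types seen in training, here is a minimal sketch in plain Python. The labels and the `to_coarse` mapping are made up for illustration and are not part of ItClust.

```python
from collections import Counter

# Hypothetical fine-grained annotations for a handful of training cells.
fine_labels = ["CD8+ T", "CD4+ T", "T helper", "B", "CD4+ T", "NK"]

# Hypothetical mapping that collapses the T-cell subtypes into one coarse label.
to_coarse = {"CD8+ T": "T", "CD4+ T": "T", "T helper": "T"}

# Labels without an entry in the mapping are kept unchanged.
coarse_labels = [to_coarse.get(label, label) for label in fine_labels]

# The number of distinct labels is the number of cell types in the training set.
n_fine = len(set(fine_labels))
n_coarse = len(set(coarse_labels))
coarse_counts = Counter(coarse_labels)

print(n_fine, n_coarse, dict(coarse_counts))
```

Whichever label set is passed in as the training annotation determines which distinctions can be transferred to the target data.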
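The exclusion rule in the example can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `combine_without_overlap`, the tuple layout, and the cell IDs are hypothetical, and in practice the same filter would be applied to the expression matrices and their annotations before merging.

```python
def combine_without_overlap(datasets):
    """Merge datasets, keeping each cell type only from the first dataset that has it.

    `datasets` is a list of (dataset_name, cells) pairs, where `cells` is a list
    of (cell_id, cell_type) tuples. Earlier datasets take priority.
    """
    seen_types = set()
    combined = []
    for name, cells in datasets:
        types_here = {cell_type for _, cell_type in cells}
        # Only cell types not already contributed by an earlier dataset are kept.
        new_types = types_here - seen_types
        combined.extend(
            (name, cell_id, cell_type)
            for cell_id, cell_type in cells
            if cell_type in new_types
        )
        seen_types |= new_types
    return combined


# Hypothetical toy data mirroring the example: dataset1 has A, B, C; dataset2 has C, D.
dataset1 = [("c1", "A"), ("c2", "B"), ("c3", "C")]
dataset2 = [("c4", "C"), ("c5", "D")]

combined = combine_without_overlap([("dataset1", dataset1), ("dataset2", dataset2)])
for row in combined:
    print(row)
```

The order of the input list decides which dataset supplies a shared cell type, so the dataset you trust most for that type should come first.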
Hi all, if you don't mind, I'd like to join the discussion. Regarding the third point, I saw a related description in your paper, but I have a question: if you exclude a shared cell type from all but one dataset, won't that limit the size and richness of the training set and so weaken the model's performance? Would it be possible to remove the batch effect with a published method (such as Seurat) and then train on the larger combined data? Generally speaking, the larger the training set, the stronger the model's predictive ability.
These are just my thoughts; comments are welcome.
Thanks !!!
Hi pigraul,
I think data size is not a big issue here. Hundreds of cells for each cell type are enough for training.
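If a few hundred cells per type are enough, a large reference can be downsampled per cell type before training. The sketch below uses only the Python standard library; the function name `cap_cells_per_type` and the cap of 300 cells are assumptions chosen for illustration, not values taken from ItClust.

```python
import random
from collections import Counter, defaultdict


def cap_cells_per_type(cells, max_per_type=300, seed=0):
    """Randomly keep at most `max_per_type` cells for each cell type.

    `cells` is a list of (cell_id, cell_type) tuples. Types with fewer cells
    than the cap are kept in full.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_type = defaultdict(list)
    for cell_id, cell_type in cells:
        by_type[cell_type].append(cell_id)

    kept = []
    for cell_type, ids in by_type.items():
        if len(ids) > max_per_type:
            ids = rng.sample(ids, max_per_type)  # sampling without replacement
        kept.extend((cell_id, cell_type) for cell_id in ids)
    return kept


# Hypothetical toy input: 1000 cells of type A and 50 cells of type B.
cells = [(f"a{i}", "A") for i in range(1000)] + [(f"b{i}", "B") for i in range(50)]
kept = cap_cells_per_type(cells)
kept_counts = Counter(cell_type for _, cell_type in kept)
print(dict(kept_counts))
```

Rare cell types are left untouched, so the cap only trims the abundant ones.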
Regarding batch effect removal, we have 2 steps to do so:
You can definitely use other methods to integrate the data, but I do not recommend Seurat. Seurat uses CCA to remove batch effects between datasets, and in my experience it always over-corrects. While removing the batch effect, CCA also removes differences between cell types, which hurts downstream analysis such as clustering.
Hi @jianhuupenn, thanks for your reply!
So, from the third point of your reply above, ItClust is unlike other machine learning methods in that there is no need to collect a large amount of data to build a training set.
Best
Hello, ItClust is a powerful tool. I think it needs a large enough training database to work well. Do you have any suggestions for building the training data? For example,
Thanks!!!
Best