snap-stanford / SATURN

question in the dataset #73

Closed huawen-poppy closed 3 weeks ago

huawen-poppy commented 1 month ago

Hi, I have a question about the dataset you used.

I downloaded the human dataset from https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5 (specifically, all tissues). I also downloaded the provided Tabula Sapiens, Muris and Microcebus Coarse Whole Atlas Alignments and Individual Tissue alignments outputs (http://snap.stanford.edu/saturn/data/tabula_mammal_export.tar.gz).

First, I subset the human cells from the output embedding file (named human_only_saturn_output), then used the cell IDs to subset the same cells from the published human dataset (named origin_human). However, the tissue and cell type annotations do not match between the two files, even though the cells are the same. Specifically, origin_human has 10 tissues with 58 cell types, whereas human_only_saturn_output has 9 tissues with 62 cell types. Only 37 cell types overlap between the two. Could you please explain why this happens?
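For reference, the comparison described above can be sketched roughly as follows. This is a minimal illustration in plain Python, assuming each file has already been reduced to a mapping from cell ID to a (tissue, cell type) pair. The function name `compare_annotations` and the toy cell IDs and labels are hypothetical, not taken from either dataset.

```python
def compare_annotations(origin, saturn):
    """Compare (tissue, cell_type) labels for the cells shared by two tables.

    Both arguments are dicts mapping cell id -> (tissue, cell_type).
    """
    shared_ids = sorted(set(origin) & set(saturn))
    origin_types = {origin[c][1] for c in shared_ids}
    saturn_types = {saturn[c][1] for c in shared_ids}
    # Cells whose tissue or cell type label differs between the two tables.
    mismatched = [c for c in shared_ids if origin[c] != saturn[c]]
    return {
        "n_shared_cells": len(shared_ids),
        "types_only_in_origin": sorted(origin_types - saturn_types),
        "types_only_in_saturn": sorted(saturn_types - origin_types),
        "types_in_both": sorted(origin_types & saturn_types),
        "mismatched_cells": mismatched,
    }


# Toy data (hypothetical): one label differs only in capitalization.
origin_human = {
    "c1": ("lung", "t cell"),
    "c2": ("lung", "b cell"),
    "c3": ("liver", "hepatocyte"),
}
human_only_saturn_output = {
    "c1": ("lung", "T cell"),
    "c2": ("lung", "b cell"),
    "c3": ("liver", "hepatocyte"),
}

report = compare_annotations(origin_human, human_only_saturn_output)
print(report)
```

Listing the labels that appear on only one side makes it easier to see whether the mismatch comes from renamed labels or from genuinely different annotations.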

Also, in the SATURN output embedding file, I cannot find the coarse-grained annotations used in Figure 1b. Could you please provide them?

Thank you very much!

Yanay1 commented 3 weeks ago

Hi, please see Supplementary Note 1 of the paper:

"for integrating Tabula Sapiens [1], Tabula Microcebus [2] and Tabula Muris [3] we filtered cell types to select cell types with more than 350 cells. Additionally, we filtered cells with fewer than 500 genes expressed and filtered genes expressed in fewer than 1000 cells"
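
The three filters quoted above could be sketched roughly as follows. This is not the authors' preprocessing code, only an illustration under stated assumptions: a gene counts as "expressed" in a cell when its count is greater than zero, cell type sizes are counted before any cells are dropped, and the gene filter is applied to the cells that remain. The quote does not specify the order of the steps. The function name `filter_atlas` and its parameter names are hypothetical.

```python
from collections import Counter

import numpy as np


def filter_atlas(counts, cell_types,
                 min_cells_per_type=350,
                 min_genes_per_cell=500,
                 min_cells_per_gene=1000):
    """Return boolean masks (keep_cells, keep_genes) for a cells x genes matrix."""
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)

    # 1. Keep cell types with MORE than `min_cells_per_type` cells.
    type_sizes = Counter(cell_types.tolist())
    keep_cells = np.array(
        [type_sizes[t] > min_cells_per_type for t in cell_types.tolist()],
        dtype=bool,
    )

    # 2. Drop cells with fewer than `min_genes_per_cell` genes expressed.
    genes_per_cell = (counts > 0).sum(axis=1)
    keep_cells &= genes_per_cell >= min_genes_per_cell

    # 3. Drop genes expressed in fewer than `min_cells_per_gene` of the
    #    remaining cells (assumed order; see the note above).
    cells_per_gene = (counts[keep_cells] > 0).sum(axis=0)
    keep_genes = cells_per_gene >= min_cells_per_gene
    return keep_cells, keep_genes


# Toy example with small thresholds so the effect is visible.
toy_counts = np.array([
    [1, 1, 0, 0],  # type "a", 2 genes expressed
    [1, 0, 0, 0],  # type "a", 1 gene expressed
    [2, 3, 1, 0],  # type "a", 3 genes expressed
    [1, 1, 1, 1],  # type "b", only 2 cells of this type
    [1, 1, 0, 5],  # type "a", 3 genes expressed
    [1, 1, 1, 1],  # type "b"
])
toy_types = ["a", "a", "a", "b", "a", "b"]

keep_cells, keep_genes = filter_atlas(
    toy_counts, toy_types,
    min_cells_per_type=2, min_genes_per_cell=2, min_cells_per_gene=2,
)
print(keep_cells, keep_genes)
```

In the toy example, type "b" is removed entirely because it has only two cells, which shows how a size filter of this kind reduces the set of cell types present in the filtered data relative to the original download.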