theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

Is the input data expected by scCODA the number of cells of each cell type per sample? #60

Closed: altairwei closed this issue 2 years ago

altairwei commented 2 years ago

From the tutorial and the from_pandas example, it appears that scCODA takes the number of cells per cell type as input, rather than single-cell UMI data. So all the function from_scanpy does is count the cells of each cell type in each sample, right?

            Mouse  Endocrine  Enterocyte  Enterocyte.Progenitor  Goblet  Stem    TA  TA.Early  Tuft
0       Control_1         36          59                    136      36   239   125       191    18
1       Control_2          5          46                     23      20    50    11        40     5
2       Control_3         45          98                    188     124   250   155       365    33
3       Control_4         26         221                    198      36   131   130       196     4
4  H.poly.Day10_1         42          71                    203     147   271   109       180   146
5  H.poly.Day10_2         40          57                    383     170   321   244       256    71
6   H.poly.Day3_1         52          75                    347      66   323   263       313    51
7   H.poly.Day3_2         65         126                    115      33    65    39       129    59
8          Salm_1         37         332                    113      59    90    47       132    10
9          Salm_2         32         373                    116      67   117    65       168    12

Each value in the table above is the number of cells of a particular cell type in a given sample, correct?
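To make the reading of the table concrete, here is a small pandas sketch that types in the first two rows by hand and treats each row as one sample's composition. It assumes only pandas; the variable names are illustrative, and the proportions are computed purely to show that each row is a breakdown of one sample's cells, not as something scCODA requires you to supply.

```python
import pandas as pd

# First two rows of the table above, entered by hand; "Mouse" holds the sample name.
counts = pd.DataFrame(
    {
        "Mouse": ["Control_1", "Control_2"],
        "Endocrine": [36, 5],
        "Enterocyte": [59, 46],
        "Enterocyte.Progenitor": [136, 23],
        "Goblet": [36, 20],
        "Stem": [239, 50],
        "TA": [125, 11],
        "TA.Early": [191, 40],
        "Tuft": [18, 5],
    }
).set_index("Mouse")

# Summing across cell types gives the total number of cells recorded per sample.
totals = counts.sum(axis=1)

# Dividing each row by its total gives the share of each cell type in that sample.
proportions = counts.div(totals, axis=0)

print(totals)
print(proportions.round(3))
```

Here Control_1 comes out at 840 cells in total and Control_2 at 200, so the same cell type can have very different raw counts across samples simply because the samples differ in size.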

johannesostner commented 2 years ago

Hi @altairwei, exactly! scCODA operates at the (sample × cell type) level. The from_scanpy function just counts the number of cells of each type in each sample of a (cell × gene)-level AnnData object. Your intuition was totally correct!
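The counting step described above can be sketched with plain pandas. This is not scCODA's own from_scanpy implementation, only an illustration of the aggregation it performs: the per-cell table below is hypothetical and stands in for the `.obs` metadata of a (cell × gene) AnnData object, and the column names `sample` and `cell_type` are assumptions.

```python
import pandas as pd

# Hypothetical per-cell metadata: one row per cell, like AnnData.obs.
# The column names "sample" and "cell_type" are illustrative assumptions.
obs = pd.DataFrame(
    {
        "sample": ["S1", "S1", "S1", "S2", "S2", "S2", "S2"],
        "cell_type": ["Stem", "Stem", "Tuft", "Stem", "Goblet", "Goblet", "Tuft"],
    }
)

# Count the cells of each type within each sample, giving a
# (sample x cell type) matrix; cell types absent from a sample get 0.
count_matrix = pd.crosstab(obs["sample"], obs["cell_type"])
print(count_matrix)
```

Because `adata.obs` in AnnData is itself a pandas DataFrame, the same `pd.crosstab` call could be pointed at its sample and cell type columns to produce a table shaped like the one in the question. The gene expression values are never touched; only the per-cell labels matter.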