gustaveroussy / FG-Lab

Mismatches between metadata in MNP-VERSE seurat and Mulder et al. Methods and S1 table #4

Open ShaiberAlon opened 2 years ago

ShaiberAlon commented 2 years ago

Thanks for publishing your seurat object!

I took a look at the MNP-VERSE seurat object and I noticed the following things:

  1. The paper's Methods section states that 7 studies were excluded:

    To allow for analyses across all datasets, a transformed matrix was generated using datasets that contain more than 10,000 common genes inclusive of FOLR2. Datasets of Cheng et al. (2018), James et al. (2020), MacParland et al. (2018), Stewart et al. (2019), Vieira Braga et al. (2019), Xue et al. (2019), and Zheng et al. (2017) were excluded as they did not meet the above criteria (Figure S1).

But in the seurat object, all of these studies are included. Instead, two other studies are missing: Dutertre et al. and Zilionis et al.

  2. For certain studies, the number of cells in supplementary table S1 of the Mulder et al. paper does not match the number of cells per study in the seurat object. For example, for Zheng 10x the supplementary table lists 9652 cells, but the seurat object has 9651 (I know it is only a difference of 1, but I wonder why there is a mismatch). The same goes for Maier et al. (6684 in the S1 table, 6683 in the seurat object). Lastly, there is also a minor mismatch in the number of cells listed for Mulder, Patel, Kong, Piot Lung (206 in S1, 204 in the seurat object).

  3. The S1 table includes 4 Mulder, Patel, Kong, Piot datasets representing the following tissues: Liver, Spleen, Lung, and tonsils, but it does not include Blood. In the seurat object there are two studies (Study.No 2 and 6) for Mulder, Patel, Kong, Piot et al. Study 6 includes cells from the four tissue types mentioned above (Liver, Spleen, Lung, and tonsils), while study 2 includes cells that, according to the seurat object metadata, come from blood. While looking at this I also noticed that the Methods section of the Mulder et al. publication does not specify the origin of the blood samples, even though they are mentioned in the Results section:

The populations were validated by analyzing independently our in-house-indexed-SMARTseq2 data that included 5 different tissues (tonsil, spleen, blood, lung, and liver).

  4. Lastly, I wanted to bring to your attention that the "No" column in the S1 table and the "Study.No" column in the seurat metadata do not match. I know that no one promised that they would match, but it would have been really helpful if they did :-)
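
In case it helps to reproduce the cell-count comparison above, here is a minimal sketch in Python. It assumes the per-cell study labels have already been exported from the seurat metadata into a plain list; the label strings and the helper `count_differences` are hypothetical, and only the counts are taken from the numbers quoted in this issue.

```python
from collections import Counter

# Hypothetical per-cell study labels, standing in for a study column of the
# seurat metadata (one entry per cell). The label strings are assumptions;
# the cell counts are the three values quoted in this issue.
cell_study = (
    ["Zheng 10x"] * 9651
    + ["Maier"] * 6683
    + ["Mulder, Patel, Kong, Piot Lung"] * 204
)

# Cell counts as listed in table S1 for the same three datasets.
s1_counts = {
    "Zheng 10x": 9652,
    "Maier": 6684,
    "Mulder, Patel, Kong, Piot Lung": 206,
}

# Number of cells per study in the (toy) seurat metadata.
seurat_counts = Counter(cell_study)


def count_differences(s1, seurat):
    """Return {study: S1 count minus seurat count} for every nonzero difference.

    A study missing from one side is treated as having 0 cells on that side,
    so studies present in only one source also show up in the result.
    """
    studies = set(s1) | set(seurat)
    diffs = {name: s1.get(name, 0) - seurat.get(name, 0) for name in studies}
    return {name: d for name, d in diffs.items() if d != 0}


diffs = count_differences(s1_counts, seurat_counts)
for name in sorted(diffs):
    print(f"{name}: S1 minus seurat = {diffs[name]:+d}")
```

Because missing studies count as zero cells, the same comparison would also flag studies that appear in only one of the two sources.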

My apologies for the long review of these minor mismatches. It would be really helpful to get some clarifications so I get a better understanding of the data. Thanks!

ShaiberAlon commented 2 years ago

In case this is useful for anyone else, I have matched the study numbers in table S1 of Mulder et al. to the study numbers in the publicly available MNP-VERSE seurat object: mnp_verse_datasets.txt

This table matches the S1 table, but with the following extra columns:

pubmed
Study No in MNP seurat RDS
Study name in MNP seurat RDS
Tissue included in the MNP seurat object
Excluded in Mulder et al
Notes

The "Excluded in Mulder et al" column marks the studies that are described as excluded in the "Generation of transformed matrix" part of the Methods section of the Mulder et al. publication.

Notice that the table I shared has an extra row compared to S1, since I could not match Study No 2 in the seurat object to any of the studies in the S1 table (I also could not find any description of this study in the Mulder et al. paper).
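
For reference, the matching amounts to an outer join on the study label, which keeps rows that exist on only one side. The sketch below illustrates the idea with placeholder data: the labels "Study A", "Study B", and "Unmatched blood study" and all the numbers are invented for illustration and are not the real S1 or Study.No values, and `outer_match` is a hypothetical helper.

```python
# Placeholder lookup tables keyed by study label. Every number here is
# invented for illustration; see mnp_verse_datasets.txt for the real mapping.
s1_no = {"Study A": 1, "Study B": 2, "Dutertre et al.": 3}
seurat_no = {"Study A": 5, "Study B": 1, "Unmatched blood study": 2}


def outer_match(left, right):
    """Outer-join two {label: number} dicts on the label.

    Returns a list of (label, left_number, right_number) tuples sorted by
    label, with None where a label is missing on one side.
    """
    labels = sorted(set(left) | set(right))
    return [(name, left.get(name), right.get(name)) for name in labels]


rows = outer_match(s1_no, seurat_no)

# Labels present in only one of the two sources.
only_s1 = [name for name, s1, seurat in rows if seurat is None]
only_seurat = [name for name, s1, seurat in rows if s1 is None]

print("label\tNo (S1)\tStudy.No (seurat)")
for name, s1, seurat in rows:
    print(f"{name}\t{s1}\t{seurat}")
```

Rows that end up in `only_seurat` correspond to the extra row in my table, and rows in `only_s1` correspond to studies that are listed in S1 but absent from the seurat object.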