satijalab / azimuth

A Shiny web app for mapping datasets using Seurat v4
https://satijalab.org/azimuth
GNU General Public License v3.0

Clarification in the PBMC reference (SeuratData vs Zenodo vs Azimuth website) #231

Open batalha23 opened 2 months ago

batalha23 commented 2 months ago

First of all, thank you for this tool! I was about to annotate my query PBMC datasets with Azimuth's reference when I noticed what appear (to me) to be inconsistencies among the datasets referred to as "pbmcref" across the tutorials and repositories. Could you please clarify whether I'm interpreting this wrong?

refAzimuth <- readRDS("~/R/scHER2/data/ref.Rds")
View(refAzimuth)
refAzimuth
An object of class Seurat
5228 features across 36433 samples within 2 assays
Active assay: refAssay (5000 features, 0 variable features)
 1 layer present: data
 1 other assay present: ADT
 2 dimensional reductions calculated: refUMAP, refDR


With all this in mind, could you please clarify whether these datasets are supposed to be the same, or whether this is a bug that needs to be fixed? Also, if they are indeed distinct datasets, which of them contains the single-cell RNA and ADT data generated in the paper (Hao and Hao et al., Cell 2021)?

Thank you very much in advance!

yi6kim commented 2 weeks ago

I have a similar issue. I didn't download the data from Zenodo; instead, I ran InstallData('pbmcref') after loading library(SeuratData).

When I run 'thepbmc <- LoadData("pbmcref", "azimuth")' and check its dimensions, it shows only 5000 genes and 36,433 cells, rather than the 161,764 cells it is supposed to have.

Using LoadData("pbmcref") and LoadData("pbmcref.SeuratData") both give me an error, so I've been using LoadData("pbmcref", "azimuth") to view the dataset, as advised here: https://github.com/satijalab/seurat-data/issues/77.


I also see that AvailableData() states that 'pbmcref.SeuratData' has 2700 cells, but this is probably a typo.

michael-kotliar commented 23 hours ago

I believe you can just download the original dataset from https://atlas.fredhutch.org/nygc/multimodal-pbmc/ and then run the script at https://github.com/satijalab/azimuth-references/blob/master/human_pbmc/scripts/export.R, making sure that your mapping.cells and plotting.cells include all cells (not a subset).