lsteuernagel / mapscvi

Map single cell expression data into a reference scvi latent space and reference umap using R and Seurat. Use included objects to map new data to HypoMap.

Data preprocessing/normalization before mapping #5

Closed LisaSikkema closed 1 year ago

LisaSikkema commented 1 year ago

Hi,

First: thanks for building your atlas, I really like your work! I would like to try to map some data to your HypoMap atlas and went through the notebook with an example scArches mapping, but cannot find any information on how the data was or should be preprocessed. Did you use raw counts for the original scVI integration? Both the reference and the query data in the notebook seem to be normalized, which, if I'm correct, is usually not the right input data for scVI/scArches.

Thanks for your help!

lsteuernagel commented 1 year ago

Hi Lisa,

Thanks!

> Did you use raw counts for the original scVI integration? Both the reference and the query data in the notebook seem to be normalized, which, if I'm correct, is usually not the right input data for scVI/scArches.

I think using normalized counts for the query was a mistake in the Python notebook. I have updated the example data and cleaned up the notebook. For the reference training I used raw counts, and you should be able to find them in the raw.X slot of the anndata object. The R functions also use raw data.

For preprocessing I would definitely recommend using the raw counts, plus some general dataset QC like doublet detection beforehand. The model expects a batch variable "Batch_ID" in .obs: either set it to a single value for all cells, or to your batches/samples if that makes sense for your data.
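
As a rough illustration of the two checks above (raw counts as input, and a "Batch_ID" column in .obs), here is a minimal sketch using only NumPy and pandas. The helper names `looks_like_raw_counts` and `with_batch_id`, the default label "query_1", and the toy data are assumptions made up for this example; they are not part of mapscvi or the notebook. The count check is only a heuristic.

```python
import numpy as np
import pandas as pd


def looks_like_raw_counts(values) -> bool:
    """Heuristic check: raw counts are non-negative whole numbers.

    Hypothetical helper, not part of mapscvi. Pass a dense array, or the
    stored values of a sparse matrix (implicit zeros are valid counts).
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return False
    # NaN fails the >= 0 comparison, so matrices with missing values are rejected.
    return bool(np.all(values >= 0) and np.all(values == np.floor(values)))


def with_batch_id(obs: pd.DataFrame, sample_column=None, default="query_1") -> pd.DataFrame:
    """Return a copy of the cell metadata with a "Batch_ID" column.

    Hypothetical helper. If `sample_column` is given, its values are used as
    batches; otherwise every cell gets the same (assumed) default label.
    """
    obs = obs.copy()
    if sample_column is not None:
        obs["Batch_ID"] = obs[sample_column].astype(str)
    else:
        obs["Batch_ID"] = default
    return obs


# Toy data: 2 cells x 3 genes, plus a log-normalized version of the same matrix.
counts = np.array([[0, 3, 1], [2, 0, 5]])
normalized = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)
obs = pd.DataFrame({"sample": ["s1", "s2"]}, index=["cell_1", "cell_2"])

print(looks_like_raw_counts(counts))       # integer counts pass the check
print(looks_like_raw_counts(normalized))   # log-normalized values do not
print(with_batch_id(obs, sample_column="sample"))
```

With an anndata object, the same idea would apply to the expression matrix (for example `adata.X`, or `adata.raw.X` if the raw counts are stored there) and to `adata.obs`.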

Best,

Lukas

LisaSikkema commented 1 year ago

Yes this is great, thanks a lot!