theislab / scib

Benchmarking analysis of data integration tools
MIT License
294 stars, 63 forks

how to merge and integrate the TPM and raw counts matrix data #387

Open Teich2233 opened 1 year ago

Teich2233 commented 1 year ago

Hello,

I have downloaded scRNA-seq data from several articles for analysis. Some articles only provide TPM data (computed with RSEM from Smart-seq2 data), while others only provide raw count matrices (10x Genomics). How do I merge and integrate the scRNA-seq data from these articles?

Thanks

mumichae commented 6 months ago

Hi, this is indeed a harder problem for integration, since full-length read counts and UMI counts differ, even after normalisation. To put the two on a more comparable footing, I'd normalise the UMI counts (from 10x) and use them together with the TPM values for integration, PCA and other downstream tasks. You might need to be cautious with count-based methods such as scVI and scANVI, since they require unnormalised counts, which the TPM-only datasets don't provide. @LuckyMD Any suggestions on this?
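
A minimal sketch of the normalise-then-combine idea above, using plain NumPy on small dense matrices (cells in rows, genes in columns). The function names, the per-million target sum (chosen here only so the UMI data lands on the same scale as TPM), the `log1p` transform and the batch labels are all assumptions for illustration, not something prescribed in this thread or by scib.

```python
import numpy as np


def normalise_umi_per_million(counts, target_sum=1e6):
    """Scale each cell (row) so its counts sum to target_sum.

    target_sum=1e6 is an assumption made to match the TPM scale.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero cells as zeros
    return counts / totals * target_sum


def combine_tpm_and_umi(tpm, tpm_genes, umi, umi_genes):
    """Stack TPM cells and normalised UMI cells on their shared genes.

    Hypothetical helper: returns the log1p-transformed matrix, the shared
    gene names and a per-cell batch label for downstream integration.
    """
    shared = sorted(set(tpm_genes) & set(umi_genes))
    tpm_idx = [tpm_genes.index(g) for g in shared]
    umi_idx = [umi_genes.index(g) for g in shared]

    tpm_sub = np.asarray(tpm, dtype=float)[:, tpm_idx]
    umi_sub = normalise_umi_per_million(umi)[:, umi_idx]

    X = np.log1p(np.vstack([tpm_sub, umi_sub]))
    batch = np.array(["smartseq2"] * tpm_sub.shape[0] + ["10x"] * umi_sub.shape[0])
    return X, shared, batch


# Toy example: one Smart-seq2 cell (TPM) and two 10x cells (raw UMI counts).
tpm = [[500000.0, 500000.0, 0.0]]
umi = [[1, 3, 0], [0, 0, 0]]
X, genes, batch = combine_tpm_and_umi(tpm, ["A", "B", "C"], umi, ["B", "A", "D"])
```

The combined matrix and the batch labels could then be passed to PCA or to an integration method that accepts normalised input. This does not help with the count-based methods mentioned above: the raw UMI counts would have to be kept separately, and the TPM-only datasets have no raw counts to offer.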