Integration across SmartSeq2 and 10X Datasets

rtdavis7559 commented 4 years ago

I am currently trying to run both your SCTransform and standard workflows to integrate some datasets that were generated with the SmartSeq2 and 10X platforms. For the SmartSeq2 datasets, in what form should I load the gene expression data (TPM, FPKM, etc.) when I create the Seurat objects that need to be integrated? For the 10X data, it should just be the raw counts matrix, correct? Thank you in advance for your assistance.

timoast commented 4 years ago

SCTransform requires counts rather than TPM or FPKM values, so it shouldn't be run on SmartSeq2 TPM values. You can create the Seurat object using the TPM values and use the standard workflow, and for the 10x data, use the UMI counts as documented in the vignettes.
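Since SCTransform expects raw counts, a quick sanity check before choosing a workflow is whether a matrix even looks like counts. Below is a minimal Python sketch using NumPy; the helper `looks_like_counts` and the example matrices are hypothetical, for illustration only. Raw UMI counts are non-negative whole numbers, whereas TPM/FPKM values are usually fractional. Passing the check does not prove the values are raw counts, and some quantifiers (for example, RSEM expected counts) can produce non-integer counts, so treat this as a heuristic only.

```python
import numpy as np

def looks_like_counts(matrix, tol=1e-8):
    """Hypothetical heuristic: True if every entry is a non-negative
    whole number, as expected for raw read/UMI counts."""
    values = np.asarray(matrix, dtype=float)
    non_negative = bool(np.all(values >= 0))
    # Whole numbers differ from their rounded value by (almost) nothing.
    whole = bool(np.all(np.abs(values - np.round(values)) < tol))
    return non_negative and whole

# Toy raw UMI counts (genes x cells): whole numbers, so the check passes.
umi_counts = np.array([[0, 3, 1],
                       [12, 0, 7]])

# Toy TPM values (genes x cells, each column sums to 1e6): fractional, so it fails.
tpm_values = np.array([[152.37, 0.0, 48.1],
                       [999847.63, 1e6, 999951.9]])

print(looks_like_counts(umi_counts))  # True
print(looks_like_counts(tpm_values))  # False
```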

rtdavis7559 commented 4 years ago

I am a little confused by this explanation, as in your vignettes for SCTransform you demonstrate integration of SmartSeq2 datasets (https://satijalab.org/seurat/v3.1/integration.html), which use TPM values. How, then, were the pancreas datasets integrated with SCTransform?

ccruizm commented 4 years ago

I have the same question! I want to use some public datasets and some have the raw counts but others only provide TPM values. What is the best approach to integrate them? Thanks in advance!

adomingues commented 4 years ago

I am also confused by this. I was hoping to integrate TPM values with raw counts (all public) and I am unsure if that is even possible. Did you find a solution @ccruizm @rtdavis7559?

Cheers

ccruizm commented 4 years ago

It seems possible, @adomingues. I checked the data from the tutorial where they applied SCTransform to Smart-seq2 data (TPM), and it does not seem to affect the integration. I did it as well and got good results. However, some steps (such as DE analysis) should be performed on the RNA assay, not on the SCT one, and that is where the TPM values would conflict with the raw counts (in my opinion).

adomingues commented 4 years ago

Cheers @ccruizm, sadly for me:

However, some steps (such as DE analysis) should be performed on the RNA assay, not on the SCT one and there is where the TPM values would conflict with the raw counts (in my opinion)

This is exactly what I want to do downstream: DE between specific cell clusters coming from the integrated sets. The usual story: one study (TPM) has population B and another study (counts) has population A, and I would like to find the marker genes that distinguish A from B. If anyone has a good idea of how to achieve this, I would welcome suggestions.

ashokpatowary commented 3 years ago

@adomingues Did you manage to figure out a solution for the issue? I am also facing the same issue of TPM and raw count integration, with the aim of identifying distinct markers in the downstream analysis.

adomingues commented 3 years ago

Hi @ashokpatowary, long story short, I don't remember and no longer have access to the project notes. IIRC, I ended up requesting the raw data of the TPM study, and my colleagues re-analyzed it to get counts.

Perhaps not the answer you are looking for, but sometimes there is no substitute for raw data :(

annavpo commented 1 year ago

@rtdavis7559 @adomingues Hi all, I have a similar problem. I have multiple 10X datasets for which I have the raw counts, and 2 Smart-seq2 datasets for which I have TPM. I am not sure whether I should obtain the raw (RSEM) counts for the Smart-seq2 data in order to integrate them with the 10X data. What do you think is the best approach here? I am also not sure I should use the raw (RSEM) counts from Smart-seq2, since these differ from the raw 10X counts (SS2 is full-length, so TPM corrects for gene length, an issue we don't have with 10X 3'- or 5'-end data). Thoughts?
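For reference, the standard TPM definition makes the gene-length correction explicit. For gene $i$ in one cell, with $c_i$ reads assigned to it and effective length $\ell_i$, each gene's count is first divided by its length, and the length-normalized values are then rescaled so that every cell sums to one million. In a full-length protocol like Smart-seq2, longer transcripts yield more reads, which is what the division by $\ell_i$ compensates for; in end-tagged 10X data each molecule contributes one UMI regardless of length, so no such term is applied.

```latex
\mathrm{TPM}_i \;=\; 10^6 \cdot \frac{c_i / \ell_i}{\sum_{j} c_j / \ell_j},
\qquad \sum_i \mathrm{TPM}_i = 10^6 .
```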

adomingues commented 1 year ago

@annavpo, reviewing the thread, I would suggest getting the raw counts for the Smart-seq2 datasets. Mostly because from raw counts you can always calculate TPM / SCT / whatever if needed, which, IIRC, is not possible if you only have TPMs.
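A minimal Python sketch of why the conversion only goes one way, assuming a genes × cells count matrix and known effective gene lengths (the function `counts_to_tpm` and the toy inputs `lengths`, `shallow`, and `deep` are hypothetical, for illustration only). TPM is easy to compute from counts, but the per-cell rescaling discards sequencing depth. Here the "deep" cell has exactly 10× the counts of the "shallow" cell yet yields identical TPM, so the original counts cannot be recovered from TPM alone.

```python
import numpy as np

def counts_to_tpm(counts, lengths):
    """Convert a genes x cells count matrix to TPM.

    counts  : raw counts, shape (n_genes, n_cells)
    lengths : effective gene lengths, shape (n_genes,)
    Assumes every cell has at least one count (otherwise division by zero).
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    rate = counts / lengths[:, None]        # length-normalized counts per gene
    return rate / rate.sum(axis=0) * 1e6    # rescale each cell (column) to sum to 1e6

# Hypothetical toy data: 3 genes, one cell each.
lengths = np.array([1000, 2000, 500])
shallow = np.array([[10], [40], [5]])
deep = shallow * 10                          # same composition, 10x the depth

tpm_shallow = counts_to_tpm(shallow, lengths)
tpm_deep = counts_to_tpm(deep, lengths)
print(tpm_shallow.ravel())                   # [250000. 500000. 250000.]
print(np.allclose(tpm_shallow, tpm_deep))    # True: depth is lost in TPM
```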

Teich2233 commented 1 year ago

I have the same question! Does anyone have a solution?

Paperplane1031 commented 4 months ago

@adomingues Did you manage to figure out a solution for the issue? I am also facing the same issue of TPM and raw count integration, with the aim of identifying distinct markers in the downstream analysis.

I have the same question too. Could anyone help?

satijalab / seurat

Integration across SmartSeq2 and 10X Datasets #2854