satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

removing doublets? #3266

Closed: Sa753 closed this issue 4 years ago

Sa753 commented 4 years ago

How can I remove doublets from a subset of a Seurat object? I used the subset function to generate a smaller Seurat object from a large SCTransform-integrated Seurat object. How can I remove doublets from it, and which assay should I use: "RNA", "SCT", or "integrated"?

samuel-marsh commented 4 years ago

Hi,

Are you asking how to remove doublets that you have already identified or how to identify doublets?

Best, Sam

Sa753 commented 4 years ago

Hi, if I have a Seurat object that is a subset of B cells, I wouldn't expect to see CD3 or CD14 expression, right? So how can I identify those cells, and how can I remove them? For example, can I use FeaturePlot, or should I run FindAllMarkers to get all DE genes and then remove the cells expressing CD3 or CD14? Which assay should I use when removing the cells?

samuel-marsh commented 4 years ago

Hi,

So there are many options, and it is up to you to decide the best way to remove doublets in your individual dataset. You can perform clustering, and if you see any clusters made up exclusively of cells expressing T cell or monocyte markers, remove them with object_filtered <- subset(x = object, idents = "T Cells", invert = TRUE), where "T Cells" is the identity label of that cluster.

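You could also simply remove any cells that express a marker above a certain level: object_filtered <- subset(x = object, subset = CD3E > EXP_VALUE, invert = TRUE). Note that the gene name is unquoted; a quoted "CD3E" would be compared as a text string rather than looked up as a gene.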

Your choice of EXP_VALUE may change depending on which assay you choose, but the principle remains the same. I hope that helps.
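A minimal sketch of both options on a toy object, assuming Seurat v4 or v5. The random count matrix, the gene names, the hand-assigned "T Cells" label (which would normally come from FindClusters() plus your own annotation) and the cutoff of 1 are all hypothetical illustration values, not recommendations.

```r
library(Seurat)

set.seed(42)
# Toy data: 50 hypothetical genes x 100 cells of Poisson noise
genes <- c("MS4A1", "CD19", "CD79A", "CD3E", "CD14", paste0("GENE", 1:45))
counts <- matrix(rpois(50 * 100, lambda = 1), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:100)))
# Make cells 1-10 look like contaminating T cells: CD3E only in those cells
counts["CD3E", ] <- 0L
counts["CD3E", 1:10] <- 30L

object <- CreateSeuratObject(counts = counts)
object <- NormalizeData(object, verbose = FALSE)

# Option 1: drop a whole cluster by its identity label.
# Labels are assigned by hand here (hypothetical annotation).
object$celltype <- ifelse(seq_len(ncol(object)) <= 10, "T Cells", "B Cells")
Idents(object) <- "celltype"
no_t_cluster <- subset(x = object, idents = "T Cells", invert = TRUE)

# Option 2: drop every cell whose CD3E value is above a cutoff.
# The gene name is unquoted so subset() looks it up as a feature.
EXP_VALUE <- 1  # hypothetical cutoff on the default assay's "data" values
no_cd3e <- subset(x = object, subset = CD3E > EXP_VALUE, invert = TRUE)

ncol(no_t_cluster)
ncol(no_cd3e)
```

In this toy example both routes remove the same ten cells; on real data they usually differ, because a cluster label and a per-cell threshold capture different things.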

Best, Sam

Sa753 commented 4 years ago

Hi Sam,

Thank you so much for your reply.

One thing I don't understand is "Your choice of EXP_VALUE may change based on which assay you choose but the principle remains the same". How will it vary, and why? Can you give an example, please? I don't know how to determine this value; could you tell me how? Thanks so much.

samuel-marsh commented 4 years ago

Hi,

That relates to the basics of those assays and how they differ from each other, so I suggest reading up on that. For a quick reference, compare the values plotted by FeaturePlot in the SCTransform vignette with those in the standard vignette. As for determining the value, that is really up to the end user. There is no standard that is necessarily applicable across all datasets, so you will need to decide what your criteria for a potential doublet are.
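To see why the cutoff depends on the assay, the sketch below compares one gene on the raw count scale and on the log-normalized "data" scale that subset() reads from the default assay. It assumes toy data with only an "RNA" assay; on a real object you would switch DefaultAssay() to "SCT" or "integrated" and inspect those values the same way (for example with VlnPlot or FeaturePlot) before choosing a cutoff.

```r
library(Seurat)

set.seed(1)
genes <- c("CD3E", "MS4A1", paste0("GENE", 1:48))
counts <- matrix(rpois(50 * 200, lambda = 2), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:200)))
object <- CreateSeuratObject(counts = counts)
object <- NormalizeData(object, verbose = FALSE)  # LogNormalize by default

# subset(subset = CD3E > x) reads the "data" values of the default assay,
# so confirm which assay that is before picking x
DefaultAssay(object)
cd3e_data <- FetchData(object, vars = "CD3E")[, 1]

# The same gene on the raw count scale, for comparison
cd3e_counts <- counts["CD3E", colnames(object)]

summary(cd3e_counts)  # integer counts
summary(cd3e_data)    # log1p(count / total * 10000), a different scale
```

A cutoff that makes sense on one scale (say, 2 UMIs) is not the same number on the other, which is the point about EXP_VALUE varying by assay.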

Best, Sam

Sa753 commented 4 years ago

Hi Sam, thanks for this. If I don't want TCR genes to affect my clustering, is it correct to remove them completely from the matrix? And if so, when should I remove them, i.e., before or after integration? Thanks

samuel-marsh commented 4 years ago

Hi,

So removing genes is different from removing cells. Yes, you can remove genes if you really don't want them in the object. But if you simply don't want the TCR genes to affect clustering, you can supply your list of TCR genes to calculate a per-cell TCR percentage, similar to the mitochondrial percentage examples in the vignettes, and regress it out so it doesn't affect clustering, i.e. pbmc <- ScaleData(pbmc, vars.to.regress = "percent.mt") or pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)
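A sketch of the percent.mt-style approach applied to TCR genes. The regular expression "^TR[ABDG][VDJC]" for human TCR gene symbols is an assumption (you can instead pass your own gene vector through the features argument of PercentageFeatureSet()), and the toy counts are random.

```r
library(Seurat)

set.seed(7)
# Hypothetical gene list mixing a few human TCR gene symbols with filler genes
genes <- c("TRAV1-2", "TRBV20-1", "TRBC1", "TRDC", "MS4A1", "CD19",
           paste0("GENE", 1:44))
counts <- matrix(rpois(50 * 100, lambda = 1), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:100)))
pbmc <- CreateSeuratObject(counts = counts)

# Per-cell percentage of counts from TCR V/D/J/C genes, analogous to percent.mt.
# Assumed pattern for human symbols; or use features = your_tcr_gene_vector
pbmc[["percent.tcr"]] <- PercentageFeatureSet(pbmc, pattern = "^TR[ABDG][VDJC]")

pbmc <- NormalizeData(pbmc, verbose = FALSE)
# Regress the TCR percentage out of the scaled data used for PCA/clustering
pbmc <- ScaleData(pbmc, features = rownames(pbmc),
                  vars.to.regress = "percent.tcr", verbose = FALSE)
# SCTransform alternative:
# pbmc <- SCTransform(pbmc, vars.to.regress = "percent.tcr", verbose = FALSE)
```

This only changes the scaled values that feed PCA and clustering; the TCR genes stay in the object and can still show up in DE results.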

Sa753 commented 4 years ago

Hi ,

Thanks for this, but I don't want them even in the list of differentially expressed genes. So should I remove them before or after integration? Thanks

samuel-marsh commented 4 years ago

That is up to you, though if you doublet-filter your dataset and it's just B cells, then there shouldn't be any TCRs, right? Personally, I don't like removing genes from datasets as long as they pass the initial expression filter set by the min.cells parameter. Others may have a different opinion here, but that's just mine. Some ambient RNA mapping to TCR genes may make it into droplets, but in that case those genes likely won't be differentially expressed.

Specifically for FindMarkers during DE, you can also just supply a features list containing all genes minus the TCRs if it's still a problem.
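Both routes, sketched on toy data with the same assumed human TCR pattern: restricting FindMarkers() to a features vector that excludes TCR genes, or dropping those genes from the object with subset(features = ...). The two-group labelling is hypothetical. If you do drop the genes, one option is to do it on each object before normalization and integration, so they are absent from every assay built afterwards.

```r
library(Seurat)

set.seed(11)
genes <- c("TRAV1-2", "TRBC1", "MS4A1", "CD19", "CD79A", paste0("GENE", 1:45))
counts <- matrix(rpois(50 * 100, lambda = 2), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:100)))
object <- CreateSeuratObject(counts = counts)
object <- NormalizeData(object, verbose = FALSE)
object$group <- rep(c("A", "B"), each = 50)  # hypothetical two-group labelling
Idents(object) <- "group"

# Assumed pattern for human TCR V/D/J/C gene symbols
tcr.genes <- grep("^TR[ABDG][VDJC]", rownames(object), value = TRUE)
features.use <- setdiff(rownames(object), tcr.genes)

# Option 1: keep the genes in the object, but leave them out of DE testing
markers <- FindMarkers(object, ident.1 = "A", ident.2 = "B",
                       features = features.use,
                       logfc.threshold = 0, min.pct = 0)

# Option 2: drop the genes from the object altogether
object.noTCR <- subset(object, features = features.use)
```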

satijalab commented 4 years ago

Thanks Sam. I'll just add a link to the DoubletFinder package from the Gartner lab, which is compatible with Seurat: https://github.com/chris-mcginnis-ucsf/DoubletFinder

Sa753 commented 4 years ago

Hi, I am not sure why you closed this. Thanks for your help, but I am still not clear about how to determine the expression value to use in the subset function, and which assay I should be using.

I already used DoubletFinder and removed the Doublet_hi cells, but when I run the FindAllMarkers function, I can still see genes that I wouldn't expect to see together in the same cluster, such as CD3 and CD68, which to me would indicate doublets. So how can I remove them, and which expression value should I use?

Thanks

samuel-marsh commented 4 years ago

Hi,

So again, this doesn't necessarily have one answer; it needs to be evaluated for each dataset and experiment, so I don't think I can effectively give you a specific value.

As for what you are seeing with CD68 and CD3: yes, those could be doublets, or they could be T cells with ambient CD68 RNA in the droplet (or, vice versa, monocytes/DCs with ambient CD3 RNA in the droplet). I would encourage you to look at other public datasets that also show this (one example: https://singlecell.broadinstitute.org/single_cell/study/SCP345/ica-blood-mononuclear-cells-2-donors-2-sites#study-visualize). Single cell datasets are never going to be 100% pure. You have to evaluate multiple factors and make an experimental decision about what you call a single cell and what you call a doublet.
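Because co-expression can also come from ambient RNA, one cautious pattern is to flag cells above a cutoff for both markers, review them, and only then remove them. The sketch uses toy data in which cells 15-20 express both CD3E and CD68; the cutoff of 1 and the flag column name coexpr_flag are hypothetical.

```r
library(Seurat)

set.seed(3)
genes <- c("CD3E", "CD68", "MS4A1", paste0("GENE", 1:47))
counts <- matrix(rpois(50 * 100, lambda = 1), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:100)))
counts[c("CD3E", "CD68"), ] <- 0L
counts["CD3E", 1:20] <- 20L   # T-like cells
counts["CD68", 15:40] <- 20L  # myeloid-like cells; cells 15-20 express both
object <- CreateSeuratObject(counts = counts)
object <- NormalizeData(object, verbose = FALSE)

cutoff <- 1  # hypothetical; choose per dataset and per assay
expr <- FetchData(object, vars = c("CD3E", "CD68"))
object$coexpr_flag <- expr[["CD3E"]] > cutoff & expr[["CD68"]] > cutoff

# Review the flagged cells (cluster membership, nCount_RNA, etc.) before removing
table(object$coexpr_flag)
object_filtered <- subset(object, cells = colnames(object)[!object$coexpr_flag])
```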

As this is diverging into much more of a theoretical single cell question than a technical software GitHub issue, you may want to post on a more bioinformatics-focused forum such as Biostars or Bioinformatics Stack Exchange. I would also recommend checking out the doublet section of the OSCA guide, which has some additional info: https://osca.bioconductor.org/doublet-detection.html#overview-5.

Best, Sam

laijen000 commented 3 years ago

Hi @satijalab, I have used DoubletFinder to classify cells as doublets or singlets for individual samples that I would like to integrate in Seurat. In terms of general workflow, would you recommend subsetting the individual Seurat objects to remove the doublets prior to integration? Or is there a different step at which you would recommend removing them (such as after integration but before finding cluster markers)? Thank you!

denvercal1234GitHub commented 3 years ago

@laijen000 - Did you figure out the recommended workflow for combining DoubletFinder with the Seurat SCTransform workflow?

laminbcham commented 2 years ago

I have already run DoubletFinder, and there are some doublets in my integrated sample. How can I remove all the doublets and save all the remaining singlets in an RDS file?
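A minimal sketch, assuming DoubletFinder has already written its classification to a metadata column whose name starts with "DF.classifications" and holds "Singlet"/"Doublet" labels. The full column name encodes the pN, pK and nExp values used, so check colnames(object[[]]) for the exact name. The column is simulated here so the snippet runs on its own; if DoubletFinder was run per sample before integration, each sample may have its own column that you need to combine first. The output path is a placeholder.

```r
library(Seurat)

set.seed(5)
counts <- matrix(rpois(20 * 50, lambda = 1), nrow = 20,
                 dimnames = list(paste0("GENE", 1:20), paste0("cell", 1:50)))
object <- CreateSeuratObject(counts = counts)

# Stand-in for DoubletFinder output (hypothetical pN/pK/nExp in the name)
object$DF.classifications_0.25_0.09_5 <- rep(c("Doublet", "Singlet"), times = c(5, 45))

# Locate the classification column instead of hard-coding its name
df.col <- grep("^DF.classifications", colnames(object[[]]), value = TRUE)[1]
is.singlet <- object[[]][[df.col]] == "Singlet"
singlets <- subset(object, cells = colnames(object)[is.singlet])

# Save the singlet-only object (placeholder path in a temp directory)
out.file <- file.path(tempdir(), "singlets.rds")
saveRDS(singlets, file = out.file)
```

The saved object can be reloaded later with readRDS(out.file).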

Quick help please

Anan2022 commented 2 years ago

Hi,

I would like to use the emptyDrops and scDblFinder functions to remove empty droplets and doublets. These functions need a SingleCellExperiment object rather than a Seurat object. How can I convert a SingleCellExperiment object to a Seurat object so that I can continue with further analysis?

samuel-marsh commented 2 years ago

Hi @Anan2022 see: https://satijalab.org/seurat/articles/conversion_vignette.html

Anan2022 commented 2 years ago

Many thanks for your reply. I tried the as.Seurat function, but it failed (Error: No data in provided assay - logcounts). Is there a tutorial that goes from doublet removal to FindAllMarkers in Seurat? I am a bit confused by the tutorial you sent me, because the demo dataset can't be downloaded, so it is difficult to understand the components of the object.

samuel-marsh commented 2 years ago

Hi,

That is unfortunate. You can, though, simply convert a Seurat object to SCE to see the format, and then convert it back to Seurat to test that the functions are working properly.
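The round trip can be sketched as follows on toy data, assuming Seurat plus SingleCellExperiment are installed. For an SCE that only has a "counts" assay (the situation behind the "No data in provided assay - logcounts" error), passing data = NULL to as.Seurat() skips the lookup of the missing "logcounts" assay; the resulting assay may be named "originalexp" rather than "RNA".

```r
library(Seurat)
library(SingleCellExperiment)

set.seed(2)
counts <- matrix(rpois(30 * 40, lambda = 2), nrow = 30,
                 dimnames = list(paste0("GENE", 1:30), paste0("cell", 1:40)))
seu <- CreateSeuratObject(counts = counts)
seu <- NormalizeData(seu, verbose = FALSE)

# Seurat -> SCE: inspect what the SCE side looks like
sce <- as.SingleCellExperiment(seu)
assayNames(sce)  # includes "counts" and "logcounts"

# SCE with raw counts only -> Seurat: data = NULL avoids asking for "logcounts"
sce.counts.only <- SingleCellExperiment(assays = list(counts = counts))
seu2 <- as.Seurat(sce.counts.only, counts = "counts", data = NULL)
seu2 <- NormalizeData(seu2, verbose = FALSE)  # normalize on the Seurat side
```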

Can you clarify are you trying to convert a SCE object to Seurat to do doublet detection or the other way around?

In terms of tutorial I would recommend looking at those doublet tools for tutorials because they are not part of Seurat directly and therefore best to use what is provided by those developers.

Best, Sam

Anan2022 commented 2 years ago

Hi

Basically, I have been analysing single-cell RNA-seq data with Seurat. However, DoubletDecon seems to be the only tool I know of that can remove doublets directly from a Seurat object, and unfortunately I cannot install this package on my Mac. Alternatively, I tried scDblFinder to remove doublets, and I also ran emptyDrops to remove empty droplets. These two functions only work on SCE objects. After empty droplets and doublets are removed, how can I construct a new Seurat object from the SCE object? I need a new Seurat object without empty droplets and doublets for further analysis (including cell annotation and other advanced analysis). Although I searched a lot online, I still cannot find a solution.
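One way back to Seurat is to subset the SCE to singlets and build a fresh Seurat object from its raw counts with CreateSeuratObject(), which needs no "logcounts" assay. In the sketch, the emptyDrops() and scDblFinder() calls appear only as comments and their output is simulated, so it runs with just SingleCellExperiment and Seurat installed. The FDR threshold of 0.001 is a commonly used example value, not a requirement.

```r
library(SingleCellExperiment)
library(Seurat)

set.seed(9)
counts <- matrix(rpois(30 * 60, lambda = 2), nrow = 30,
                 dimnames = list(paste0("GENE", 1:30), paste0("cell", 1:60)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# In a real run these steps come from the tools, e.g.
#   e.out <- DropletUtils::emptyDrops(counts(sce))   # on the raw, unfiltered matrix
#   sce <- sce[, which(e.out$FDR <= 0.001)]
#   sce <- scDblFinder::scDblFinder(sce)             # adds sce$scDblFinder.class
# Here the doublet classification is simulated so the sketch runs on its own.
sce$scDblFinder.class <- rep(c("doublet", "singlet"), times = c(6, 54))

sce.singlets <- sce[, sce$scDblFinder.class == "singlet"]

# Fresh Seurat object from raw counts; colData is carried over as metadata
seurat.obj <- CreateSeuratObject(
  counts = counts(sce.singlets),
  meta.data = as.data.frame(colData(sce.singlets))
)
seurat.obj <- NormalizeData(seurat.obj, verbose = FALSE)
```

From here the usual Seurat steps (FindVariableFeatures, ScaleData, clustering, FindAllMarkers) apply as normal.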

Best wishes Anan

Anan2022 commented 2 years ago

Hi,

Doublet tools are not part of Seurat, so are there other tools, like Seurat, that I can use to analyse single-cell RNA-seq data after removing empty droplets and doublets?

Best wishes Anan