are we normalizing too many times?

mistrm82 commented 4 years ago

A couple of issues with the SCT lesson.

First, we use the full dataset (the merged object with both samples) to check for cell cycle effects. After evaluating them, we decide that we do not need to regress them out. We then perform SCT on the full dataset. Was this all done just for example purposes?

Because next we split the samples into separate objects and run the for loop:

    for (i in 1:length(split_seurat)) {
        split_seurat[[i]] <- NormalizeData(split_seurat[[i]], verbose = TRUE)
        split_seurat[[i]] <- CellCycleScoring(split_seurat[[i]], g2m.features=g2m_genes, s.features=s_genes)
        split_seurat[[i]] <- SCTransform(split_seurat[[i]], vars.to.regress = c("mitoRatio"))
    }

If the code described before this is simply an example, we should say so.

Should cell cycle effects be assessed per sample? If so, it might not be a good idea to run this loop: the SCT argument vars.to.regress should be evaluated for each individual sample before SCT is run.

If it's fine to assess cell cycle effects (and any other sources of unwanted variation) across all samples, and we are running CellCycleScoring inside the loop simply to have the columns in our metadata post-integration, can we instead do the following:
- normalize filtered_seurat
- cellcycle scoring
- PCA
- split this object to run each through SCT (assuming the cellcycle scores columns will persist in the split objects)
- integrate
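
A rough sketch of that proposed order, in case it helps with testing. This is not the lesson's code: `score_then_split` and `has_cell_cycle_cols` are hypothetical helper names, and it assumes `filtered_seurat` carries `sample` and `mitoRatio` metadata columns and that `s_genes` and `g2m_genes` are the cell cycle gene vectors used in the lesson. The integration step is left out.

```r
# Sketch only. Helper names are hypothetical; Seurat calls are namespaced so
# the functions can be defined before the package is attached.

# TRUE if the metadata has the columns that CellCycleScoring() adds.
has_cell_cycle_cols <- function(meta) {
  all(c("S.Score", "G2M.Score", "Phase") %in% colnames(meta))
}

score_then_split <- function(filtered_seurat, s_genes, g2m_genes) {
  # Normalize and score cell cycle once, on the merged object
  seurat_phase <- Seurat::NormalizeData(filtered_seurat)
  seurat_phase <- Seurat::CellCycleScoring(seurat_phase,
                                           s.features = s_genes,
                                           g2m.features = g2m_genes)

  # PCA on the merged object, used to inspect cell cycle effects
  seurat_phase <- Seurat::FindVariableFeatures(seurat_phase)
  seurat_phase <- Seurat::ScaleData(seurat_phase)
  seurat_phase <- Seurat::RunPCA(seurat_phase)

  # Split by sample (assumes a "sample" metadata column)
  split_seurat <- Seurat::SplitObject(seurat_phase, split.by = "sample")

  for (i in seq_along(split_seurat)) {
    # Stop early if the cell cycle columns did not survive the split
    stopifnot(has_cell_cycle_cols(split_seurat[[i]]@meta.data))
    split_seurat[[i]] <- Seurat::SCTransform(split_seurat[[i]],
                                             vars.to.regress = c("mitoRatio"))
  }

  split_seurat
}
```

The `stopifnot()` inside the loop is there to answer the open question directly: if the score columns are dropped by the split, the function fails before any SCT run.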

marypiper commented 3 years ago

Generally, I run through the dataset without splitting it to see whether I need to integrate. So if I don't need to integrate, I wouldn't split the object. Therefore, we go through first with the whole dataset, and if we integrate, it would inform whether we regress cell cycle out upon integration.

I am not sure whether we can do cell cycle scoring on the full dataset then split for SCTransform. I imagine that I did not think this was possible before, but we should test it out and see.

mistrm82 commented 3 years ago

Yes, we can do this. In the materials we use seurat_phase to evaluate cell cycle effects. But when we split the object we use filtered_seurat.

I used seurat_phase and it works out fine: we still have all the required metadata and we are not running unnecessary code.

    split_seurat <- SplitObject(seurat_phase, split.by = "sample")
    split_seurat <- split_seurat[c("ctrl", "stim")]

    options(future.globals.maxSize = 4000 * 1024^2)

    for (i in 1:length(split_seurat)) {
        split_seurat[[i]] <- SCTransform(split_seurat[[i]], vars.to.regress = c("mitoRatio"))
    }

We don't even really need the for loop, but we could keep it to show them that it would be useful for datasets with a larger number of samples.
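
If we ever want a loop-free version, one possible alternative is `lapply()`, which keeps the list names (`ctrl`, `stim`) and scales to any number of samples. This is only a sketch: `sct_each` is a hypothetical helper, and the `transform` argument is an assumption added so the function can be tried with a stand-in before running the real `SCTransform`.

```r
# Sketch only: `sct_each` is a hypothetical helper, not part of Seurat.
# `transform` defaults to Seurat::SCTransform; the default is only evaluated
# when no replacement function is supplied.
sct_each <- function(split_seurat,
                     vars_to_regress = c("mitoRatio"),
                     transform = Seurat::SCTransform) {
  # lapply() returns a list with the same names and order as its input
  lapply(split_seurat, function(obj) {
    transform(obj, vars.to.regress = vars_to_regress)
  })
}

# Intended use with the objects from this thread:
# split_seurat <- sct_each(split_seurat)
```

The for loop and this version do the same work per sample; the loop is arguably easier to read for teaching purposes.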

mistrm82 commented 3 years ago

Updated code

hbctraining / scRNA-seq_online

are we normalizing too many times? #25