hbctraining / scRNA-seq_online

https://hbctraining.github.io/scRNA-seq_online/.

Issues with the scRNAseq workshop material and my own data - integration #91

Closed Hannah-Doerpholz closed 1 year ago

Hannah-Doerpholz commented 1 year ago

I tried following your workshop with my own scRNAseq data. So far, everything worked but I received an error message when I got to the integration steps. After splitting my data into both samples and performing the SCTransform (without regressing anything out) I tried to use the command SelectIntegrationFeatures. While it worked when I followed the tutorial with your data, it didn't work with mine, as I got the following error:

error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic

I can however use the command before doing the SCTransform. Does this pose a problem for the downstream analysis?

But even when I use both commands in this order, the next command PrepSCTIntegration also doesn't work for me:

Error in scale.data[anchor.features, ] : index out of bounds
Additional: warning: The following requested features are not present in any models: \ [... cut off]

I do not understand how this error came to pass since it didn't happen when I used the data from your tutorial. Does someone have an idea what might be wrong?
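One way to narrow down a warning like "requested features are not present" is to check which requested features each sample actually contains. The following is only a minimal base-R sketch: the character vectors are made-up stand-ins, and the idea that the real vectors would come from `rownames()` of each sample's SCT assay is an assumption, not something confirmed in this thread.

```r
# Hypothetical stand-ins for the integration features and the per-sample
# feature names (illustrative values only, not taken from the real dataset).
integ_features <- c("Gapdh", "Actb", "UnknownFeature")
sample_features <- list(
  Rep1 = c("Gapdh", "Actb", "Cd3e"),
  Rep2 = c("Gapdh", "Actb", "Ms4a1")
)

# For each sample, collect the requested features that it does not contain.
missing_per_sample <- lapply(
  sample_features,
  function(f) setdiff(integ_features, f)
)
print(missing_per_sample)
```

A feature that is missing from every sample is a hint that something went wrong with the names before the integration step.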

Additional information:
OS: Ubuntu 20.04
R: 4.2.0
Rstudio: 1.1.456 (I know it doesn't work well for showing images, but since I am using a conda env this is my only choice)
all packages were installed as instructed
my dataset as split seurat object: 
2 samples: Rep1: RNA assay: 47438 features x 4409 cells
                 SCT assay: 23210 features x 4409 cells
           Rep2: RNA assay: 47438 features x 6225 cells
                 SCT assay: 24228 features x 6225 cells

All steps from the previous scRNAseq workshop tutorials have been performed, except those involving mitochondrial counts (I have no information on these for my dataset) and cell cycle (also no information available).

eberdan commented 1 year ago

Hi Hannah,

It looks like what you are giving R is a list; see https://www.statology.org/how-to-fix-in-r-error-in-sort-intx-na-last-decreasing-x-must-be-atomic/
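To illustrate the general behaviour in plain R (this is a sketch with made-up values, and it is not confirmed to be what happens inside Seurat): `sort()` works on an atomic vector but stops when it is handed a list, and `sapply()` can quietly return a list when its results differ in length.

```r
# A named numeric vector is atomic, so sort() works as expected.
ranks_vec <- c(b = 2, a = 1)
sorted_vec <- sort(ranks_vec)

# A list is not atomic; sort() stops with an error, which is captured here.
ranks_list <- list(b = 2, a = 1)
msg <- tryCatch(
  {
    sort(ranks_list)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# sapply() falls back to returning a list when the per-element results
# do not all have the same length (here: one match vs. no match).
hits <- sapply(list(a = 1:3, b = integer(0)), function(v) which(v == 2))
print(is.list(hits))

# Flattening the list with unlist() gives an atomic vector that can be sorted.
sorted_from_list <- sort(unlist(ranks_list))
print(sorted_from_list)
```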

What was your command when you got this error?

HBC Training Team

Hannah-Doerpholz commented 1 year ago

Good evening, thank you for your answer. The command I used was the following:

integ_features <- SelectIntegrationFeatures(object.list = split_seurat, nfeatures = 3000)

I checked "split_seurat" with the typeof() function, which of course yielded "list" as an answer. This list contains two Seurat objects, one for each replicate from my dataset, with two assays each: the "RNA" assay and the "SCT" assay. I find this particularly strange since the code works for your example dataset, but when I load my own data and run the same commands this function throws the error.

eberdan commented 1 year ago

Hi Hannah,

Can you give the output of "traceback" run directly after you get your error?

HBC Training Team

Hannah-Doerpholz commented 1 year ago

Sorry for the late answer. The traceback after running the command and getting the error is the following:

> integ_features <- SelectIntegrationFeatures(object.list = split_seurat, nfeatures = 3000)
> traceback()
6: stop("'x' must be atomic")
5: sort.int(x, na.last = na.last, decreasing = decreasing, ...)
4: sort.default(x = tie.ranks)
3: sort(x = tie.ranks)
2: head(x = sort(x = tie.ranks), nfeatures - length(x = features))
1: SelectIntegrationFeatures(object.list = split_seurat, nfeatures = 3000)

I also noticed while going through the script again that something strange seems to be happening during RunPCA(seurat_phase). While most of the genes selected for all 5 PCs have the correct label, there are several unknown names (e.g. "X34.58") that do not originate from my matrix file. I don't know where these supposed genes come from, and I'm not sure if this will also pose a problem later on.
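A quick way to list such names is to search for an "X" followed directly by a digit. This is a rough base-R sketch on a made-up character vector; with a real object the names would presumably come from `rownames()`, which is an assumption here, and the pattern is only a heuristic.

```r
# Hypothetical feature names (illustrative only). "Xist" is a real gene
# symbol that starts with X and should not be flagged.
feature_names <- c("Gapdh", "X34.58", "Actb", "X0.12", "Xist")

# Flag names consisting of an "X" followed immediately by a digit.
suspicious <- grep("^X[0-9]", feature_names, value = TRUE)
print(suspicious)
```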

eberdan commented 1 year ago

Hi Hannah,

First you should check that the assay is the same for both of your objects in the split seurat object ("SCT"). You can check this with the following commands:

DefaultAssay(split_seurat$Rep1)

DefaultAssay(split_seurat$Rep2)

You can reset the assays for these using the same command:

DefaultAssay(split_seurat$Rep1) <- "SCT"

If this does not fix the problem, take a look at your feature names and make sure there are no dashes or underscores. Your second concern suggests that the feature names are likely the problem here. The X at the beginning of those names appears because R renames columns whose names are not syntactically valid. In this case they appear to start with a number, which is a big no for R. I would go back and check what your feature names look like right out of Cell Ranger and how they are being named.
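The renaming can be reproduced in base R with `make.names()`, and with `read.csv()`, whose `check.names` argument is `TRUE` by default. The values below are made up for illustration; how the matrix in this issue was actually imported is not known.

```r
# Names that start with a digit are not syntactically valid in R,
# so make.names() prepends an "X".
raw_names <- c("34.58", "Gapdh", "1700001C02Rik")
fixed_names <- make.names(raw_names)
print(fixed_names)

# read.csv() applies the same fix to column headers unless told otherwise.
csv_text <- "34.58,Gapdh\n1,2\n"
checked <- read.csv(text = csv_text)                        # check.names = TRUE
unchecked <- read.csv(text = csv_text, check.names = FALSE) # headers kept as-is
print(names(checked))
print(names(unchecked))
```

If a numeric value such as 34.58 shows up as a column name at all, it may be worth checking whether a data row was read as the header.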

HBC Training Team

Hannah-Doerpholz commented 1 year ago

The DefaultAssay for both replicates was set to SCT. However, your second suggestion regarding the feature names helped me. There was indeed a problem with my matrix files and the way I imported them. After fixing the issue, both "SelectIntegrationFeatures" and the following "PrepSCTIntegration" ran without an error. Hopefully I will now be able to run the rest of your tutorial with my data without any more problems. Thank you so much for helping me find the problem!