satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Error when trying to deconvolute bulk-RNA-seq data (Unsure if Seurat error) #8075

Closed kvn95ss closed 11 months ago

kvn95ss commented 11 months ago

We have bulk RNA-seq data and would like to identify the cell types present in the samples. We found and followed this protocol: https://star-protocols.cell.com/protocols/1386 and the R code can be found here: https://data.mendeley.com/datasets/nkrfxtbrmc/2

The protocol worked when we first tried it, but when we ran it with our own data set, we hit an error at this step:

correlations_DEGs_log <- cor(method = "pearson", log2(t(as.matrix(sc_data@assays[["RNA"]]@data[topDEGs_list,]))+1))

where we got the error

Error in as.matrix(sc_data@assays[["RNA"]]@data[topDEGs_list, ]) : 
  no slot of name "data" for this object of class "Assay5"

We are not too sure if the error is due to any other step in the protocol, or due to the Seurat package itself! Any help is greatly appreciated.

shahrozeabbas commented 11 months ago

It seems you are using Seurat v5. You could try changing sc_data@assays[["RNA"]]@data to LayerData(object=sc_data, assay='RNA', layer='data') and it should work fine.

The idea is that you need to extract the normalized counts from the Seurat object. The line of code you have provided is written to obtain these counts from Seurat version 4 or earlier. As the error message shows, the assay is now an object of class "Assay5", which has no "data" slot, so the old way of obtaining the counts no longer works in Seurat v5.
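To illustrate the difference between the two access patterns, here is a minimal sketch of a version-aware accessor. The helper name get_normalized_data is hypothetical and not part of Seurat; it assumes LayerData is available from SeuratObject 5.0.0 onward and falls back to GetAssayData with slot = "data" on older installations.

```r
# Hypothetical helper (not part of Seurat): return the normalized expression
# matrix regardless of which SeuratObject version is installed.
get_normalized_data <- function(obj, assay = "RNA") {
  if (utils::packageVersion("SeuratObject") >= "5.0.0") {
    # v5: assays keep their matrices in named layers
    SeuratObject::LayerData(object = obj, assay = assay, layer = "data")
  } else {
    # v4 and earlier: the same matrix is reached through the "data" slot
    SeuratObject::GetAssayData(object = obj, assay = assay, slot = "data")
  }
}

# Usage (assumes sc_data is an existing, normalized Seurat object):
# norm <- get_normalized_data(sc_data, assay = "RNA")
```

Going through an accessor function rather than the @ slots directly keeps downstream code independent of the internal object layout.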

kvn95ss commented 11 months ago

Hello, thanks for the input!

I am still not able to get it working. The list of genes is present in topDEGs_list. I modified the call as follows: LayerData(object = sc_data, assay='RNA', layer='data[topDEGs_list,]')

But I got this warning:

Warning message:
Layer ‘data[topDEGs_list,]’ is empty

topDEGs_list is just a list of gene names.
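The warning above is consistent with the layer argument being treated as a plain layer name, so an index written inside the string is never evaluated. The following base-R sketch mimics this with a named list standing in for an assay's layers; the list, matrix, and gene names are all made up for illustration and do not use Seurat itself.

```r
# Hypothetical stand-in: a named list playing the role of an assay's layers
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))
layers <- list(counts = m, data = log1p(m))
topDEGs_list <- c("g1", "g3")

# Putting the index inside the name asks for a layer that does not exist
missing_layer <- layers[["data[topDEGs_list,]"]]

# Fetch the layer by name first, then subset the rows of the result
deg_data <- layers[["data"]][topDEGs_list, ]
```

With a Seurat v5 object, the analogous form would be LayerData(object = sc_data, assay = 'RNA', layer = 'data')[topDEGs_list, ], with the row subset applied to the returned matrix.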

kvn95ss commented 11 months ago

Never mind, I changed it to LayerData(object = sc_data, assay='RNA', layer='data'). However, it has been running for quite some time. Is this normal?

kvn95ss commented 11 months ago

I let it run for some hours and got the following warning -

Warning message:
In cor(method = "pearson", log2(t(as.matrix(LayerData(object = sc_data,  :
  the standard deviation is zero

I'm trying to redo the entire analysis (I was working with saved files) to see if that changes the output. Will the warning cause issues?

shahrozeabbas commented 11 months ago

The normalized data in the data layer is already log-transformed as long as you called NormalizeData (with its default LogNormalize method), so I suggest removing the log2 call. You should be okay calculating the Pearson correlation on the normalized data as is. The warning is most likely coming from genes with zero variance, i.e. genes whose value is the same in every cell.
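To show what the warning means in practice, here is a small base-R sketch on a made-up cells-by-genes matrix (the gene names and values are illustrative only). A column that is constant across cells has zero standard deviation, so cor() warns and returns NA for every correlation involving that column; dropping such columns beforehand is one possible way to avoid this.

```r
set.seed(1)
# Hypothetical toy matrix: rows are cells, columns are genes
expr <- cbind(
  geneA = rnorm(20),
  geneB = rnorm(20),
  geneC = rep(0, 20)  # constant across cells: zero standard deviation
)

# cor() warns "the standard deviation is zero" and yields NA for geneC
cors_all <- suppressWarnings(cor(expr, method = "pearson"))

# One option: drop constant columns before correlating
keep <- apply(expr, 2, sd) > 0
cors_filtered <- cor(expr[, keep, drop = FALSE], method = "pearson")
```

Whether filtering is appropriate depends on what the downstream steps of the protocol expect, so treat it as a diagnostic rather than a required step.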

As for runtime, I'm not sure about the package you're using and can't say how long it will take. Most likely it will depend on the size of your dataset. I see now that the original command subsets to only the DEGs, in which case your command would be as follows:

sc_data <- Seurat::NormalizeData(sc_data)
norm_counts <- t(LayerData(object=sc_data, assay='RNA', layer='data')[topDEGs_list, ])
correlations <- cor(method='pearson', x=norm_counts)

kvn95ss commented 11 months ago

I included as.matrix within t() in norm_counts, and it now works. Will test it out with the example data set and try to get it working with my data.

Thanks a lot for your help, much appreciated!
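For reference, the resolution described above (wrapping the subset in as.matrix before transposing) can be sketched without Seurat as follows. The sparse matrix here is a made-up stand-in for what LayerData(object = sc_data, assay = 'RNA', layer = 'data') would return, assuming that result is a sparse genes-by-cells matrix; the gene and cell names are illustrative only.

```r
library(Matrix)  # recommended package shipped with standard R installations

set.seed(42)
# Hypothetical stand-in for the normalized, sparse genes-x-cells matrix
dense <- matrix(rpois(5 * 30, lambda = 2), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("cell", 1:30)))
norm_sparse <- Matrix(log1p(dense), sparse = TRUE)
topDEGs_list <- c("gene1", "gene3", "gene4")  # illustrative gene names

# cor() expects a base numeric matrix, so densify the DEG subset first,
# then transpose so that genes become columns
norm_counts <- t(as.matrix(norm_sparse[topDEGs_list, ]))
correlations <- cor(norm_counts, method = "pearson")
```

Subsetting to the DEGs before calling as.matrix keeps the dense copy small, which matters when the full matrix has many genes and cells.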