theislab / scib-pipeline

Snakemake pipeline that works with the scIB package to benchmark data integration methods.
MIT License

Error in `[[<-`(`*tmp*`, assay, value = assay.data) : [[<- defined for objects of type "S4" only for subclasses of environment #48

Open AlinaKurjan opened 1 year ago

AlinaKurjan commented 1 year ago

Hello again!

Getting the following issue running R methods:

Error in `[[<-`(`*tmp*`, assay, value = assay.data) : 
  [[<- defined for objects of type "S4" only for subclasses of environment
Calls: runHarm -> ScaleData -> ScaleData.Seurat
Execution halted

Example errors:

Error in rule integration_run_r:
    jobid: 44
    output: [path...]/devcombined/integration/scaled/full_feature/R/harmony.RDS
    shell:
        conda run --no-capture-output -n scib-R4.0 Rscript scripts/integration/runMethods.R -i [path...]/devcombined/prepare/scaled/full_feature/adata_pre.RDS -o [path...]/devcombined/integration/scaled/full_feature/R/harmony.RDS -b libbatch --method harmony 

Error in rule integration_run_r:
    jobid: 50
    output: [path...]/devcombined/integration/unscaled/hvg/R/seurat.RDS
    shell:
        conda run --no-capture-output -n scib-R4.0 Rscript scripts/integration/runMethods.R -i [path...]/devcombined/prepare/unscaled$ method seurat -v "[path...]/devcombined/prepare/unscaled/hvg/adata_pre_hvg.RDS"

Any thoughts on what could be causing it?

AlinaKurjan commented 1 year ago

If it helps, the issue started occurring after I added a few R packages for QC, namely DropletUtils::emptyDrops and scDblFinder. Could it be an issue with the h5ad -> anndata -> R (sce) -> anndata (my workflow) -> R (your pipeline) conversions?

LuckyMD commented 1 year ago

Do you have the error from the slurm job log you were running? This looks like the snakemake log only.

AlinaKurjan commented 1 year ago

That's the only error explanation I get as far as I can see:

Error in `[[<-`(`*tmp*`, dimreduc, value = tryCatch(expr = subset.DimReduc(x = x[[dimreduc]],  : 
  [[<- defined for objects of type "S4" only for subclasses of environment
Calls: subset -> subset.Seurat
Execution halted
ERROR conda.cli.main_run:execute(49): `conda run Rscript scripts/integration/runMethods.R -i [path...]/devcombined/prepare/scaled/hvg/adata_pre.RDS -o [path...]/devcombined/integration/scaled/hvg/R/fastmnn.RDS -b libbatch --method fastmnn -v [path...]/devcombined/prepare/scaled/hvg/adata_pre_hvg.RDS` failed. (See above for error)
[Fri Oct 28 11:15:49 2022]
Error in rule integration_run_r:
    jobid: 32
    output: [path...]/devcombined/integration/scaled/hvg/R/fastmnn.RDS
    shell:
        conda run --no-capture-output -n scib-R4.0 Rscript scripts/integration/runMethods.R -i [path...]/devcombined/prepare/scaled/hvg/adata_pre.RDS -o [path...]/devcombined/integration/scaled/hvg/R/fastmnn.RDS -b libbatch             --method fastmnn -v "[path...]/devcombined/prepare/scaled/hvg/adata_pre_hvg.RDS"
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
LuckyMD commented 1 year ago

The error suggests that you have a non-S4 object for which assignment doesn't work. I haven't come across this before.

Based on the snippets you show, this comes up for harmony, fastmnn, and Seurat? Or is the error in your second post from fastmnn, while the one above is from a harmony run? They don't seem to match.

If this is consistent across R methods and assignment is indeed not working, that suggests it might be the anndata2ri conversion to the SingleCellExperiment object.

Could you load your data and then just try:

import anndata2ri
import rpy2.robjects as ro
import scanpy as sc

anndata2ri.activate()

adata = sc.read('my/data/object')
ro.globalenv['adata'] = adata
AlinaKurjan commented 1 year ago

It is consistent across all R methods. I think I started getting this error after the pipeline was updated with scvi-tools, but it could also be because of changes to my earlier workflow steps (see below). Running your code:

AttributeError: module 'scanpy' has no attribute 'load'

If I load adata with sc.read_h5ad instead, then it seems to work fine...

My package versions in the scib-pipeline-R4.0 environment:

anndata2ri  1.1
rpy2        3.5.1
anndata     0.8.0
scanpy      1.9.1

Any ideas?


Had a look at my conversions from anndata to sce earlier in the workflow. Here are a few points that could potentially affect things:

1) To get the anndata2ri conversion to work, I had to change my adata.X from the default numpy.uint32 (I use output h5ad from CellBender) to numpy.float64, like so: adata.X = adata.X.astype(np.float64).copy() (I don't know if that was a stupid thing to do, please let me know if it was)
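A minimal sketch of that cast (pure numpy/scipy, with a toy matrix standing in for the CellBender output; uint32 is the problematic dtype here, since R has no unsigned integer type):

```python
import numpy as np
import scipy.sparse as sp

# Toy matrix standing in for a CellBender h5ad's adata.X,
# which is stored as uint32 -- a dtype R cannot represent.
X = sp.csr_matrix(np.array([[0, 3], [1, 0]], dtype=np.uint32))

# Cast before handing the matrix to anndata2ri, as described above.
if X.dtype == np.uint32:
    X = X.astype(np.float64)

print(X.dtype)  # float64
```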

2) I've tried both a) anndata2ri.activate() followed by data_mat = adata.X.T and ro.globalenv['data_mat'] = data_mat, and b) anndata2ri.scipy.activate() together with data_mat = adata.X.T and anndata2ri.scipy2ri.py2rpy(data_mat). Both seemed to work fine; however, I kept running into an issue with the sce object. I had to keep recreating it with sce = SingleCellExperiment(list(counts=data_mat)) every time I wanted to do something with it (e.g. emptyDrops and scDblFinder), because when freshly created it had "counts" as the assay name, but if I saved this sce object to the environment and called it again in the next chunk (or used %%R -o sce and then %%R -i sce in the next chunk), the assay name always changed to "X"... that was the only noticeable issue.

3) Later on (for scran normalization), I used:

ro.pandas2ri.activate()
anndata2ri.activate()

data_mat = adata_pp.X.T
if scipy.sparse.issparse(data_mat):
    if data_mat.nnz > 2**31 - 1:
        data_mat = data_mat.tocoo()
    else:
        data_mat = data_mat.tocsc()
ro.globalenv["data_mat"] = data_mat
ro.globalenv["input_groups"] = adata_pp.obs["groups"]
%%R -o size_factors
size_factors = sizeFactors(
    computeSumFactors(
        SingleCellExperiment(list(counts=data_mat)),
        clusters = input_groups,
        min.mean = 0.1,
        BPPARAM = MulticoreParam()
    )
)

adata.obs["size_factors"] = size_factors
scran = adata.X / adata.obs["size_factors"].values[:, None]
adata.layers["scran_normalization"] = csr_matrix(sc.pp.log1p(scran))
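The last three lines above do the actual normalization: divide each cell (row) by its size factor, then log-transform. With toy numbers standing in for scran's output (and np.log1p in place of sc.pp.log1p, which computes the same thing on a dense array), the arithmetic looks like:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy counts and made-up size factors standing in for scran's output.
X = np.array([[2.0, 4.0],
              [1.0, 3.0]])
size_factors = np.array([2.0, 1.0])

# Divide each cell (row) by its size factor, then log1p -- the same
# operation as the adata.layers["scran_normalization"] line above.
scran = X / size_factors[:, None]
normalized = csr_matrix(np.log1p(scran))

print(normalized.toarray())
```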

I don't have nearly as much experience converting things and running R code this way so please let me know if you think there could be issues with any of my approaches here. That's all I can think of...

LuckyMD commented 1 year ago

Hah, yes... of course. I don't know where my head was... I meant sc.read() (edited above now).

LuckyMD commented 1 year ago

So this can't have been the result of the scvi-tools update... I reckon it probably has something to do with the conversion or saving of data objects in R. I'm not 100% sure about the explicit conversion of data matrix formats, specifically the csr_matrix(sc.pp.log1p(scran)) command at the end... but if you did this and it converted to SCE fine, then this shouldn't be the cause.

The uint conversion was important, as R doesn't have a uint data type. You probably could have also just used an integer type rather than a float to make it a bit more lightweight, but I'm not sure how assignment of a corrected/normalized/scaled matrix works then (i.e. whether it's tied to the initial data type or not).
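For what it's worth, the storage difference is easy to see on a sparse matrix's data buffer (a sketch with a random matrix standing in for a counts matrix; int32 uses half the bytes per stored value compared to float64):

```python
import numpy as np
import scipy.sparse as sp

# Random sparse matrix standing in for a counts matrix.
X = sp.random(1000, 500, density=0.05, format="csr", dtype=np.float64)

bytes_float64 = X.data.nbytes                  # 8 bytes per stored value
bytes_int32 = X.astype(np.int32).data.nbytes   # 4 bytes per stored value

print(bytes_int32 * 2 == bytes_float64)  # True
```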

Your conversions are equivalent, as anndata2ri's .activate() uses scipy.activate() in the backend along with the pandas conversion functions. I think you may have to run the individual steps in runMethods.R interactively to figure out where the issue is...

boyangzhang1993 commented 11 months ago

I found the same issue. Any update for this issue? Thanks.