Closed ilibarra closed 2 years ago
The theoretical limit for the number of elements in a normal R matrix is 2147483647 so I'm guessing that might be the issue. There are probably ways around it with sparse matrices or disk-backed stuff but your HVG idea is probably the quickest/easiest.
Thanks for the input! I am now keeping only HVGs by subseting them before calling the saveSeurat
function.
Now, while trying to save I am getting the following error.
invalid class “dgCMatrix” object: row indices are not sorted within columns
I think this time it is related to sorting in the indices of the anndata matrices object. I have verified that before conversion following the tip in https://github.com/theislab/anndata2ri/pull/52. Currently the conversion error is the same.
adata = sc.read(inPath)
adata.X.has_sorted_indices
adata.X.sort_indices()
adata.X.has_sorted_indices
it prints
0 # sorting not done
True # sorting done.
Before calling the saveSeurat
function.
adata_hvg = adata[:,adata.var.index.isin(hvgs)]
print('has sorted indexes?', adata_hvg.X.has_sorted_indices)
scIB.preprocessing.saveSeurat(adata_hvg, outPath, batch, hvgs)
it prints
1
Type of adata.X is <class 'anndata._core.views.SparseCSRView'>
. Mentioning this in case this could be related to the conversion problem.
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-111-605d72dcdc60> in <module>
26 print('has raw', adata_hvg.raw)
27
---> 28 scIB.preprocessing.saveSeurat(adata_hvg, outPath, batch, hvgs)
29 # scIB.preprocessing.saveSeurat(adata, outPath, batch, hvgs)
30
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/scIB/preprocessing.py in saveSeurat(adata, path, batch, hvgs)
486 adata.layers[key].sort_indices()
487
--> 488 ro.globalenv['adata'] = adata
489
490 ro.r('sobj = as.Seurat(adata, counts="counts", data = "X")')
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/rpy2/robjects/environments.py in __setitem__(self, item, value)
30
31 def __setitem__(self, item, value) -> None:
---> 32 robj = conversion.converter.py2rpy(value)
33 super(Environment, self).__setitem__(item, robj)
34
~/miniconda3/envs/scIB-python/lib/python3.7/functools.py in wrapper(*args, **kw)
838 '1 positional argument')
839
--> 840 return dispatch(args[0].__class__)(*args, **kw)
841
842 funcname = getattr(func, '__name__', 'singledispatch function')
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/anndata2ri/py2r.py in py2rpy_anndata(obj)
54 # TODO: sparse
55 x = {} if obj.X is None else dict(X=mat_converter.py2rpy(obj.X.T))
---> 56 layers = {k: mat_converter.py2rpy(v.T) for k, v in obj.layers.items()}
57 assays = ListVector({**x, **layers})
58
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/anndata2ri/py2r.py in <dictcomp>(.0)
54 # TODO: sparse
55 x = {} if obj.X is None else dict(X=mat_converter.py2rpy(obj.X.T))
---> 56 layers = {k: mat_converter.py2rpy(v.T) for k, v in obj.layers.items()}
57 assays = ListVector({**x, **layers})
58
~/miniconda3/envs/scIB-python/lib/python3.7/functools.py in wrapper(*args, **kw)
838 '1 positional argument')
839
--> 840 return dispatch(args[0].__class__)(*args, **kw)
841
842 funcname = getattr(func, '__name__', 'singledispatch function')
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/anndata2ri/scipy2ri/py2r.py in wrapper(obj)
41
42 with localconverter(default_converter + numpy2ri.converter):
---> 43 return f(obj)
44
45 return wrapper
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/anndata2ri/scipy2ri/py2r.py in csc_to_rmat(csc)
55 p=as_integer(csc.indptr),
56 x=conv_data(csc.data),
---> 57 Dim=as_integer(list(csc.shape)),
58 )
59
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
190 kwargs[r_k] = v
191 return (super(SignatureTranslatedFunction, self)
--> 192 .__call__(*args, **kwargs))
193
194
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
119 else:
120 new_kwargs[k] = conversion.py2rpy(v)
--> 121 res = super(Function, self).__call__(*new_args, **new_kwargs)
122 res = conversion.rpy2py(res)
123 return res
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py in _(*args, **kwargs)
26 def _cdata_res_to_rinterface(function):
27 def _(*args, **kwargs):
---> 28 cdata = function(*args, **kwargs)
29 # TODO: test cdata is of the expected CType
30 return _cdata_to_rinterface(cdata)
~/miniconda3/envs/scIB-python/lib/python3.7/site-packages/rpy2/rinterface.py in __call__(self, *args, **kwargs)
783 error_occured))
784 if error_occured[0]:
--> 785 raise embedded.RRuntimeError(_rinterface._geterrmessage())
786 return res
787
RRuntimeError: Error in validObject(.Object) :
invalid class “dgCMatrix” object: row indices are not sorted within columns
Calls: <Anonymous> ... initialize -> callNextMethod -> .nextMethod -> validObject
Happy to clarify more if required. I hope description useful.
Hey!
Sorting indices is already part of the saveSeurat
function: https://github.com/theislab/scib/blob/985d8155391fdfbddec024de428308b5a57ee280/scib/preprocessing.py#L485-L492
Maybe the issue is that your adata
is a view? Could you run an adata = adata.copy()
before converting?
Based on last comment, .copy()
seems to be the solution, and not sorting indices.
The following snippet makes the saving of R with an fixed number of hvgs possible
adata_hvg = adata[:,adata.var.index.isin(hvgs)] # hvgs are defined in the early steps of the runPP.py script.
adata_hvg = adata_hvg.copy() # update `adata` in case a view.
scIB.preprocessing.saveSeurat(adata_hvg, outPath, batch, hvgs)
Thanks!
Great! Is this also an issue in the scib-pipeline
scripts, or is this your own code? If it's in the pipeline scripts, do you want to make a PR?
Running R methods through scIB with a large h5ad file fails at the earliest steps (~2 million cells). The error I get seems to be at the
integration_prepare
step, scriptrunPP.py
due to a parsing of the anndata object into the R environment.From OverflowError: integer 3068178618 does not fit 'int' I suspect this could be an overflow R error that stops the conversion.
The current AnnData object is
n_obs × n_vars = 1907304 × 14260
.adata.X
is a sparse matrix of float32. The number 3068178618 is the shape of the non-zero entries in the sparseadata.X
. I removed theadata.layers['counts']
to attempt saving space but still no success. I think this is just a borderline R session memory error.Any hints on whether this can be solved by increasing memory requirements for R, or bypassed by running externally? I am using the version from Apr 14, so it was maybe handled in recent versions. As this could be a borderline case, maybe wondering if one can solve this by only saving HVGs and only scIB pipeline variants with HVGs.
Thanks for any input on this topic.