theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

Error in py_ref_to_r(x) : negative length vectors are not allowed #99

Closed: ainefairbrother closed this issue 1 year ago

ainefairbrother commented 1 year ago

When converting an AnnData object to an SCE using zellkonverter::AnnData2SCE, I get the following error:

Error in py_ref_to_r(x) : negative length vectors are not allowed, followed by ✖ Converting AnnData to SingleCellExperiment ... failed.

My code was as follows:

library(SingleCellExperiment)
library(dreamlet)
library(zellkonverter)
library(reticulate)
library(anndata)
library(magrittr)
library(SummarizedExperiment)

adata <- anndata::read_h5ad("my_file.h5ad", backed='r')

sce <- zellkonverter::AnnData2SCE(adata=adata, X_name = "X", layers = TRUE, var = TRUE, obs = TRUE, uns = FALSE, varm = FALSE, obsm = FALSE, varp = FALSE, obsp = FALSE, raw = FALSE, skip_assays = FALSE,  hdf5_backed = FALSE, verbose = TRUE)

Session:

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] anndata_0.7.5.6             reticulate_1.30             zellkonverter_1.10.1        dreamlet_0.99.16            SingleCellExperiment_1.22.0 caret_6.0-94               
 [7] lattice_0.21-8              janitor_2.2.0               Hmisc_5.1-0                 factoextra_1.0.7            vroom_1.6.3                 magrittr_2.0.3             
[13] data.table_1.14.8           patchwork_1.1.2             variancePartition_1.31.9    BiocParallel_1.34.2         limma_3.56.2                ggsci_3.0.0                
[19] DESeq2_1.40.2               SummarizedExperiment_1.30.2 Biobase_2.60.0              MatrixGenerics_1.12.3       matrixStats_1.0.0           GenomicRanges_1.52.0       
[25] GenomeInfoDb_1.36.1         IRanges_2.34.1              S4Vectors_0.38.1            BiocGenerics_0.46.0         lubridate_1.9.2             forcats_1.0.0              
[31] stringr_1.5.0               dplyr_1.1.2                 purrr_1.0.1                 readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1               
[37] tidyverse_2.0.0             ggplot2_3.4.2               here_1.0.1                 

loaded via a namespace (and not attached):
  [1] splines_4.3.0             bitops_1.0-7              filelock_1.0.2            basilisk.utils_1.12.1     hardhat_1.3.0             graph_1.78.0              pROC_1.18.4              
  [8] XML_3.99-0.14             rpart_4.1.19              lifecycle_1.0.3           mixsqp_0.3-48             Rdpack_2.4                edgeR_3.42.4              rprojroot_2.0.3          
 [15] globals_0.16.2            MASS_7.3-60               backports_1.4.1           rmarkdown_2.23            yaml_2.3.7                DBI_1.1.3                 minqa_1.2.5              
 [22] abind_1.4-5               zlibbioc_1.46.0           EnvStats_2.8.0            msigdbr_7.5.1             rmeta_3.0                 RCurl_1.98-1.12           nnet_7.3-19              
 [29] ipred_0.9-14              lava_1.7.2.1              GenomeInfoDbData_1.2.10   ggrepel_0.9.3             pbkrtest_0.5.2            irlba_2.3.5.1             listenv_0.9.0            
 [36] annotate_1.78.0           parallelly_1.36.0         DelayedMatrixStats_1.22.1 codetools_0.2-19          DelayedArray_0.26.6       tidyselect_1.2.0          lme4_1.1-34              
 [43] base64enc_0.1-3           jsonlite_1.8.7            Formula_1.2-5             survival_3.5-5            iterators_1.0.14          foreach_1.5.2             progress_1.2.2           
 [50] tools_4.3.0               zenith_1.2.0              Rcpp_1.0.11               glue_1.6.2                prodlim_2023.03.31        gridExtra_2.3             xfun_0.39                
 [57] withr_2.5.0               numDeriv_2016.8-1.1       fastmap_1.1.1             basilisk_1.12.1           boot_1.3-28.1             fansi_1.0.4               truncnorm_1.0-9          
 [64] caTools_1.18.2            digest_0.6.32             timechange_0.2.0          R6_2.5.1                  colorspace_2.1-0          scattermore_1.2           gtools_3.9.4             
 [71] RSQLite_2.3.1             RhpcBLASctl_0.23-42       utf8_1.2.3                generics_0.1.3            recipes_1.0.6             corpcor_1.6.10            class_7.3-22             
 [78] prettyunits_1.1.1         httr_1.4.6                htmlwidgets_1.6.2         S4Arrays_1.0.5            ModelMetrics_1.2.2.2      pkgconfig_2.0.3           gtable_0.3.3             
 [85] timeDate_4022.108         blob_1.2.4                XVector_0.40.0            remaCor_0.0.16            htmltools_0.5.5           GSEABase_1.62.0           scales_1.2.1             
 [92] png_0.1-8                 gower_1.0.1               snakecase_0.11.0          ashr_2.2-54               knitr_1.43                rstudioapi_0.15.0         tzdb_0.4.0               
 [99] reshape2_1.4.4            checkmate_2.2.0           nlme_3.1-162              nloptr_2.0.3              cachem_1.0.8              KernSmooth_2.23-22        RcppZiggurat_0.1.6       
[106] AnnotationDbi_1.62.2      foreign_0.8-84            pillar_1.9.0              grid_4.3.0                vctrs_0.6.3               gplots_3.1.3              mashr_0.2.69             
[113] xtable_1.8-4              cluster_2.1.4             htmlTable_2.4.1           Rgraphviz_2.44.0          KEGGgraph_1.60.0          evaluate_0.21             invgamma_1.1             
[120] mvtnorm_1.2-2             cli_3.6.1                 locfit_1.5-9.8            compiler_4.3.0            rlang_1.1.1               crayon_1.5.2              SQUAREM_2021.1           
[127] future.apply_1.11.0       plyr_1.8.8                stringi_1.7.12            babelgene_22.9            assertthat_0.2.1          Biostrings_2.68.1         lmerTest_3.1-3           
[134] munsell_0.5.0             aod_1.3.2                 Matrix_1.6-0              dir.expiry_1.8.0          hms_1.1.3                 sparseMatrixStats_1.12.2  bit64_4.0.5              
[141] future_1.33.0             KEGGREST_1.40.0           rbibutils_2.2.13          Rfast_2.0.8               memoise_2.0.1             broom_1.0.5               bit_4.0.5                
[148] EnrichmentBrowser_2.30.2 

Would appreciate your advice on this one. Thanks.

lazappi commented 1 year ago

Hi @ainefairbrother

I don't think I have seen this come up before. I think it might have something to do with what is in the object. Are you able to share the file (a small subset that reproduces the error would be even better)?

It would also be helpful if you can post the full output with verbose = TRUE. That would help work out which part of the object is causing the issue.

ainefairbrother commented 1 year ago

Hi @lazappi, thanks for getting back to me. The full error message using verbose=TRUE is as follows:

ℹ Using the Python reader
ℹ Using anndata version 0.8.0
sh: 4: /home/MRAineFairbrotherBrowne/miniconda3/envs/R/etc/conda/deactivate.d/deactivate-r-base.sh: [[: not found
✔ Read ./.../.../adata_annot_dedup.h5ad [4m 38.7s]
! The passed object is a 'AnnDataR6' object, conversion is likely to be less reliable
ℹ uns is empty and was skipped
✔ X matrix converted to assay [24m 28.5s]
Error in py_ref_to_r(x) : negative length vectors are not allowed
In addition: Warning message:
The passed object is a 'AnnDataR6' object, conversion is likely to be less reliable
✖ Converting AnnData to SingleCellExperiment ... failed

ainefairbrother commented 1 year ago

I can't share any of the original file, but as an update: I believe the error originated from the AnnData.layers slot. It was somehow either not being written out from Python correctly (write_h5ad) or not being read into R correctly (anndata::read_h5ad). Removing it solved the problem, allowing the conversion to succeed.
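For anyone hitting the same error, the workaround above can be sketched roughly as follows. This is a minimal sketch rather than a confirmed fix: it reuses the AnnData2SCE call from the original post with layers switched off, and the helper name `convert_without_layers` is hypothetical.

```r
# Hypothetical helper: convert an AnnData object while skipping the layers slot.
# Assumes zellkonverter is installed; it is only needed when the function is called.
convert_without_layers <- function(adata, verbose = TRUE) {
  zellkonverter::AnnData2SCE(
    adata   = adata,
    layers  = FALSE,   # skip AnnData.layers, the slot suspected above
    verbose = verbose  # print progress for each slot as it is converted
  )
}

# Usage, with `adata` read as in the original post:
# sce <- convert_without_layers(adata)
```

If this succeeds where `layers = TRUE` fails, that points to the layers slot as the source of the error, although it does not explain why the slot is incompatible.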

lazappi commented 1 year ago

Ok, great! If you weren't able to write it in Python either then I guess you somehow managed to get an incompatible object in the layers slot (I'm not sure how that could have happened though).

Marwansha commented 9 months ago

I got the same error while reading a concatenated h5ad object.
Reading one of the original h5ad objects worked fine, so I guess it's something to do with the way sc.concat works.

ladata <- readH5AD("hvg_total_merged.h5ad")
/pasteur/appa/homes/masharaw/.cache/R/basilisk/1.4.0/zellkonverter/1.2.1/zellkonverterAnnDataEnv/lib/python3.7/site-packages/anndata/_core/anndata.py:1828: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")
Error in py_ref_to_r(x) : negative length vectors are not allowed
In addition: Warning message:
In .extract_or_skip_assay(skip_assays = skip_assays, hdf5_backed = hdf5_backed,  :
  'X' matrix does not support transposition and has been skipped

Reading one of the original files, rather than the merged one, works:

ladata <- readH5AD("adata1.h5ad")
Note: Using stored X_name value 'X'
> ladata
class: SingleCellExperiment 
dim: 36620 8306 
metadata(1): log1p
assays(6): X GF-G_normalized ... raw scran
rownames(36620): MIR1302-2HG FAM138A ... SARS_N SARS_ORF10
rowData names(10): gene_ids feature_types ... RP binomial_deviance
colnames(8306): AACCACATCCACCTGT CATACCCGTACCAATC ... TCCTAATAGTGTCATC
  TAGACTGAGGTCGTCC
colData names(17): sample_id Library ... size_factors condition
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

lazappi commented 9 months ago

@Marwansha It looks like you are using {zellkonverter} v1.2.1 but the latest release is v1.12.0. Can you please try the latest version? I think/hope this should work then.
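As a rough illustration of the version check suggested here, the sketch below compares the two version strings mentioned above. The helper `is_outdated` is hypothetical; `utils::compareVersion` is base R and compares version components numerically, so 1.2.1 counts as older than 1.12.0 even though the strings look similar.

```r
# Hypothetical helper: TRUE when `installed` is older than `latest`.
# utils::compareVersion() returns -1, 0 or 1 after comparing each component numerically.
is_outdated <- function(installed, latest) {
  utils::compareVersion(as.character(installed), as.character(latest)) < 0
}

# The two versions mentioned in this comment
is_outdated("1.2.1", "1.12.0")

# With the package installed, the real version could be checked with:
# is_outdated(packageVersion("zellkonverter"), "1.12.0")
# and an update attempted with:
# BiocManager::install("zellkonverter")
```

Bioconductor releases are tied to R versions, so getting a newer {zellkonverter} may also require a newer R installation.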

GabrielHoffman commented 5 months ago

I'm still having this issue after multiple attempts at reinstalling R (from source and via conda) on my CentOS 7.9.2009 cluster. It worked fine for months, but I must have updated something by mistake, and now I can't get it to work again.

> sce = readH5AD(file, use_hdf5=TRUE)

Warning message:
'X' matrix does not support transposition and has been skipped

This means I can't read the single cell counts into memory. This is a major step in my dreamlet package workflow.

> BiocManager::valid()
[1] TRUE

> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /sc/arion/work/hoffmg01/condaEnv/R43/lib/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
 [3] Biobase_2.62.0              GenomicRanges_1.54.1       
 [5] GenomeInfoDb_1.38.8         IRanges_2.36.0             
 [7] S4Vectors_0.40.2            BiocGenerics_0.48.1        
 [9] MatrixGenerics_1.14.0       matrixStats_1.2.0          
[11] zellkonverter_1.12.1       

loaded via a namespace (and not attached):
 [1] Matrix_1.6-5            jsonlite_1.8.8          BiocManager_1.30.22    
 [4] compiler_4.3.3          crayon_1.5.2            filelock_1.0.3         
 [7] Rcpp_1.0.12             rhdf5filters_1.14.1     bitops_1.0-7           
[10] parallel_4.3.3          png_0.1-8               reticulate_1.35.0      
[13] lattice_0.22-5          XVector_0.42.0          S4Arrays_1.2.1         
[16] DelayedArray_0.28.0     GenomeInfoDbData_1.2.11 rlang_1.1.3            
[19] HDF5Array_1.30.1        dir.expiry_1.10.0       SparseArray_1.2.4      
[22] cli_3.6.2               withr_3.0.0             Rhdf5lib_1.24.2        
[25] zlibbioc_1.48.2         grid_4.3.3              basilisk_1.14.3        
[28] rhdf5_2.46.1            abind_1.4-5             RCurl_1.98-1.14        
[31] basilisk.utils_1.14.1   tools_4.3.3     

Suggestions or fixes?

Best, Gabriel

lazappi commented 5 months ago

@GabrielHoffman Can you please open a new issue for this? This message can be caused by various things, and I thought we had caught most of them, but obviously not whatever the problem is here.