theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

udunits2-deactivate.sh: [[: not found #84

Closed royfrancis closed 1 year ago

royfrancis commented 1 year ago

As an example, take this small dataset (1496 cells, 43MB) from cellxgene.

The Python reader just hangs.

library(zellkonverter)
g  <- readH5AD("local.h5ad", verbose = TRUE)
ℹ Using the Python reader
ℹ Using anndata version 0.8.0
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/udunits2-deactivate.sh: [[: not found
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/geotiff-deactivate.sh: [[: not found
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/gdal-deactivate.sh: [[: not found
sh: 11: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/gdal-deactivate.sh: [[: not found
sh: 4: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/deactivate-r-base.sh: [[: not found
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/deactivate-gxx_linux-64.sh: Syntax error: "(" unexpected

Warning message:
In system(paste(act.cmd, collapse = " "), intern = TRUE) :
  running command '. '/home/roy/.cache/R/basilisk/1.4.0/0/etc/profile.d/conda.sh' && conda activate && /home/roy/miniconda3/envs/r-4.1/lib/R/bin/Rscript --no-save --no-restore --no-site-file --no-init-file --default-packages=NULL -e "con <- socketConnection(port=11022, open='wb', blocking=TRUE);serialize(Sys.getenv(), con);close(con)"' had status 2

I do have an existing conda env and in fact R/RStudio is run inside a conda env (r-4.1). Running on Ubuntu 22.04, conda 4.14.0, R 4.1.0.

lazappi commented 1 year ago

I'm not entirely sure what's happening but I think it might have to do with trying to create and activate a conda environment from inside another conda environment. I haven't tested doing that with {zellkonverter} and I can see how it might lead to issues.

If you are comfortable using conda I would suggest creating an environment that has R and {zellkonverter} as well as Python anndata. Then you could use the AnnData2SCE() function directly in your main environment rather than the special environment {zellkonverter} creates for you. Something like:

anndata <- reticulate::import("anndata")
adata <- anndata$read_h5ad("path/to/file.h5ad")
sce <- zellkonverter::AnnData2SCE(adata)

LTLA commented 1 year ago

FWIW:

Those wacky [[: not found errors are occurring because basilisk attempts to activate zellkonverter's internal conda environment so that it runs properly. In doing so, conda will automatically deactivate the current environment, which is why you can see the warnings referencing a path to a non-basilisk environment (presumably the user-defined "enclosing" environment in which R is being run). The deactivation scripts for that enclosing environment seem to assume bash, but are being executed via sh, and I don't know why. Might be worth seeing what happens if you try to activate the zellkonverter environment on the command line whilst inside the enclosing environment.

It probably doesn't help that the enclosing environment was created with a conda installation that is not the same as the conda installation that basilisk is using. If this was the issue, you could force basilisk to use the same conda installation with the BASILISK_EXTERNAL_CONDA variable.

On hotel wifi right now, but if someone can diagnose the issue, maybe there's something that basilisk can do, e.g., auto-detect if it is already inside a conda environment and use the existing installation.
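As a rough illustration of that auto-detection idea (this is not something basilisk is known to do, only a sketch), a process can tell that it is inside an activated conda environment from the variables conda exports on activation: `CONDA_PREFIX`, `CONDA_DEFAULT_ENV` and `CONDA_EXE`. The function name `detect_enclosing_conda` is hypothetical, and deriving the installation root from `CONDA_EXE` assumes the usual `<root>/bin/conda` layout on Linux and macOS.

```python
import os


def detect_enclosing_conda(env=None):
    """Return details of the active conda environment, or None if there is none.

    Hypothetical helper: it only inspects variables that conda sets on activation.
    """
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return None  # no conda environment is active

    info = {
        "prefix": prefix,                      # path of the active environment
        "name": env.get("CONDA_DEFAULT_ENV"),  # its name, e.g. "r-4.1"
        "root": None,                          # conda installation root, if known
    }
    conda_exe = env.get("CONDA_EXE")
    if conda_exe:
        # Assumption: the executable lives at <root>/bin/conda
        info["root"] = os.path.dirname(os.path.dirname(conda_exe))
    return info


if __name__ == "__main__":
    print(detect_enclosing_conda())
```

If a root is found, it is a candidate value for BASILISK_EXTERNAL_CONDA, though whether that resolves the errors above is untested.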

royfrancis commented 1 year ago

@lazappi Created a new conda env with R and zellkonverter, launched R and I get this error.

library(reticulate)
anndata <- reticulate::import("anndata")
adata <- anndata$read_h5ad("local.h5ad")

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  OSError: Unable to open file (file signature not found)

The file exists:

list.files()
[1] "local.h5ad" 

os <- reticulate::import("os")
os$listdir()
[1] "local.h5ad" 

royfrancis commented 1 year ago

Downloaded the h5ad file again as suggested here and this time it worked. I wonder if the h5ad file can be permanently damaged if not closed properly. And perhaps other issues are related to this. Is it necessary to close the file after anndata$read_h5ad("local.h5ad")?

lazappi commented 1 year ago

Usually, that's not an issue and even when using Python anndata directly there isn't explicit closing of the file. It does seem like your file was corrupted somehow though and I suppose it could have happened if there was an issue reading it at some stage.
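The "file signature not found" error means the HDF5 library could not find the 8-byte HDF5 signature in the file, which is typical of a truncated or otherwise damaged download. A quick way to check a file before handing it to anndata is to look for that signature directly. This is a minimal sketch using only the standard library; the function name `has_hdf5_signature` and the `max_offset` cut-off are made up for illustration. The HDF5 format allows the signature at offset 0 or at 512, 1024, 2048 and so on, so the sketch checks those positions.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # the 8-byte HDF5 format signature


def has_hdf5_signature(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at an allowed offset.

    Allowed offsets are 0, 512, 1024, 2048, ... (doubling each time).
    max_offset is an arbitrary cut-off chosen for this sketch.
    """
    with open(path, "rb") as handle:
        offset = 0
        while offset <= max_offset:
            handle.seek(offset)
            if handle.read(8) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "looks like HDF5" if has_hdf5_signature(name) else "no HDF5 signature")
```

With h5py installed, `h5py.is_hdf5("local.h5ad")` performs an equivalent check. A file can still be damaged further in, so passing this test only rules out the most obvious kind of corruption.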

royfrancis commented 1 year ago

Now when I tried on my larger dataset, I get this:

Registered S3 method overwritten by 'zellkonverter':
  method                from      
  py_to_r.numpy.ndarray reticulate
Warning message:
'X' matrix does not support transposition and has been skipped

And the output is a very small file. Similar to #42.

I am using bioconductor-zellkonverter 1.8.0 in the conda env.

lazappi commented 1 year ago

Would you be able to share the file so I can test it?

royfrancis commented 1 year ago

This discussion is continued in #42 since it's a similar error. The dataset link and a docker container to reproduce the error are mentioned there.

royfrancis commented 1 year ago

zellkonverter did not work for my large dataset, so I ended up doing it manually.

First, export counts and metadata from h5ad to CSV in Python.

import pandas as pd
import scanpy as sc

print("Reading data...")
adata = sc.read_h5ad("file.h5ad")

print("Writing data...")
# Densify the raw counts (cells x genes); this is the memory-hungry step
counts = adata.raw.X.toarray()
pd.DataFrame(data=counts, index=adata.obs_names, columns=adata.raw.var_names).to_csv("raw-counts.csv")
# Both CSVs have a header row, with the cell names in the first column
pd.DataFrame(adata.obs).to_csv("metadata.csv")

Then read into R and create a Seurat object.

library(Seurat)

message("Reading counts...")
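# check.names=FALSE keeps gene names such as "HLA-A" from being rewritten to "HLA.A"
x <- read.csv("raw-counts.csv",header=TRUE,check.names=FALSE)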
rownames(x) <- x[,1]
x[,1] <- NULL
print(dim(x))
print(x[1:5,1:5])

message("Reading metadata...")
m <- read.csv("metadata.csv",header=TRUE)
rownames(m) <- m[,1]
colnames(m)[1] <- "sample"
print(dim(m))
print(head(m))

message("Writing seurat object...")
saveRDS(
  CreateSeuratObject(counts=t(x),meta.data=m,project="seurat",min.cells=0,min.features=0),
  "seurat.Rds"
)

Converting the 11GB h5ad file needed about 200GB of RAM, mostly in the R part. Maybe there is a better way to do this with lower RAM requirements.
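One possible way to reduce the memory use, offered as an untested sketch rather than a verified fix for this dataset, is to avoid the dense CSV entirely: keep the counts sparse and write only the non-zero entries in Matrix Market coordinate format, which R can read back as a sparse matrix with `Matrix::readMM()`. The helper `write_matrix_market` below is hypothetical and uses only the standard library; with SciPy installed, `scipy.io.mmwrite` does the same job for a SciPy sparse matrix.

```python
def write_matrix_market(path, n_rows, n_cols, rows, cols, values):
    """Write non-zero entries as a Matrix Market coordinate file.

    rows, cols and values are equal-length sequences holding 0-based
    indices and their values; the file format itself is 1-based.
    """
    with open(path, "w") as handle:
        handle.write("%%MatrixMarket matrix coordinate real general\n")
        handle.write(f"{n_rows} {n_cols} {len(values)}\n")
        for row, col, value in zip(rows, cols, values):
            handle.write(f"{row + 1} {col + 1} {value}\n")


# Possible usage with AnnData, assuming adata.raw.X is a SciPy sparse matrix.
# Rows and columns are swapped to get the genes x cells layout Seurat expects:
#
#   coo = adata.raw.X.tocoo()
#   write_matrix_market("raw-counts.mtx", coo.shape[1], coo.shape[0],
#                       coo.col, coo.row, coo.data)
```

Gene and cell names are not stored in the file, so they would need to be exported separately (for example as one-column text files) and assigned as row and column names after reading the matrix in R.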