theislab / scib-pipeline

Snakemake pipeline that works with the scIB package to benchmark data integration methods.
MIT License

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType' #28

Closed by stemangiola 2 years ago

stemangiola commented 2 years ago

I receive the error below (my snakemake config follows the log).

I don't know how to debug this or understand what is going on.

Thanks a lot.

(scib-pipeline-R4) slurm-login02 279 % snakemake --configfile configs/test_data.yaml --cores 5
Building DAG of jobs...
The params used to generate one or several output files has changed:
    To inspect which output files have changes, run 'snakemake --list-params-changes'.
    To trigger a re-run, use 'snakemake -R $(snakemake --list-params-changes)'.
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
all                      1              1              1
convert_RDS_h5ad         5              1              1
embeddings               1              1              1
embeddings_single        6              1              1
metrics                  1              1              1
metrics_single           6              1              1
total                   20              1              1

Select jobs to execute...

[Tue Apr 19 23:28:11 2022]
Job 11:
        Convert integrated data from harmony into h5ad

[Tue Apr 19 23:28:11 2022]
Job 13:
        Convert integrated data from liger into h5ad

[Tue Apr 19 23:28:11 2022]
Job 8:
        Convert integrated data from fastmnn into h5ad

[Tue Apr 19 23:28:11 2022]
Job 15:
        Convert integrated data from seurat into h5ad

[Tue Apr 19 23:28:11 2022]
Job 17:
        Convert integrated data from seuratrpca into h5ad

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/integration/runPost.py', '-i', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.RDS', '-o', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad']' command failed.  (See above for error)

    WARNING: The R package "reticulate" does not
    consider that it could be called from a Python process. This
    results in a quasi-obligatory segfault when rpy2 is evaluating
    R code using it. On the hand, rpy2 is accounting for the
    fact that it might already be running embedded in a Python
    process. This is why:
    - Python -> rpy2 -> R -> reticulate: crashes
    - R -> reticulate -> Python -> rpy2: works

    The issue with reticulate is tracked here:
    https://github.com/rstudio/reticulate/issues/208

Traceback (most recent call last):
  File "scripts/integration/runPost.py", line 40, in <module>
    runPost(file, out, conos)
  File "scripts/integration/runPost.py", line 21, in runPost
    adata = scib.pp.read_seurat(inPath)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/preprocessing.py", line 517, in read_seurat
    adata = ro.r('as.SingleCellExperiment(sobj)')
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 451, in __call__
    res = self.eval(p)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 202, in __call__
    .__call__(*args, **kwargs))
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = conversion.rpy2py(res)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 28, in rpy2py_s4
    return rpy2py_single_cell_experiment(obj)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 95, in rpy2py_single_cell_experiment
    obs = rpy2py_data_frame(col_data)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in rpy2py_data_frame
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in <dictcomp>
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 47, in rpy2py_vector
    r[np.array(baseenv["is.na"](v), dtype=bool)] = pd.NA
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

[Tue Apr 19 23:29:19 2022]
Error in rule convert_RDS_h5ad:
    jobid: 13
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad
    shell:

        if [ liger == "conos" ]
        then
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad -c
        else
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad
        fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/integration/runPost.py', '-i', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seuratrpca.RDS', '-o', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seuratrpca.h5ad']' command failed.  (See above for error)

    WARNING: The R package "reticulate" does not
    consider that it could be called from a Python process. This
    results in a quasi-obligatory segfault when rpy2 is evaluating
    R code using it. On the hand, rpy2 is accounting for the
    fact that it might already be running embedded in a Python
    process. This is why:
    - Python -> rpy2 -> R -> reticulate: crashes
    - R -> reticulate -> Python -> rpy2: works

    The issue with reticulate is tracked here:
    https://github.com/rstudio/reticulate/issues/208

Traceback (most recent call last):
  File "scripts/integration/runPost.py", line 40, in <module>
    runPost(file, out, conos)
  File "scripts/integration/runPost.py", line 21, in runPost
    adata = scib.pp.read_seurat(inPath)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/preprocessing.py", line 517, in read_seurat
    adata = ro.r('as.SingleCellExperiment(sobj)')
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 451, in __call__
    res = self.eval(p)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 202, in __call__
    .__call__(*args, **kwargs))
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = conversion.rpy2py(res)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 28, in rpy2py_s4
    return rpy2py_single_cell_experiment(obj)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 95, in rpy2py_single_cell_experiment
    obs = rpy2py_data_frame(col_data)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in rpy2py_data_frame
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in <dictcomp>
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 47, in rpy2py_vector
    r[np.array(baseenv["is.na"](v), dtype=bool)] = pd.NA
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

[Tue Apr 19 23:29:20 2022]
Error in rule convert_RDS_h5ad:
    jobid: 17
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seuratrpca.h5ad
    shell:

        if [ seuratrpca == "conos" ]
        then
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seuratrpca.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seuratrpca.h5ad -c
        else
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seuratrpca.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seuratrpca.h5ad
        fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/integration/runPost.py', '-i', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/harmony.RDS', '-o', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/harmony.h5ad']' command failed.  (See above for error)

    WARNING: The R package "reticulate" does not
    consider that it could be called from a Python process. This
    results in a quasi-obligatory segfault when rpy2 is evaluating
    R code using it. On the hand, rpy2 is accounting for the
    fact that it might already be running embedded in a Python
    process. This is why:
    - Python -> rpy2 -> R -> reticulate: crashes
    - R -> reticulate -> Python -> rpy2: works

    The issue with reticulate is tracked here:
    https://github.com/rstudio/reticulate/issues/208

Traceback (most recent call last):
  File "scripts/integration/runPost.py", line 40, in <module>
    runPost(file, out, conos)
  File "scripts/integration/runPost.py", line 21, in runPost
    adata = scib.pp.read_seurat(inPath)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/preprocessing.py", line 517, in read_seurat
    adata = ro.r('as.SingleCellExperiment(sobj)')
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 451, in __call__
    res = self.eval(p)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 202, in __call__
    .__call__(*args, **kwargs))
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = conversion.rpy2py(res)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 28, in rpy2py_s4
    return rpy2py_single_cell_experiment(obj)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 95, in rpy2py_single_cell_experiment
    obs = rpy2py_data_frame(col_data)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in rpy2py_data_frame
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in <dictcomp>
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 47, in rpy2py_vector
    r[np.array(baseenv["is.na"](v), dtype=bool)] = pd.NA
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

[Tue Apr 19 23:29:22 2022]
Error in rule convert_RDS_h5ad:
    jobid: 11
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/harmony.h5ad
    shell:

        if [ harmony == "conos" ]
        then
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/harmony.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/harmony.h5ad -c
        else
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/harmony.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/harmony.h5ad
        fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/integration/runPost.py', '-i', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seurat.RDS', '-o', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seurat.h5ad']' command failed.  (See above for error)

    WARNING: The R package "reticulate" does not
    consider that it could be called from a Python process. This
    results in a quasi-obligatory segfault when rpy2 is evaluating
    R code using it. On the hand, rpy2 is accounting for the
    fact that it might already be running embedded in a Python
    process. This is why:
    - Python -> rpy2 -> R -> reticulate: crashes
    - R -> reticulate -> Python -> rpy2: works

    The issue with reticulate is tracked here:
    https://github.com/rstudio/reticulate/issues/208

Traceback (most recent call last):
  File "scripts/integration/runPost.py", line 40, in <module>
    runPost(file, out, conos)
  File "scripts/integration/runPost.py", line 21, in runPost
    adata = scib.pp.read_seurat(inPath)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/preprocessing.py", line 517, in read_seurat
    adata = ro.r('as.SingleCellExperiment(sobj)')
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 451, in __call__
    res = self.eval(p)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 202, in __call__
    .__call__(*args, **kwargs))
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = conversion.rpy2py(res)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 28, in rpy2py_s4
    return rpy2py_single_cell_experiment(obj)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 95, in rpy2py_single_cell_experiment
    obs = rpy2py_data_frame(col_data)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in rpy2py_data_frame
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in <dictcomp>
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 47, in rpy2py_vector
    r[np.array(baseenv["is.na"](v), dtype=bool)] = pd.NA
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

[Tue Apr 19 23:29:22 2022]
Error in rule convert_RDS_h5ad:
    jobid: 15
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seurat.h5ad
    shell:

        if [ seurat == "conos" ]
        then
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seurat.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seurat.h5ad -c
        else
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seurat.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/seurat.h5ad
        fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/integration/runPost.py', '-i', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/fastmnn.RDS', '-o', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/fastmnn.h5ad']' command failed.  (See above for error)

    WARNING: The R package "reticulate" does not
    consider that it could be called from a Python process. This
    results in a quasi-obligatory segfault when rpy2 is evaluating
    R code using it. On the hand, rpy2 is accounting for the
    fact that it might already be running embedded in a Python
    process. This is why:
    - Python -> rpy2 -> R -> reticulate: crashes
    - R -> reticulate -> Python -> rpy2: works

    The issue with reticulate is tracked here:
    https://github.com/rstudio/reticulate/issues/208

Traceback (most recent call last):
  File "scripts/integration/runPost.py", line 40, in <module>
    runPost(file, out, conos)
  File "scripts/integration/runPost.py", line 21, in runPost
    adata = scib.pp.read_seurat(inPath)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/preprocessing.py", line 517, in read_seurat
    adata = ro.r('as.SingleCellExperiment(sobj)')
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 451, in __call__
    res = self.eval(p)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 202, in __call__
    .__call__(*args, **kwargs))
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = conversion.rpy2py(res)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 28, in rpy2py_s4
    return rpy2py_single_cell_experiment(obj)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 95, in rpy2py_single_cell_experiment
    obs = rpy2py_data_frame(col_data)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in rpy2py_data_frame
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in <dictcomp>
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 47, in rpy2py_vector
    r[np.array(baseenv["is.na"](v), dtype=bool)] = pd.NA
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

[Tue Apr 19 23:29:23 2022]
Error in rule convert_RDS_h5ad:
    jobid: 8
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/fastmnn.h5ad
    shell:

        if [ fastmnn == "conos" ]
        then
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/fastmnn.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/fastmnn.h5ad -c
        else
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/fastmnn.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/fastmnn.h5ad
        fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
The params used to generate one or several output files has changed:
    To inspect which output files have changes, run 'snakemake --list-params-changes'.
    To trigger a re-run, use 'snakemake -R $(snakemake --list-params-changes)'.
Complete log: .snakemake/log/2022-04-19T232810.877193.snakemake.log

snakemake config

ROOT: data
r_env : scib-R4
py_env : scib-pipeline-R4

timing: false
unintegrated_metrics: false

FEATURE_SELECTION:
  #hvg: 2000
  full_feature: 0

SCALING:
  - unscaled
  #- scaled

METHODS:
# python methods
  bbknn:
    output_type: knn
  combat:
    output_type: full
  #desc:
  #  output_type: embed
#  mnn:
#    output_type: full
  #saucie:
  #  output_type:
  #    - full
  #    - embed
  scanorama:
    output_type:
      - embed
      - full
  scanvi:
    output_type: embed
    no_scale: true
    use_celltype: true
  scgen:
    output_type: full
    use_celltype: true
  scvi:
    no_scale: true
    output_type: embed
  #trvae:
  #  no_scale: true
  # output_type:
  #    - embed
  #    - full
  # trvaep:
  #   no_scale: true
  #   output_type:
  #     - embed
  #     - full
# R methods
  #conos: # temporary directory issue
  #  R: true
  #  output_type: knn
  fastmnn:
    R: true
    output_type:
      - embed
      - full
  harmony:
    R: true
    output_type: embed
  liger:
    no_scale: true
    R: true
    output_type: embed
  seurat:
    R: true
    output_type: full
  seuratrpca:
      R: true
      output_type: full

DATA_SCENARIOS:
  test_data:
    batch_key: batch
    label_key: celltype
    organism: mouse
    assay: expression
    file: data/adata_norm.h5ad

stemangiola commented 2 years ago

I think I understood that some R packages are missing (although the error does not say so).

The problem is that conda is using R 4.0.5 rather than 4.1.0.

(scib-pipeline-R4) slurm-login02 288 % /home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/R/bin/R

R version 4.0.5 (2021-03-31) -- "Shake and Throw"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)

How can I create the environment pointing to more recent versions of R?

Thanks

mumichae commented 2 years ago

Hi @stemangiola,

Could you try pinning the rpy2 version to rpy2=3.4.2 in the envs/scib-pipeline-R4.yml? Then update the environment via

mamba env update -f envs/scib-pipeline-R4.yml

# or if mamba not installed:
# conda env update -f envs/scib-pipeline-R4.yml

That should fix the issue when using R 4.0.
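For context, the last frame of the traceback assigns pd.NA into a NumPy array through a boolean mask, and the int() in the error message suggests that array had an integer dtype. The snippet below is only a minimal sketch of that failure mode, assuming NumPy and pandas 1.0 or later are installed. The function names mask_missing and mask_missing_object are hypothetical, and this is not the anndata2ri code itself.

```python
import numpy as np
import pandas as pd


def mask_missing(values, is_na):
    """Mimic the failing assignment: write pd.NA into the masked slots.

    The dtype is inferred from `values`, so a list of Python ints
    produces an integer array, which cannot hold pd.NA.
    """
    out = np.array(values)
    out[np.array(is_na, dtype=bool)] = pd.NA
    return out


def mask_missing_object(values, is_na):
    """Same assignment, but on an object array, which can hold pd.NA."""
    out = np.array(values, dtype=object)
    out[np.array(is_na, dtype=bool)] = pd.NA
    return out


if __name__ == "__main__":
    try:
        mask_missing([1, 2, 3], [False, True, False])
    except TypeError as err:
        # NumPy calls int() on pd.NA, which NAType does not support.
        print("integer array:", err)

    print("object array:", mask_missing_object([1, 2, 3], [False, True, False]))
```

The object-dtype variant is shown only to illustrate why the dtype matters; it is not a proposed patch, and the fix suggested in this thread remains the rpy2 pin.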

stemangiola commented 2 years ago

Thanks

mumichae commented 2 years ago

Technically everything should be able to fit into a single environment, but we initially had some dependency clashes so we needed the different environments.

rpy2 is only needed for the pipeline environment, and yes you'll need to add it.
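One way to confirm that an environment file actually carries a pin such as rpy2=3.4.2 is to scan its dependency lines. The following is a rough sketch using only the standard library; the helper name find_pin and the sample file contents are hypothetical and do not reflect the real envs/scib-pipeline-R4.yml.

```python
import re


def find_pin(env_yaml_text, package):
    """Return the version pinned for `package` in a conda env file, or None.

    Only simple dependency lines of the form "- name=version" or
    "- name==version" are recognised; unpinned entries yield None.
    """
    pattern = re.compile(
        r"^\s*-\s*" + re.escape(package) + r"\s*=+\s*([^\s#=]+)"
    )
    for line in env_yaml_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical file contents, for illustration only.
EXAMPLE_ENV = """\
name: scib-pipeline-R4
dependencies:
  - python=3.7
  - rpy2=3.4.2
  - r-base
"""

if __name__ == "__main__":
    print(find_pin(EXAMPLE_ENV, "rpy2"))
    print(find_pin(EXAMPLE_ENV, "r-base"))
```

A regex is used instead of a YAML parser so the sketch has no third-party dependencies; it would miss pins written in other forms, such as version ranges.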

stemangiola commented 2 years ago

Thanks, that seems to have worked!

I had to manually run install.packages("SeuratObject") in the R at

.conda/envs/scib-pipeline-R4/lib/R/bin/R

otherwise snakemake fails with a package-not-found error. How can I automate this in the conda config?

mumichae commented 2 years ago

Hm, that is interesting. For me, SeuratObject is already installed via conda. Could you double-check to make sure you are working with the correct libraries?

conda activate scib-pipeline-R4
Rscript -e '.libPaths()'
conda activate scib-R4
Rscript -e '.libPaths()'

The output should contain only the conda library path.
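If this check needs to be repeated, the printed output of the commands above could be validated with a small script. This is a sketch under assumptions: the helper names and the example paths are hypothetical, and it assumes a conda R library lives under envs/<env name>/lib/R/library.

```python
import re


def parse_lib_paths(rscript_output):
    """Extract the quoted paths from `Rscript -e '.libPaths()'` output."""
    return re.findall(r'"([^"]+)"', rscript_output)


def only_conda_library(rscript_output, env_name):
    """True if every reported library path belongs to the given conda env."""
    paths = parse_lib_paths(rscript_output)
    suffix = f"/envs/{env_name}/lib/R/library"
    return bool(paths) and all(p.endswith(suffix) for p in paths)


if __name__ == "__main__":
    # Hypothetical outputs, for illustration only.
    clean = '[1] "/home/user/.conda/envs/scib-pipeline-R4/lib/R/library"'
    mixed = (
        '[1] "/home/user/R/x86_64-pc-linux-gnu-library/4.0"\n'
        '[2] "/home/user/.conda/envs/scib-pipeline-R4/lib/R/library"'
    )
    print(only_conda_library(clean, "scib-pipeline-R4"))
    print(only_conda_library(mixed, "scib-pipeline-R4"))
```

A user-level library appearing next to the conda path, as in the second example, is the kind of mix-up this check is meant to catch.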

stemangiola commented 2 years ago

Hello,

Here is the output:

(base) slurm-login02 253 % conda activate scib-pipeline-R4
(scib-pipeline-R4) slurm-login02 254 % Rscript -e '.libPaths()'
[1] "/stornext/Home/data/allstaff/m/mangiola.s/.conda/envs/scib-pipeline-R4/lib/R/library"
(scib-pipeline-R4) slurm-login02 255 % conda activate scib-R4
(scib-R4) slurm-login02 256 % Rscript -e '.libPaths()'
[1] "/stornext/Home/data/allstaff/m/mangiola.s/.conda/envs/scib-R4/lib/R/library"

mumichae commented 2 years ago

Ok, seems correct. I'll add the SeuratObject dependency to the conda yaml file to be safe.