openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Rust kernel panic on download #279

Closed: jordan-schneider closed this issue 1 year ago

jordan-schneider commented 1 year ago

Running pyensembl install --release 108 --species human crashes with a Rust panic inside polars, the Rust dataframe library being used under the hood. I don't know whether the problem is in pyensembl, gtfparse, or polars, but none of them has an open issue for this.

Python 3.10 (conda), pyensembl 2.2.3, polars 0.16.1, gtfparse 2.0.1
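Since the crash depends on the exact combination of these three packages, it helps to report their versions precisely. A minimal sketch that collects them with the standard library's importlib.metadata; the helper name environment_report is hypothetical and not part of pyensembl.

```python
from importlib.metadata import PackageNotFoundError, version
import platform

# Packages involved in the crash reported in this issue.
PACKAGES = ("pyensembl", "gtfparse", "polars")


def environment_report(packages=PACKAGES):
    """Return a dict mapping 'python' and each package name to its version (None if missing)."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


if __name__ == "__main__":
    for name, ver in environment_report().items():
        print(f"{name} {ver or 'not installed'}")
```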

The error messages are:

Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
thread '<unnamed>' panicked at 'python function failed ValueError: Invalid strand: 1', src/apply/series.rs:222:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
Traceback (most recent call last):
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/polars/internals/expr/expr.py", line 3138, in wrap_f
    return x.apply(f, return_dtype=return_dtype, skip_nulls=skip_nulls)
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/polars/internals/series/series.py", line 3495, in apply
    return wrap_s(self._s.apply_lambda(func, pl_return_dtype, skip_nulls))
pyo3_runtime.PanicException: python function failed ValueError: Invalid strand: 1
Traceback (most recent call last):
  File "/home/$USER/miniconda3/envs/$PROJECT/bin/pyensembl", line 8, in <module>
    sys.exit(run())
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/pyensembl/shell.py", line 256, in run
    genome.index(overwrite=args.overwrite)
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/pyensembl/genome.py", line 273, in index
    self.db.connect_or_create(overwrite=overwrite)
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/pyensembl/database.py", line 290, in connect_or_create
    return self.create(overwrite=overwrite)
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/pyensembl/database.py", line 212, in create
    df = self._load_gtf_as_dataframe(
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/pyensembl/database.py", line 604, in _load_gtf_as_dataframe
    df = read_gtf(
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 261, in read_gtf
    result_df = result_df.with_columns(
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/polars/internals/dataframe/frame.py", line 5733, in with_columns
    self.lazy().with_columns(exprs, **named_exprs).collect(no_optimization=True)
  File "/home/$USER/miniconda3/envs/$PROJECT/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py", line 1143, in collect
    return pli.wrap_df(ldf.collect())
pyo3_runtime.PanicException: Unwrapped panic from Python code

I'm going to re-run with RUST_BACKTRACE=1, but the download takes a very long time.
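Re-running with backtraces only requires setting RUST_BACKTRACE=1 in the environment of the pyensembl install process. A minimal sketch in Python that reuses the command from this report; the helper names backtrace_env and rerun_install are hypothetical.

```python
import os
import subprocess
import sys


def backtrace_env(base=None):
    """Return a copy of the environment with Rust panic backtraces enabled."""
    env = dict(os.environ if base is None else base)  # copy so the caller's mapping is untouched
    env["RUST_BACKTRACE"] = "1"  # ask the Rust runtime (used by polars) to print a backtrace on panic
    return env


def rerun_install(release=108, species="human"):
    """Re-run the failing pyensembl install command with backtraces enabled."""
    cmd = ["pyensembl", "install", "--release", str(release), "--species", species]
    return subprocess.run(cmd, env=backtrace_env()).returncode


if __name__ == "__main__":
    sys.exit(rerun_install())
```

From a POSIX shell, prefixing the command with RUST_BACKTRACE=1 has the same effect.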

jordan-schneider commented 1 year ago

Downgrading gtfparse to 1.3.0 fixes the issue.

afishman commented 1 year ago

I was having the same issue, and downgrading to 1.3.0 fixed it for me as well.

iskandr commented 1 year ago

Thanks! This error message is definitely suboptimal. I think the key is here:

python function failed ValueError: Invalid strand: 1

Looking to reproduce.

iskandr commented 1 year ago

Well, I'm not able to reproduce this. Maybe you can give me some insight into your setups, @jordan-schneider and @afishman? What OS are you on?

iskandr commented 1 year ago

Also, I've tracked the exception down to pyensembl (not gtfparse):

pyensembl/normalization.py:    raise ValueError("Invalid strand: %s" % (strand,))
jordan-schneider commented 1 year ago

I'm on Ubuntu 22.04.1 LTS

iskandr commented 1 year ago

Here's a guess: polars (which I just switched to) might give me a string column for strands, whereas pandas used to convert them to integers, and the PyEnsembl normalization helper doesn't know what to do with values like "1" and "-1".
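If that guess is right, the fix amounts to making strand normalization accept string spellings of the integer codes as well as the integers themselves. A minimal sketch under that assumption; normalize_strand_sketch is a hypothetical helper, not the code shipped in PyEnsembl 2.2.5, though it raises the same "Invalid strand" message seen in the traceback.

```python
POSITIVE_STRAND = "+"
NEGATIVE_STRAND = "-"

# Accepted spellings for each strand. The strings "1" and "-1" cover the
# case guessed at above, where strand values arrive as text instead of ints.
_STRAND_ALIASES = {
    "+": POSITIVE_STRAND,
    "1": POSITIVE_STRAND,
    "-": NEGATIVE_STRAND,
    "-1": NEGATIVE_STRAND,
}


def normalize_strand_sketch(strand):
    """Map '+'/'-', 1/-1 or '1'/'-1' onto '+' or '-' (hypothetical helper)."""
    if isinstance(strand, int):
        strand = str(strand)  # 1 -> "1", -1 -> "-1"
    if isinstance(strand, str) and strand.strip() in _STRAND_ALIASES:
        return _STRAND_ALIASES[strand.strip()]
    raise ValueError("Invalid strand: %s" % (strand,))
```

A strict helper that only recognizes "+"/"-" and the integers 1/-1 would reject the string "1" with exactly the error reported above, which is why the string aliases are included here.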

I just pushed an updated PyEnsembl 2.2.5 which should handle this case.

I'm not sure though why this would happen inconsistently across systems.

afishman commented 1 year ago

Cool, I've just managed to reproduce the error by upgrading gtfparse back to 2.0.1. Will give your fix a test.

I'm also using conda, on CentOS (unfortunately)

afishman commented 1 year ago

Well, it's working for me on your test branch. Nice one. I was on pyensembl 2.2.3 / gtfparse 2.0.1 / polars 0.15.18 when I encountered the error.

iskandr commented 1 year ago

Great, the fixed version 2.2.5 is live on PyPI.

CherWeiYuan commented 6 months ago

Hi

I am encountering the same problem now from the conda install of pyensembl:

pyensembl install --release 108 --species homo_sapiens

Command-line output:

2023-12-21 17:05:43,759 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=108, species='homo_sapiens')
2023-12-21 17:05:43,759 - pyensembl.database - INFO - Creating database: /home/weiyuan/.cache/pyensembl/GRCh38/ensembl108/Homo_sapiens.GRCh38.108.gtf.db
2023-12-21 17:05:43,759 - pyensembl.database - INFO - Reading GTF from /home/weiyuan/.cache/pyensembl/GRCh38/ensembl108/Homo_sapiens.GRCh38.108.gtf.gz
Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
thread '<unnamed>' panicked at 'python function failed ValueError: Invalid strand: 1', src/apply/series.rs:219:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
Traceback (most recent call last):
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/expr/expr.py", line 3303, in wrap_f
    return x.apply(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/utils/decorators.py", line 37, in wrapper
    return function(*args, **kwargs)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/utils/decorators.py", line 136, in wrapper
    return function(*args, **kwargs)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/series/series.py", line 4029, in apply
    self._s.apply_lambda(function, pl_return_dtype, skip_nulls)
pyo3_runtime.PanicException: python function failed ValueError: Invalid strand: 1
Traceback (most recent call last):
  File "/home/weiyuan/Downloads/yes/envs/delta/bin/pyensembl", line 10, in <module>
    sys.exit(run())
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/shell.py", line 256, in run
    genome.index(overwrite=args.overwrite)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/genome.py", line 275, in index
    self.db.connect_or_create(overwrite=overwrite)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/database.py", line 291, in connect_or_create
    return self.create(overwrite=overwrite)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/database.py", line 213, in create
    df = self._load_gtf_as_dataframe(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/database.py", line 605, in _load_gtf_as_dataframe
    df = read_gtf(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 261, in read_gtf
    result_df = result_df.with_columns(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/dataframe/frame.py", line 6798, in with_columns
    self.lazy()
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1475, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: Unwrapped panic from Python code

It seems that the conda install option is broken. May I seek assistance? Thank you.

Best, WY