Closed jordan-schneider closed 1 year ago
Downgrading gtfparse to 1.3.0 fixes the issue.
I was having the same issue, and downgrading to 1.3.0 fixed it for me as well.
Thanks! This error message is definitely sub-optimal, I think the key is here:
python function failed ValueError: Invalid strand: 1
Looking to reproduce.
Well, I'm not able to reproduce this. Maybe you can give me some insight into your setups, @jordan-schneider and @afishman. What OS are you on?
Also, tracking the exception down to pyensembl (and not gtfparse):
pyensembl/normalization.py: raise ValueError("Invalid strand: %s" % (strand,))
I'm on Ubuntu 22.04.1 LTS
Here's a guess: polars (which I just switched to) might give me a string column for strands, whereas Pandas used to convert them to integers, and the PyEnsembl normalization helper doesn't know what to do with string values such as "1" and "-1".
I just pushed an updated PyEnsembl 2.2.5 which should handle this case.
I'm not sure though why this would happen inconsistently across systems.
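To make the guess above concrete, here is a minimal sketch of a strand normalizer that accepts both integer and string encodings. The function name `normalize_strand_value` is hypothetical and this is not the actual PyEnsembl code; it only illustrates how a string "1" can fall through a check written for the integer 1.

```python
def normalize_strand_value(strand):
    """Return '+' or '-' for common strand encodings.

    Hypothetical helper for illustration only; not the PyEnsembl implementation.
    """
    # Integer encodings, as an integer-typed strand column would provide.
    if isinstance(strand, int) and not isinstance(strand, bool):
        if strand == 1:
            return "+"
        if strand == -1:
            return "-"
    # String encodings, as a string-typed strand column would provide.
    if isinstance(strand, str):
        text = strand.strip()
        if text in ("+", "1", "+1"):
            return "+"
        if text in ("-", "-1"):
            return "-"
    # Same message format as the error quoted in this thread.
    raise ValueError("Invalid strand: %s" % (strand,))
```

A helper that only handled the integer branch would raise "Invalid strand: 1" when handed the string "1", which matches the message seen here.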
Cool, I've just managed to reproduce the error by upgrading gtfparse back to 2.0.1. Will give your fix a test.
I'm also using conda, on CentOS (unfortunately)
Well, it's working on your test branch for me. Nice one. I was on pyensembl 2.2.3 / gtfparse 2.0.1 / polars 0.15.18 when I encountered the error.
Great, the fixed version 2.2.5 is live on PyPI.
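For anyone checking whether their environment matches the combination reported in this thread, here is a rough sketch. It assumes, based only on the reports above, that the problem appears with gtfparse 2.x together with a pyensembl older than 2.2.5; the exact boundaries are an assumption, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '2.2.5' into (2, 2, 5), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def looks_affected(pyensembl_version, gtfparse_version):
    """Assumption from this thread: gtfparse >= 2 with pyensembl < 2.2.5."""
    return (
        parse_version(gtfparse_version) >= (2,)
        and parse_version(pyensembl_version) < (2, 2, 5)
    )


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def report():
    """Print installed versions and whether they match the reported combination."""
    found = {name: installed_version(name) for name in ("pyensembl", "gtfparse", "polars")}
    print(found)
    if found["pyensembl"] and found["gtfparse"]:
        print("looks affected:", looks_affected(found["pyensembl"], found["gtfparse"]))
```

Calling `report()` inside the conda environment prints the three versions, which is also the information worth including in a bug report.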
Hi
I am encountering the same problem now from the conda install of pyensembl:
pyensembl install --release 108 --species homo_sapiens
Command line:
2023-12-21 17:05:43,759 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=108, species='homo_sapiens')
2023-12-21 17:05:43,759 - pyensembl.database - INFO - Creating database: /home/weiyuan/.cache/pyensembl/GRCh38/ensembl108/Homo_sapiens.GRCh38.108.gtf.db
2023-12-21 17:05:43,759 - pyensembl.database - INFO - Reading GTF from /home/weiyuan/.cache/pyensembl/GRCh38/ensembl108/Homo_sapiens.GRCh38.108.gtf.gz
Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
thread '' panicked at 'python function failed ValueError: Invalid strand: 1', src/apply/series.rs:219:19
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
Traceback (most recent call last):
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/expr/expr.py", line 3303, in wrap_f
    return x.apply(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/utils/decorators.py", line 37, in wrapper
    return function(*args, **kwargs)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/utils/decorators.py", line 136, in wrapper
    return function(*args, **kwargs)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/series/series.py", line 4029, in apply
    self._s.apply_lambda(function, pl_return_dtype, skip_nulls)
pyo3_runtime.PanicException: python function failed ValueError: Invalid strand: 1
Traceback (most recent call last):
  File "/home/weiyuan/Downloads/yes/envs/delta/bin/pyensembl", line 10, in
    sys.exit(run())
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/shell.py", line 256, in run
    genome.index(overwrite=args.overwrite)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/genome.py", line 275, in index
    self.db.connect_or_create(overwrite=overwrite)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/database.py", line 291, in connect_or_create
    return self.create(overwrite=overwrite)
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/database.py", line 213, in create
    df = self._load_gtf_as_dataframe(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/pyensembl/database.py", line 605, in _load_gtf_as_dataframe
    df = read_gtf(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 261, in read_gtf
    result_df = result_df.with_columns(
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/dataframe/frame.py", line 6798, in with_columns
    self.lazy()
  File "/home/weiyuan/Downloads/yes/envs/delta/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1475, in collect
    return wrap_df(ldf.collect())
pyo3_runtime.PanicException: Unwrapped panic from Python code
It seems that the conda install option is broken. May I seek assistance? Thank you.
Best, WY
By invoking
pyensembl install --release 108 --species human
I get a kernel crash in polars, the Rust dataframe library that is being used under the hood. I don't know if the problem is in pyensembl, gtfparse, or polars, but none of them have any issues for this.
python 3.10 in conda, pyensembl 2.2.3, polars 0.16.1, gtfparse 2.0.1
The error messages are
I'm going to re-run with RUST_BACKTRACE=1 but the download takes a very long time.