Hello,
I'm trying to download the Ensembl 107 release for mouse:
$ pyensembl install --release 107 --species mus_musculus
2023-09-19 16:48:07,049 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=107, species='mus_musculus')
2023-09-19 16:48:07,050 - pyensembl.database - INFO - Creating database: ~/.cache/pyensembl/GRCm39/ensembl107/Mus_musculus.GRCm39.107.gtf.db
2023-09-19 16:48:07,050 - pyensembl.database - INFO - Reading GTF from ~/.cache/pyensembl/GRCm39/ensembl107/Mus_musculus.GRCm39.107.gtf.gz
Traceback (most recent call last):
  File "~/envs/pyensembl/bin/pyensembl", line 10, in <module>
    sys.exit(run())
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/shell.py", line 272, in run
    genome.index(overwrite=args.overwrite)
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/genome.py", line 280, in index
    self.db.connect_or_create(overwrite=overwrite)
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/database.py", line 284, in connect_or_create
    return self.create(overwrite=overwrite)
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/database.py", line 206, in create
    df = self._load_gtf_as_dataframe(
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/database.py", line 611, in _load_gtf_as_dataframe
    df = read_gtf(
  File "~/envs/pyensembl/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 254, in read_gtf
    result_df = parse_gtf_and_expand_attributes(
  File "~/envs/pyensembl/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 189, in parse_gtf_and_expand_attributes
    df = parse_gtf(
  File "~/envs/pyensembl/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 155, in parse_gtf
    df_lazy = parse_with_polars_lazy(
  File "~/envs/pyensembl/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 87, in parse_with_polars_lazy
    polars.toggle_string_cache(True)
AttributeError: module 'polars' has no attribute 'toggle_string_cache'. Did you mean: 'enable_string_cache'?
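For illustration, a call that tolerates both function names could look roughly like the sketch below. This is only an assumption about how such a workaround might be written, not how gtfparse or polars handle it: enable_string_cache_compat is a made-up helper name, the calling conventions in the comments are guesses, and the two fake modules stand in for polars so the snippet runs without it installed.

```python
from types import SimpleNamespace


def enable_string_cache_compat(pl_module):
    """Enable the string cache with whichever function pl_module exposes.

    Returns the name of the function that was called.
    """
    new_fn = getattr(pl_module, "enable_string_cache", None)
    if new_fn is not None:
        try:
            new_fn()  # assumption: some releases accept a call without arguments
        except TypeError:
            new_fn(True)  # assumption: other releases require an explicit flag
        return "enable_string_cache"
    old_fn = getattr(pl_module, "toggle_string_cache", None)
    if old_fn is not None:
        old_fn(True)  # the call that fails in the traceback above
        return "toggle_string_cache"
    raise AttributeError("no known string-cache function found")


# Stand-ins for an older and a newer polars module (not the real library).
calls = []
old_polars = SimpleNamespace(
    toggle_string_cache=lambda flag: calls.append(("toggle", flag))
)
new_polars = SimpleNamespace(
    enable_string_cache=lambda: calls.append(("enable",))
)

print(enable_string_cache_compat(old_polars))
print(enable_string_cache_compat(new_polars))
```

Passing the module in as an argument keeps the helper testable without a specific polars version; in real use one would pass the imported polars module itself.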
The AttributeError can be avoided by downgrading polars to 0.16.8. However, rerunning the command then leads to another error:
$ pyensembl install --release 107 --species mus_musculus
2023-09-19 16:52:20,886 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=107, species='mus_musculus')
2023-09-19 16:52:20,887 - pyensembl.database - INFO - Creating database: ~/.cache/pyensembl/GRCm39/ensembl107/Mus_musculus.GRCm39.107.gtf.db
2023-09-19 16:52:20,887 - pyensembl.database - INFO - Reading GTF from ~/.cache/pyensembl/GRCm39/ensembl107/Mus_musculus.GRCm39.107.gtf.gz
Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
thread '<unnamed>' panicked at 'python function failed ValueError: Invalid strand: 2', src/apply/series.rs:219:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
Traceback (most recent call last):
  File "~/envs/pyensembl/lib/python3.10/site-packages/polars/internals/expr/expr.py", line 3277, in wrap_f
    return x.apply(
  File "~/envs/pyensembl/lib/python3.10/site-packages/polars/utils.py", line 433, in wrapper
    return function(*args, **kwargs)
  File "~/envs/pyensembl/lib/python3.10/site-packages/polars/utils.py", line 498, in wrapper
    return function(*args, **kwargs)
  File "~/envs/pyensembl/lib/python3.10/site-packages/polars/internals/series/series.py", line 3736, in apply
    return wrap_s(self._s.apply_lambda(function, pl_return_dtype, skip_nulls))
pyo3_runtime.PanicException: python function failed ValueError: Invalid strand: 2
Traceback (most recent call last):
  File "~/envs/pyensembl/bin/pyensembl", line 10, in <module>
    sys.exit(run())
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/shell.py", line 272, in run
    genome.index(overwrite=args.overwrite)
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/genome.py", line 280, in index
    self.db.connect_or_create(overwrite=overwrite)
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/database.py", line 284, in connect_or_create
    return self.create(overwrite=overwrite)
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/database.py", line 206, in create
    df = self._load_gtf_as_dataframe(
  File "~/envs/pyensembl/lib/python3.10/site-packages/pyensembl/database.py", line 611, in _load_gtf_as_dataframe
    df = read_gtf(
  File "~/envs/pyensembl/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 261, in read_gtf
    result_df = result_df.with_columns(
  File "~/envs/pyensembl/lib/python3.10/site-packages/polars/internals/dataframe/frame.py", line 6122, in with_columns
    self.lazy()
  File "~/envs/pyensembl/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py", line 1160, in collect
    return pli.wrap_df(ldf.collect())
pyo3_runtime.PanicException: Unwrapped panic from Python code
It seems that the first error (the AttributeError) comes from a breaking change regarding this attribute introduced in polars==0.17.0: https://github.com/pola-rs/polars/releases/tag/py-0.17.0
I have narrowed down the second error to the following line in the code:
https://github.com/openvax/pyensembl/blob/d178ac926b7329b9f9d81574ecf2b17e554516c5/pyensembl/database.py#L615
By commenting out this line, the issue gets resolved, but I'm not sure what exactly in normalize_strand is causing it.
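To make the failure mode concrete, here is a minimal sketch of a strand converter that would produce the same message. It is a guess based only on the error text "Invalid strand: 2", not a copy of pyensembl's code: normalize_strand_sketch and find_invalid_strands are hypothetical names, and the accepted values ("+", "-", 1, -1) are assumptions. If the real converter behaves similarly, the message would mean it received the integer 2 rather than a "+" or "-" string. Where that 2 comes from is not something this sketch can answer.

```python
def normalize_strand_sketch(strand):
    """Hypothetical stand-in for normalize_strand, modeled on the error message."""
    if strand in ("+", "-"):
        return strand
    if strand == 1:
        return "+"
    if strand == -1:
        return "-"
    # Anything else is rejected, mirroring "ValueError: Invalid strand: 2".
    raise ValueError("Invalid strand: %s" % (strand,))


def find_invalid_strands(values):
    """Return the distinct values that the sketch above would reject."""
    invalid = []
    for value in values:
        try:
            normalize_strand_sketch(value)
        except ValueError:
            if value not in invalid:
                invalid.append(value)
    return invalid


# Example input mixing accepted and rejected values (made-up data).
print(find_invalid_strands(["+", "-", 1, -1, 2, "."]))
```

Running a check like find_invalid_strands over the values that actually reach the converter would show whether 2 is the only unexpected value or one of several.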