Open Azaya89 opened 3 months ago
New issue: it appears the `.parq` file in the `nyc_taxi` examples is no longer being read correctly by `fastparquet`.
This is a problem I also got in https://github.com/holoviz-topics/examples/pull/369. The last comment was:
Ok so I ended up keeping `pyarrow` as the engine but adding this before the imports: `import dask; dask.config.set({"dataframe.convert-string": False}); dask.config.set({"dataframe.query-planning": False})`
Since HoloViews does that too in its test suite, meaning that there isn't yet "official" support for these two features (query planner and pyarrow string): https://github.com/holoviz/holoviews/blob/6b0121d5a3685989fca58a1687961523a5fd575c/holoviews/tests/conftest.py#L61-L62
However, since then, HoloViews no longer sets `dask.config.set({"dataframe.query-planning": False})` (it still has `dask.config.set({"dataframe.convert-string": False})`).
My suggestions:
- Try with `engine='pyarrow'` and see whether the notebook runs fine. Don't set any of the `dask.config` options yet; maybe it works without them.
- If it doesn't work, start with `dask.config.set({"dataframe.convert-string": False})`.

Note that in the past, pyarrow and fastparquet had very different performance from each other in certain workloads, so ideally you'd at least qualitatively compare the old pinned version with the new version, and make sure that performance has not significantly degraded.
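For the qualitative performance comparison mentioned above, a small timing helper may be enough. This is only a sketch using the standard library; `time_call` is a hypothetical helper name, not something from the examples repository, and the workload passed to it is up to you.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` calls of fn()."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # the workload being measured, e.g. reading and reducing the dataset
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best
```

You would run it once in the old pinned environment and once in the new one, passing the same workload each time (for example a function that reads the parquet file with the chosen engine and computes one aggregate), and then compare the two numbers by eye.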
I have tried each of the suggestions individually and all together but it still didn't work. It still shows the same Traceback error. Here's the full Traceback:
```python
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
File
```
OK thanks for the report. It looks like the file cannot be read with pyarrow. We'll have to read it with fastparquet (for that dask-expr will have to be disabled), and save it again using pyarrow.
Can you guide me on how I can do this?
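The round trip described above (read with fastparquet while dask-expr is disabled, then write with pyarrow) might look roughly like the sketch below. It assumes a dask version whose legacy, non-dask-expr implementation still accepts `engine="fastparquet"`, and that both `fastparquet` and `pyarrow` are installed. The function name and the `src`/`dst` paths are hypothetical.

```python
def resave_with_pyarrow(src, dst):
    """Read a parquet dataset with fastparquet and write it back with pyarrow."""
    import dask

    # Disable the dask-expr query planner. This only takes effect if it runs
    # before dask.dataframe is imported for the first time in the process,
    # which is why the imports live inside the function.
    dask.config.set({"dataframe.query-planning": False})

    import dask.dataframe as dd

    # Read with the engine that can still parse the existing file ...
    df = dd.read_parquet(src, engine="fastparquet")
    # ... and write a fresh copy that pyarrow produced itself.
    df.to_parquet(dst, engine="pyarrow")
    return dst
```

Run this in a fresh Python process so that nothing has imported `dask.dataframe` beforehand. Afterwards the new copy can be checked by reading it with `engine='pyarrow'` in the notebook.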
Suggested comment: "Needed to avoid a warning emitted when datashader internally imports dask.dataframe."
OK. I'll make it clearer.
This PR updates some of the dependencies in the `nyc_taxi` and `glaciers` examples.