The merge in the repro below raises a RuntimeError: Failed to generate metadata for RenameAxis(frame=Merge(75f6fd3), index=None). This operation may not be supported by the current backend. (The full stack trace is below; the failure surfaces in dask_expr/_collection.py, in new_collection.)
Steps/Code to reproduce bug
We ran into this in crossfit when running our pytests. Here is a repro that uses two methods from the crossfit library (namely sample_raw and reset_global_index). A simpler reproducer is probably possible, but this is what I was able to put together in a timeboxed manner.
```python
import os

import dask_cudf
from crossfit.dataset.beir.load import reset_global_index
from crossfit.dataset.beir.raw import sample_raw

dataset_name = "nq"
out_dir = None
blocksize = 2**30

# Directory holding the raw "nq" dataset files prepared by crossfit
raw_path = sample_raw(dataset_name, out_dir=out_dir, overwrite=False)

qrels_files = [
    f for f in os.listdir(os.path.join(raw_path, "qrels")) if f.endswith(".tsv")
]
qrels_file = qrels_files[0]
qrels_dtypes = {"query-id": "str", "corpus-id": "str", "score": "int32"}

queries_ddf = dask_cudf.read_json(
    os.path.join(raw_path, "queries.jsonl"),
    lines=True,
    blocksize=blocksize,
    dtype={"_id": "string", "text": "string"},
)[["_id", "text"]]

# If we don't call reset_global_index, the code works fine
queries_ddf = reset_global_index(queries_ddf)

qrels_ddf = dask_cudf.read_csv(
    os.path.join(raw_path, "qrels", qrels_file),
    sep="\t",
    dtype=qrels_dtypes,
)[["query-id", "corpus-id", "score"]]

# On the 24.10 nightly this raises the RuntimeError while building the merge's metadata
qrels_ddf.merge(
    queries_ddf,
    left_on="query-id",
    right_on="_id",
    how="left",
)
print("Success")
```
Expected behavior
The merge should succeed and the script should print "Success", as it did before the 24.10 nightly.
crossfit was installed using pip (i.e., cudf and the other dependencies were pip-installed as well).
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
Additional context
```
Traceback (most recent call last):
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/utils/utils.py", line 228, in __getattr__
    return self[key]
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/core/dataframe.py", line 1347, in __getitem__
    out = self._get_columns_by_label(arg)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/core/frame.py", line 358, in _get_columns_by_label
    return self._from_data_like_self(self._data.select_by_label(labels))
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/core/column_accessor.py", line 401, in select_by_label
    return self._select_by_label_grouped(key)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/core/column_accessor.py", line 563, in _select_by_label_grouped
    result = self._grouped_data[key]
KeyError: 'rename_axis'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/dask_expr/_core.py", line 470, in __getattr__
    return object.__getattribute__(self, key)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/dask_expr/_expr.py", line 496, in _meta
    return self.operation(*args, **self._kwargs)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/dask/utils.py", line 1241, in __call__
    return getattr(__obj, self.method)(*args, **kwargs)
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/cudf/utils/utils.py", line 230, in __getattr__
    raise AttributeError(
AttributeError: DataFrame object has no attribute rename_axis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/dask_expr/_collection.py", line 4799, in new_collection
    meta = expr._meta
  File "/datasets/praateekm/env_setup/micromamba/envs/crossfit_2410/lib/python3.10/site-packages/dask_expr/_core.py", line 475, in __getattr__
    raise RuntimeError(
RuntimeError: Failed to generate metadata for RenameAxis(frame=Merge(75f6fd3), index=None). This operation may not be supported by the current backend.
```
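The traceback is a chain of three exceptions: a column lookup for the label "rename_axis" raises KeyError, cudf's `__getattr__` turns that into AttributeError, and dask-expr wraps the AttributeError in a RuntimeError while computing the expression's metadata. The stdlib-only sketch below reproduces that chaining pattern so the three parts are easier to read. `FakeFrame`, `compute_meta` and `exception_chain` are hypothetical stand-ins written for illustration; they are not the actual cudf or dask-expr code.

```python
import operator


class FakeFrame:
    """Hypothetical stand-in for a backend DataFrame that resolves unknown
    attributes as column labels, like the cudf __getattr__ in the traceback."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        # Unknown labels raise KeyError (first exception in the chain)
        return self._columns[key]

    def __getattr__(self, key):
        # Only called when normal lookup fails, e.g. for "rename_axis"
        try:
            return self[key]
        except KeyError:
            # Raised while handling KeyError, so Python chains it implicitly
            raise AttributeError(
                f"{type(self).__name__} object has no attribute {key}"
            )


def compute_meta(operation, meta):
    """Hypothetical analogue of the metadata step: apply the operation to an
    empty 'meta' frame and wrap a missing-method failure in a RuntimeError."""
    try:
        return operation(meta)
    except AttributeError:
        raise RuntimeError(
            "Failed to generate metadata. "
            "This operation may not be supported by the current backend."
        )


def exception_chain(exc):
    """List exception type names by following implicit __context__ links."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__context__
    return names


# Similar to dask.utils.methodcaller in the traceback: look the method up by
# name on the object, then call it
rename_axis = operator.methodcaller("rename_axis", index=None)

try:
    compute_meta(rename_axis, FakeFrame({"_id": [], "text": []}))
except RuntimeError as exc:
    chain = exception_chain(exc)

print(chain)  # ['RuntimeError', 'AttributeError', 'KeyError']
```

Because each `raise` happens inside an `except` block without `from`, Python records the earlier exception as `__context__`, which is what produces the "During handling of the above exception, another exception occurred" separators in the real traceback.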
Describe the bug
When performing a merge where `left._meta.index_name != right._meta.index_name`, the behavior in dask-expr has changed (https://github.com/dask/dask-expr/pull/1121/files), and the merge now fails with the RuntimeError described above.
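To make the trigger concrete, here is a small pandas-only sketch (assuming pandas is installed) of a column-on-column left merge between a frame with an unnamed index and a frame with a named index, followed by a `rename_axis(index=None)` call, which is what the `RenameAxis(..., index=None)` expression in the error appears to ask of the backend. The index name "idx" and the toy data are hypothetical stand-ins for the repro's data; we are assuming, not confirming, that reset_global_index leaves queries_ddf with a differently named index. With pandas the call succeeds, whereas the cudf DataFrame in the traceback had no `rename_axis` attribute.

```python
import pandas as pd

# Left frame: unnamed default index (hypothetical toy data)
left = pd.DataFrame({"query-id": ["q1", "q2"], "score": [1, 0]})

# Right frame: named index; "idx" is a hypothetical name that stands in for
# whatever differing index name the real right-hand frame carries
right = pd.DataFrame(
    {"_id": ["q1", "q3"], "text": ["first", "third"]},
    index=pd.Index([10, 11], name="idx"),
)

# The index names differ, which is the condition described in the report
print(left.index.name, right.index.name)  # None idx

# Column-on-column left merge, as in the repro
merged = left.merge(right, left_on="query-id", right_on="_id", how="left")

# Clear the index name; pandas DataFrames implement rename_axis
renamed = merged.rename_axis(index=None)
print(renamed.index.name)  # None
```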