What happened + What you expected to happen
In this PR, a new requirement was imposed on the `fn` callable passed to `GroupedData.map_groups()`: it must have a `__name__` attribute. Callables created with `functools.partial()` have no such attribute, so passing one to `.map_groups()` raises `AttributeError: 'functools.partial' object has no attribute '__name__'`. This did not happen before the linked PR.
It's not a huge deal, but it did break existing code unexpectedly, and it technically conflicts with the type annotations on this method.
In [1]: import functools
In [2]: callable(functools.partial(lambda x: x))
Out[2]: True
In [3]: functools.partial(lambda x: x).__name__
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 functools.partial(lambda x: x).__name__
AttributeError: 'functools.partial' object has no attribute '__name__'
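Until this is addressed in Ray, one possible user-side workaround is to copy the metadata of the underlying function onto the partial object with `functools.update_wrapper()`. The sketch below is only an illustration: `scale` and `scale_by_two` are hypothetical names, and a plain list stands in for a real batch.

```python
import functools


def scale(batch, factor):
    """Hypothetical UDF: multiply every value in the batch by `factor`."""
    return [value * factor for value in batch]


# A bare partial object has no __name__ attribute of its own.
scale_by_two = functools.partial(scale, factor=2)

# update_wrapper copies __name__, __qualname__, __doc__, etc. from `scale`
# onto the partial object, so code that reads fn.__name__ keeps working.
functools.update_wrapper(scale_by_two, scale)

print(scale_by_two.__name__)
print(scale_by_two([1, 2, 3]))
```

A callable prepared this way should pass the `fn.__name__` lookup in `.map_groups()`, although that has not been verified against every affected Ray version.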
Versions / Dependencies
ray >= 2.21
Python 3.10
macOS 14.4
Reproduction script
>>> import functools
>>> import ray
>>> ds = ray.data.range(10)
>>> ds.groupby("id").map_groups(lambda x: x) # this is fine
MapBatches(<lambda>)
+- Sort
   +- Dataset(num_rows=10, schema={id: int64})
>>> ds.groupby("id").map_groups(functools.partial(lambda x: x)) # this errors
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds.groupby("id").map_groups(functools.partial(lambda x: x))
File ~/.pyenv/versions/3.10.13/envs/ev-detection-py310/lib/python3.10/site-packages/ray/data/grouped_data.py:253, in GroupedData.map_groups(self, fn, compute, batch_format, fn_args, fn_kwargs, fn_constructor_args, fn_constructor_kwargs, num_cpus, num_gpus, concurrency, **ray_remote_args)
249 yield from apply_udf_to_groups(fn, batch, *args, **kwargs)
251 # Change the name of the wrapped function so that users see the name of their
252 # function rather than `wrapped_fn` in the progress bar.
--> 253 wrapped_fn.__name__ = fn.__name__
255 # Note we set batch_size=None here, so it will use the entire block as a batch,
256 # which ensures that each group will be contained within a batch in entirety.
257 return sorted_ds._map_batches_without_batch_size_validation(
258 wrapped_fn,
259 batch_size=None,
(...)
271 **ray_remote_args,
272 )
AttributeError: 'functools.partial' object has no attribute '__name__'
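On the library side, the failing assignment at line 253 could avoid the error by looking up the name defensively. The following is a minimal sketch of that idea, not Ray's actual fix; `get_callable_name` is a hypothetical helper introduced here for illustration.

```python
import functools


def get_callable_name(fn):
    """Hypothetical helper: best-effort display name for any callable."""
    # Plain functions, lambdas, and classes expose __name__ directly.
    name = getattr(fn, "__name__", None)
    if name is not None:
        return name
    # functools.partial objects keep the wrapped callable in `.func`.
    if isinstance(fn, functools.partial):
        return get_callable_name(fn.func)
    # Instances of callable classes: fall back to the class name.
    return type(fn).__name__


class AddOne:
    """Hypothetical callable class used only to exercise the fallback."""

    def __call__(self, batch):
        return [value + 1 for value in batch]


print(get_callable_name(lambda x: x))
print(get_callable_name(functools.partial(lambda x: x)))
print(get_callable_name(AddOne()))
```

With a helper like this, the assignment could read `wrapped_fn.__name__ = get_callable_name(fn)`, which would keep the user's function name in the progress bar while still accepting partial objects and callable class instances.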
Issue Severity
Low: It annoys or frustrates me.