ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Data] `GroupedData.map_groups()` doesn't allow partial callables #46185

Open bdewilde opened 4 months ago

bdewilde commented 4 months ago

What happened + What you expected to happen

In this PR, a new requirement was imposed on the `fn` callable passed to `GroupedData.map_groups()`: it must have a `__name__` attribute. Unfortunately, callables partially parametrized with `functools.partial()` have no such attribute, so passing one to `.map_groups()` raises `AttributeError: 'functools.partial' object has no attribute '__name__'`. This did not happen prior to the linked PR.

It's not a huge deal, but it did cause code to break unexpectedly, and I guess it technically conflicts with the type annotations on this method.

In [1]: import functools

In [2]: callable(functools.partial(lambda x: x))
Out[2]: True

In [3]: functools.partial(lambda x: x).__name__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 functools.partial(lambda x: x).__name__

AttributeError: 'functools.partial' object has no attribute '__name__'
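Until this is fixed, one possible user-side workaround is to copy the wrapped function's metadata onto the partial object with `functools.update_wrapper()`, which gives it a `__name__`. This is only a sketch: `scale` and `factor` are hypothetical names made up for illustration, and I have not verified the result against every Ray version.

```python
import functools


def scale(batch, factor):
    # Hypothetical UDF used only for illustration.
    return [value * factor for value in batch]


partial_fn = functools.partial(scale, factor=2)

# A bare partial object has no __name__ attribute.
assert not hasattr(partial_fn, "__name__")

# update_wrapper copies __name__, __doc__, etc. from `scale` onto the
# partial object and returns that same partial object.
named_fn = functools.update_wrapper(partial_fn, scale)

print(named_fn.__name__)    # scale
print(named_fn([1, 2, 3]))  # [2, 4, 6]
```

A callable prepared this way could then be passed as `fn`; wrapping the partial in a plain `def` or `lambda` should avoid the error as well.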

Versions / Dependencies

ray >= 2.21, Python 3.10, macOS 14.4

Reproduction script

>>> import functools
>>> import ray
>>> ds = ray.data.range(10)
>>> ds.groupby("id").map_groups(lambda x: x)  # this is fine
MapBatches(<lambda>)
+- Sort
   +- Dataset(num_rows=10, schema={id: int64})
>>> ds.groupby("id").map_groups(functools.partial(lambda x: x))  # this errors
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds.groupby("id").map_groups(functools.partial(lambda x: x))

File ~/.pyenv/versions/3.10.13/envs/ev-detection-py310/lib/python3.10/site-packages/ray/data/grouped_data.py:253, in GroupedData.map_groups(self, fn, compute, batch_format, fn_args, fn_kwargs, fn_constructor_args, fn_constructor_kwargs, num_cpus, num_gpus, concurrency, **ray_remote_args)
    249         yield from apply_udf_to_groups(fn, batch, *args, **kwargs)
    251 # Change the name of the wrapped function so that users see the name of their
    252 # function rather than `wrapped_fn` in the progress bar.
--> 253 wrapped_fn.__name__ = fn.__name__
    255 # Note we set batch_size=None here, so it will use the entire block as a batch,
    256 # which ensures that each group will be contained within a batch in entirety.
    257 return sorted_ds._map_batches_without_batch_size_validation(
    258     wrapped_fn,
    259     batch_size=None,
   (...)
    271     **ray_remote_args,
    272 )

AttributeError: 'functools.partial' object has no attribute '__name__'
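The failing line in the traceback above assumes `fn.__name__` always exists. One way a fix could look is a small fallback helper that unwraps partials before reading the name. This is a minimal sketch and not Ray's actual code: `get_callable_name` and `Doubler` are hypothetical names, and the maintainers may prefer a different approach.

```python
import functools


def get_callable_name(fn):
    """Return a display name for `fn`, tolerating functools.partial objects.

    Hypothetical helper for illustration; not part of Ray's API.
    """
    # Unwrap partials that lack their own __name__ to reach the wrapped callable.
    while isinstance(fn, functools.partial) and not hasattr(fn, "__name__"):
        fn = fn.func
    # Callable class instances have no __name__, so fall back to the class name.
    return getattr(fn, "__name__", type(fn).__name__)


class Doubler:
    # Hypothetical callable class, used to exercise the fallback branch.
    def __call__(self, batch):
        return [value * 2 for value in batch]


def identity(batch):
    return batch


print(get_callable_name(identity))                         # identity
print(get_callable_name(functools.partial(identity)))      # identity
print(get_callable_name(functools.partial(lambda x: x)))   # <lambda>
print(get_callable_name(Doubler()))                        # Doubler
```

With a helper like this, the assignment at line 253 could read `wrapped_fn.__name__ = get_callable_name(fn)`, so the progress bar would still show the user's function name.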

Issue Severity

Low: It annoys or frustrates me.

LeoLiao123 commented 3 months ago

Hi, can I take on this issue?