mars-project / mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
https://mars-project.readthedocs.io
Apache License 2.0
2.7k stars 325 forks source link

[BUG] md.read_ray_dataset is not compatible with latest Ray #3317

Closed fyrestone closed 1 year ago

fyrestone commented 1 year ago

Describe the bug

import mars
mars.new_session(backend='ray')

import mars.tensor as mt
import mars.dataframe as md
df = md.DataFrame(
    mt.random.rand(1000_0000, 4),
    columns=list('abcd'))
# Convert mars dataframe to ray dataset
ds = md.to_ray_dataset(df)
print(ds.schema(), ds.count())
ds.filter(lambda row: row["a"] > 0.5).show(5)
# Convert ray dataset to mars dataframe
df2 = md.read_ray_dataset(ds)
print(df2.head(5).execute())

output

Traceback (most recent call last):
  File "/home/admin/Work/mars/mars/dataframe/datasource/read_raydataset.py", line 120, in read_ray_dataset
    from ray.data.impl.pandas_block import PandasBlockSchema
ModuleNotFoundError: No module named 'ray.data.impl'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/admin/Work/mars/t1.py", line 15, in <module>
    df2 = md.read_ray_dataset(ds)
  File "/home/admin/Work/mars/mars/dataframe/datasource/read_raydataset.py", line 129, in read_ray_dataset
    dtypes = schema.empty_table().to_pandas().dtypes
AttributeError: 'PandasBlockSchema' object has no attribute 'empty_table'

A clear and concise description of what the bug is.

To Reproduce To help us reproducing this bug, please provide information below:

  1. Your Python version
  2. The version of Mars you use
  3. Versions of crucial packages, such as numpy, scipy and pandas Ray == 2.1.0
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here.