ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Ray AIR] Predictor only supports scalar output #25069

Closed JiahaoYao closed 2 years ago

JiahaoYao commented 2 years ago

What happened + What you expected to happen

To support a predictor that takes an image as input and produces an image as output, the predictor may need to handle more general shapes. The changes might be two-fold:

  1. input: related to #25037, the use of data.values might need to change. My current hard-coded workaround is:

    # With a single feature column, each cell holds one array; stack them into a batch.
    if len(feature_columns) == 1:
        data = data[:, 0]
        data = np.stack(data)

  2. output: some deep learning tasks, such as generative networks, usually output images, which are tensors rather than scalars. To return these successfully, here is my hard-coded workaround:

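# Run the model and convert the output to a NumPy array of shape (batch, ...).
prediction = model(tensor).numpy()

print(prediction.shape)
# Split along the batch axis so that each row holds one prediction of any shape.
prediction_list = []
for i in range(prediction.shape[0]):
    prediction_list.append(prediction[i, :])

# what if multiple outputs for the predictor?
return pd.DataFrame({"predictions": prediction_list})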

Note: https://github.com/ray-project/ray/blob/9dd30d5f77d4cdbe6b13727deb83b47a10efff7a/python/ray/ml/predictors/integrations/tensorflow/tensorflow_predictor.py#L150 flattens the output. This works for scalar outputs but does not generalize to images.

Also, it seems better to change pd.DataFrame(prediction, columns=["predictions"]) to pd.DataFrame({"predictions": prediction}). The former only supports one column, while the latter is more general.
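
The two changes described above can be tried outside of Ray with plain NumPy and pandas. The following is only a minimal sketch, not the predictor's actual code: `column_to_batch` and `batch_to_frame` are hypothetical helper names, and a simple arithmetic operation stands in for `model(tensor).numpy()`.

```python
import numpy as np
import pandas as pd


def column_to_batch(df: pd.DataFrame, column: str) -> np.ndarray:
    """Stack a column of per-row arrays into one (batch, ...) array."""
    return np.stack(df[column].to_numpy())


def batch_to_frame(prediction: np.ndarray, column: str = "predictions") -> pd.DataFrame:
    """Store each row of a (batch, ...) array as one cell of a single column."""
    return pd.DataFrame({column: list(prediction)})


# Five fake 4x4 single-channel "images", one per DataFrame row.
images = np.random.randn(5, 4, 4)
df = pd.DataFrame({"image": list(images)})

# Input side: recover a dense batch from the object column.
batch = column_to_batch(df, "image")

# Stand-in for model(tensor).numpy(): one image-shaped output per input.
prediction = batch * 2.0

# Output side: one column whose cells are (4, 4) arrays.
out = batch_to_frame(prediction)

# Multiple outputs (hypothetical names): one dict key per output.
multi = pd.DataFrame(
    {"mask": list(prediction), "score": prediction.mean(axis=(1, 2))}
)
```

The dict form extends naturally to several outputs because each key becomes its own column, regardless of the shape of the arrays stored in its cells.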

Issue Severity

Medium: It is a significant difficulty but I can work around it.

JiahaoYao commented 2 years ago

Here is a minimal script, @xwjiang2010:

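import numpy as np
import pandas as pd

# Build a list of five 1-D arrays, each of shape (2,).
data = np.random.randn(5, 2)
data_list = []
for i in range(5):
    data_list.append(data[i, :])

# Dict form: each array is stored as one cell of the "image" column.
data = pd.DataFrame({'image': data_list})
print(data['image'].shape)
print(data['image'][0].shape)

# Positional form: each array is read as a row with 2 values, so this raises ValueError.
data = pd.DataFrame(data_list, columns=["image"])
print(data['image'].shape)
print(data['image'][0].shape)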

The results

(5,)
(2,)
Traceback (most recent call last):
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 906, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 955, in _validate_or_indexify_columns
    f"{len(columns)} columns passed, passed data had "
AssertionError: 1 columns passed, passed data had 2 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_pandas.py", line 26, in <module>
    data = pd.DataFrame(data_list, columns=["image"])
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/frame.py", line 700, in __init__
    dtype,
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 483, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 807, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 909, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 1 columns passed, passed data had 2 columns
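
The error message suggests why the two constructors differ: a list of 1-D arrays passed positionally is read as rows, so each length-2 array contributes two columns, whereas a list passed as a dict value is read as the data for a single column. A minimal sketch that checks both readings, assuming a reasonably recent pandas (the column names `x0` and `x1` are made up for illustration):

```python
import numpy as np
import pandas as pd

rows = list(np.random.randn(5, 2))

# Dict form: the list is the data for one column, one array per cell.
as_cells = pd.DataFrame({"image": rows})

# Positional form: the list is read as 5 rows of 2 values each,
# so it needs two column names rather than one.
as_rows = pd.DataFrame(rows, columns=["x0", "x1"])

# Passing a single column name to the positional form reproduces the failure.
try:
    pd.DataFrame(rows, columns=["image"])
    failed = False
    message = ""
except ValueError as err:
    failed = True
    message = str(err)
```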