[Ray AIR] Predictor only supports scalar output

ray-project / ray

What happened + What you expected to happen

To support predictors that take an image as input and produce an image as output, the predictor may need to be more general about shapes. The changes might be two-fold:

1. Input: related to #25037, the handling of `data.values` might need to change. My current hard-coded workaround is:
```
if len(feature_columns) == 1:
    # Single tensor column: select it, then stack the per-row arrays into one batch.
    data = data[:, 0]
    data = np.stack(data)
```
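As a rough illustration of what this workaround does, here is a minimal sketch on a made-up batch. The column name `image`, the `(8, 8, 3)` shape, and the way `data` is obtained from the DataFrame are assumptions for the example, not taken from the Ray source.

```python
import numpy as np
import pandas as pd

# Hypothetical batch: 4 "images" of shape (8, 8, 3), stored one per row.
images = [np.random.rand(8, 8, 3) for _ in range(4)]
df = pd.DataFrame({"image": images})
feature_columns = ["image"]  # assumed column name, for illustration only

data = df[feature_columns].values  # object array of shape (4, 1)
if len(feature_columns) == 1:
    data = data[:, 0]      # object array of shape (4,)
    data = np.stack(data)  # float array of shape (4, 8, 8, 3)

print(data.shape, data.dtype)
```

Without the `np.stack` step, `data` stays an object array of arrays, which a model expecting a dense batch tensor generally cannot consume directly.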

2. Output: some deep learning tasks, such as generative networks, usually output images, which are tensors rather than scalars. To output these successfully, here is my hard-coded workaround:

```
prediction = model(tensor).numpy()

print(prediction.shape)
# Split along the batch axis so that each row holds one per-sample array.
prediction_list = []
for i in range(prediction.shape[0]):
    prediction_list.append(prediction[i, :])

# what if multiple outputs for the predictor?
return pd.DataFrame({"predictions": prediction_list})
```
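One possible answer to the multiple-outputs question is sketched below. The helper `predictions_to_dataframe` and the output names `image` and `score` are hypothetical and not part of Ray; the sketch assumes every output is a NumPy array whose first axis is the batch axis.

```python
from typing import Dict, Union

import numpy as np
import pandas as pd


def predictions_to_dataframe(
    outputs: Union[np.ndarray, Dict[str, np.ndarray]]
) -> pd.DataFrame:
    """Hypothetical helper: one row per sample, one column per model output."""
    if isinstance(outputs, np.ndarray):
        outputs = {"predictions": outputs}
    # list(arr) splits along the batch axis: a 1-D array yields scalars,
    # an N-D array yields (N-1)-D arrays that are stored as object cells.
    return pd.DataFrame({name: list(arr) for name, arr in outputs.items()})


single = predictions_to_dataframe(np.zeros((5, 28, 28)))
multi = predictions_to_dataframe(
    {"image": np.zeros((5, 28, 28)), "score": np.zeros(5)}
)
print(single["predictions"][0].shape)  # (28, 28)
print(list(multi.columns))             # ['image', 'score']
```

Scalar outputs still end up in an ordinary numeric column, so this form would not change behavior for existing scalar predictors.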

Note: https://github.com/ray-project/ray/blob/9dd30d5f77d4cdbe6b13727deb83b47a10efff7a/python/ray/ml/predictors/integrations/tensorflow/tensorflow_predictor.py#L150 flattens the output. This is fine for scalar outputs, but it does not generalize to images.
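To show why flattening is a problem for image outputs, here is a small sketch. It assumes the flattening behaves like NumPy's `ravel()`, which is a guess about the linked line rather than a quote from it, and the shapes are made up.

```python
import numpy as np

batch = np.zeros((4, 8, 8, 3))  # 4 hypothetical image predictions
flat = batch.ravel()            # shape (768,): sample boundaries are lost
per_sample = list(batch)        # 4 arrays, each of shape (8, 8, 3)

scalars = np.zeros((4, 1))      # 4 scalar predictions
print(flat.shape)               # (768,)
print(scalars.ravel().shape)    # (4,): one value per sample, as intended
```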

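Also, it seems better to change `pd.DataFrame(prediction, columns=["predictions"])` to `pd.DataFrame({"predictions": prediction})`. When `prediction` is a list of per-sample arrays, the former reads each array as a row with several columns and fails unless the output is scalar, while the latter stores one array per row and is therefore more general.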

Issue Severity

Medium: It is a significant difficulty but I can work around it.

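```
import numpy as np
import pandas as pd

data = np.random.randn(5, 2)
data_list = []
for i in range(5):
    data_list.append(data[i, :])

# Dict form: each array is stored as a single cell.
data = pd.DataFrame({'image': data_list})
print(data['image'].shape)
print(data['image'][0].shape)

# List form: each array is read as a row with 2 columns, so this raises.
data = pd.DataFrame(data_list, columns=["image"])
print(data['image'].shape)
print(data['image'][0].shape)
```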

```
(5,)
(2,)
Traceback (most recent call last):
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 906, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 955, in _validate_or_indexify_columns
    f"{len(columns)} columns passed, passed data had "
AssertionError: 1 columns passed, passed data had 2 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_pandas.py", line 26, in <module>
    data = pd.DataFrame(data_list, columns=["image"])
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/frame.py", line 700, in __init__
    dtype,
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 483, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 807, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 909, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 1 columns passed, passed data had 2 columns
```

[Ray AIR] Predictor only supports scalar output #25069
