Closed JiahaoYao closed 2 years ago
Here is the minimal script @xwjiang2010
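import numpy as np
import pandas as pd

data = np.random.randn(5, 2)
data_list = []
for i in range(5):
    data_list.append(data[i, :])

# Dict form: each 1-D array becomes one cell of the "image" column.
data = pd.DataFrame({'image': data_list})
print(data['image'].shape)
print(data['image'][0].shape)

# List form: each array is read as a row of 2 values, so one column name fails.
data = pd.DataFrame(data_list, columns=["image"])
print(data['image'].shape)
print(data['image'][0].shape)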
The results
(5,)
(2,)
Traceback (most recent call last):
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 906, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 955, in _validate_or_indexify_columns
    f"{len(columns)} columns passed, passed data had "
AssertionError: 1 columns passed, passed data had 2 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_pandas.py", line 26, in <module>
    data = pd.DataFrame(data_list, columns=["image"])
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/frame.py", line 700, in __init__
    dtype,
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 483, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 807, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/Users/jimmy/opt/anaconda3/envs/dl/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 909, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 1 columns passed, passed data had 2 columns
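For anyone reading the traceback, here is a small sketch of why the two constructors behave differently. It assumes the same kind of input as the script above (five arrays of shape (2,)); the variable names `as_rows`, `as_cells`, and `raised` are just for illustration.

```python
import numpy as np
import pandas as pd

# Five arrays of shape (2,), as in the script above.
data_list = [row for row in np.random.randn(5, 2)]

# A list of 1-D arrays is read as a list of rows, so the frame is 5 x 2.
as_rows = pd.DataFrame(data_list)
print(as_rows.shape)  # (5, 2)

# A single column name therefore conflicts with the two inferred columns.
try:
    pd.DataFrame(data_list, columns=["image"])
    raised = False
except ValueError as err:
    raised = True
    print(err)

# A dict maps the name to the whole list, so each array becomes one cell.
as_cells = pd.DataFrame({"image": data_list})
print(as_cells.shape, as_cells["image"].dtype)  # (5, 1) object
```

So the error does not come from the arrays themselves. pandas reads a bare list of arrays as rows, and a dict value as a single column.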
What happened + What you expected to happen
To support predictors that take images as input and produce images as output, the predictor may need to handle more general shapes. Two changes might be needed:
1. input: data.values might need to change. My current workaround for this is hard-coded.
2. output: some deep learning tasks, such as generative networks, usually output images, which are tensors rather than scalars. To output these successfully, I currently hard-code the conversion as well.
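To illustrate the input side, here is a rough sketch of what data.values returns for a column whose cells hold arrays, and one possible way to rebuild a dense batch. This is not the predictor's actual code. The column name "image", the 4x4 shape, and the use of np.stack are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical batch of five 4x4 "images", stored one array per cell.
images = np.random.randn(5, 4, 4)
df = pd.DataFrame({"image": list(images)})

# .values gives an object array of arrays, not a numeric tensor.
raw = df.values
print(raw.shape, raw.dtype)  # (5, 1) object

# Stacking the cells of the column rebuilds a dense (batch, H, W) array.
batch = np.stack(df["image"].to_numpy())
print(batch.shape, batch.dtype)  # (5, 4, 4) float64
```

Stacking only works when every cell has the same shape, so a more general fix would need to decide how to handle ragged inputs.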
Note: https://github.com/ray-project/ray/blob/9dd30d5f77d4cdbe6b13727deb83b47a10efff7a/python/ray/ml/predictors/integrations/tensorflow/tensorflow_predictor.py#L150 flattens the output. This works for scalar predictions but does not generalize to images.
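As a rough illustration of the difference on the output side, the sketch below compares flattening with splitting only along the batch axis. The array `prediction` is a made-up stand-in for a model output, and `ravel()` stands in for the flattening mentioned in the note. Neither is taken from the predictor code.

```python
import numpy as np
import pandas as pd

# Hypothetical model output: a batch of three 2x2 "images".
prediction = np.arange(12, dtype=np.float32).reshape(3, 2, 2)

# Flattening yields one scalar per row and loses the batch and image shape.
flat = pd.DataFrame(prediction.ravel(), columns=["predictions"])
print(flat.shape)  # (12, 1)

# Splitting only along the batch axis keeps one image per row.
per_row = pd.DataFrame({"predictions": list(prediction)})
print(per_row.shape)  # (3, 1)
print(per_row["predictions"][0].shape)  # (2, 2)
```

With scalar outputs the two forms produce the same number of rows. They only diverge once each prediction has more than one element.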
Also, it seems better to change pd.DataFrame(prediction, columns=["predictions"]) to pd.DataFrame({"predictions": prediction}). The former only supports one column, while the latter is more general.
Issue Severity
Medium: It is a significant difficulty but I can work around it.