sql-machine-learning / elasticdl

Kubernetes-native Deep Learning Framework
https://elasticdl.org
MIT License

The dtype of a column in an ODPS table may not be float32 #1302

Closed: workingloong closed this issue 5 years ago

workingloong commented 5 years ago

The dtype of a column in an ODPS table may be int32, float, boolean, string, and so on, so records_output_types in ODPSDataReader cannot be fixed to tf.float32. https://github.com/sql-machine-learning/elasticdl/blob/aef9d66e5d99ed2144ed4ba36933aaf85738a327/elasticdl/python/data/data_reader.py#L143-L144

I suggest fixing records_output_types in ODPSDataReader to tf.string instead, and converting the data returned by odps_io.ODPSReader.read_batch to strings. https://github.com/sql-machine-learning/elasticdl/blob/aef9d66e5d99ed2144ed4ba36933aaf85738a327/elasticdl/python/data/odps_io.py#L223-L225

batch_record.append(
    [str(record[column]) for column in columns]
)

Then users can cast the strings to whatever data types they want in their own dataset_fn.
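To make the round trip concrete, here is a minimal pure-Python sketch of the idea, not ElasticDL's actual code. The column names, the `CASTS` mapping, and the helpers `stringify_record`, `parse_bool`, and `cast_record` are all hypothetical, and a plain dict stands in for an ODPS record. It also shows one pitfall of the string approach: booleans need an explicit comparison when cast back.

```python
# Hypothetical column names, for illustration only.
COLUMNS = ["age", "income", "is_member", "city"]


def stringify_record(record, columns):
    """Mirror the proposed read_batch change: every value becomes a str."""
    return [str(record[column]) for column in columns]


def parse_bool(text):
    # bool("False") is True in Python, so compare the text explicitly.
    return text == "True"


# Hypothetical per-column casts a user might apply in their dataset_fn.
CASTS = {"age": int, "income": float, "is_member": parse_bool, "city": str}


def cast_record(string_record, columns, casts):
    """Cast each string back to the type the user wants for that column."""
    return [
        casts[column](value)
        for column, value in zip(columns, string_record)
    ]


# A dict stands in for an ODPS record here (an assumption of this sketch).
record = {"age": 31, "income": 5200.5, "is_member": False, "city": "Hangzhou"}
as_strings = stringify_record(record, COLUMNS)
restored = cast_record(as_strings, COLUMNS, CASTS)
```

Inside a real dataset_fn the numeric casts would presumably be done on tensors, for example with tf.strings.to_number, rather than with Python built-ins. Note also that a null value would be stringified as "None" by str(), so users would have to handle that case themselves.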

terrytangyuan commented 5 years ago

@workingloong Yes, float32 was only a temporary plan. We should switch to something more robust.