uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks

[WIP][ML-10118] Keep petastorm dataset/dataloader schema fields order the same with spark dataframe #511

Closed. WeichenXu123 closed this 4 years ago.

WeichenXu123 commented 4 years ago

[WIP] Keep the petastorm dataset/dataloader schema field order the same as in the Spark DataFrame.

WeichenXu123 commented 4 years ago

@mengxr Because of PR https://github.com/uber/petastorm/pull/512, we don't need to add a selected_field to TransformSpec. We can infer the schema directly from the transformed result (the schema field order will then be the column order of the transformed pandas DataFrame). I created a new PR: https://github.com/uber/petastorm/pull/513
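
To illustrate the idea of inferring the schema from the transformed result, here is a minimal sketch. It is not petastorm code: a plain dict of column name to values stands in for the transformed pandas DataFrame, and `infer_schema_from_result` and `example_transform` are hypothetical names made up for this example. With a real pandas DataFrame, the same ordering would presumably come from `df.columns` and `df.dtypes`.

```python
def infer_schema_from_result(columns):
    """Return [(field_name, type_name), ...] in the result's column order.

    `columns` is a dict of column name -> list of values. It is a stand-in
    for the transformed pandas DataFrame (an assumption of this sketch).
    """
    schema = []
    # Python dicts preserve insertion order (3.7+), so iterating yields
    # the fields in the order the transform produced them.
    for name, values in columns.items():
        # Take the type of the first value; empty columns get None.
        field_type = type(values[0]).__name__ if values else None
        schema.append((name, field_type))
    return schema


def example_transform(columns):
    """Hypothetical transform: drops "id", adds "label_float" as the first field."""
    out = {"label_float": [float(v) for v in columns["label"]]}
    out["feature"] = columns["feature"]
    return out


source = {"id": [1, 2], "feature": [0.5, 1.5], "label": [0, 1]}
transformed = example_transform(source)
schema = infer_schema_from_result(transformed)
print(schema)
```

Because the schema is read off the transformed result, no separate list of selected fields has to be passed alongside the transform: dropped fields disappear and new fields show up in whatever position the transform placed them.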