tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0

Add Parquet ExampleGen support #240

Open yongtang opened 5 years ago

yongtang commented 5 years ago

This issue tracks the progress of adding Parquet ExampleGen support so that it can be integrated into TFX OSS.

Related issues: https://github.com/tensorflow/tfx/issues/74

Related discussion:

There's a close relationship between the work in SIG IO and ExampleGen implementations for TFX. Currently, TFX has no particular way of maintaining contributions, and there may also be significant code overlap with what is already in IO. I'm starting this thread to kick off a discussion about the feasibility and desirability of incorporating TFX data ingestion components into the SIG's work.

Related docs: https://github.com/tensorflow/tfx/blob/master/docs/guide/examplegen.md#custom-examplegen

yongtang commented 5 years ago

Assigned to myself.

ewilderj commented 5 years ago

The TFX team has decided to merge the Parquet PR rather than having it maintained externally. So this particular issue isn't as urgent as it was before, though it might come up again in the future.