tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.org/tfx
Apache License 2.0

[Discussion] Questions around including an image download step into TFX pipeline #3602

Closed RossKohler closed 3 years ago

RossKohler commented 3 years ago

Hi, I wonder if you could provide some guidance on how I might incorporate an image download step into my TFX pipeline. One of the features in my data is a URL that links to an image I would like to process in TensorFlow. My download code currently lives in plain Python functions, and I'm wondering how to move it into a TFX pipeline so that, ideally, this step can run during both training and serving.

So far I've had 3 ideas of how to do this, all of which seem to have their pros and cons.

  1. I base64-encode the images in a non-pipeline step and use TensorFlow operations to decode these strings and transform them into the desired image tensors.
  2. I build a custom component that runs after CsvExampleGen (currently in my pipeline), reads the Examples produced by CsvExampleGen, downloads the images, replaces the image URLs with the resulting tensors, and passes these new Examples on to the rest of my pipeline.
  3. I try to get this working in the preprocessing_fn of the Transform component, turning all of my code into TF operations and using tft.apply_pyfunc to turn the Python functions (image download) into TF functions.

What am I missing? How should I be looking at this? Any guidance from someone on the TFX team would be greatly appreciated here.
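
For the first idea, a minimal sketch of the offline encoding step could look like the following. It assumes the image bytes have already been downloaded, and the helper names `encode_image_for_tf` and `decode_image_from_tf` are hypothetical. The one detail worth knowing is that `tf.io.decode_base64` expects web-safe base64 (`-` and `_` instead of `+` and `/`), so the URL-safe variant is used here.

```python
import base64


def encode_image_for_tf(image_bytes: bytes) -> str:
    """Encode raw image bytes as web-safe base64 text (hypothetical helper).

    The URL-safe alphabet is used because tf.io.decode_base64 expects
    web-safe base64 rather than the standard '+' and '/' alphabet.
    """
    return base64.urlsafe_b64encode(image_bytes).decode("ascii")


def decode_image_from_tf(encoded: str) -> bytes:
    """Inverse of encode_image_for_tf, handy for checking the round trip."""
    return base64.urlsafe_b64decode(encoded.encode("ascii"))
```

The encoded string can then be stored in the CSV column that previously held the URL, and decoded inside the graph before the usual image decoding ops.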

RossKohler commented 3 years ago

For anyone with a similar question about computer vision tasks, I highly recommend reading the book Building Machine Learning Pipelines by Hannes Hapke and Catherine Nelson. Chapter 3 on 'Data Ingestion' and Chapter 5 on 'Data Preprocessing' talk about computer vision applications of TFX. Reading these two chapters directly answered my question above ☝️

TL;DR: you can write an image as bytes into a TF Example stored in a TFRecord file. The bytes can then be decoded pretty easily in the TF graph, inside the preprocessing_fn of the Transform component. During inference, requests would need to be made over gRPC; since TF Examples are protobufs, the client just has to adhere to the schema when making a request.
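
A rough sketch of that approach, under a few assumptions that are not from the book or the TFX docs: the images are JPEGs, the feature names `image_raw` and `label` are made up for illustration, the target size of 224x224 is arbitrary, and the image feature reaches `preprocessing_fn` as a dense string tensor (a schema that marks it as optional would yield a sparse tensor instead).

```python
import tensorflow as tf

IMAGE_KEY = "image_raw"  # hypothetical feature name
LABEL_KEY = "label"      # hypothetical feature name


def make_example(image_bytes: bytes, label: int) -> tf.train.Example:
    """Pack raw (still encoded) image bytes and a label into a TF Example."""
    return tf.train.Example(features=tf.train.Features(feature={
        IMAGE_KEY: tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        LABEL_KEY: tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }))


def write_tfrecord(path, records):
    """Write (image_bytes, label) pairs to a TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for image_bytes, label in records:
            writer.write(make_example(image_bytes, label).SerializeToString())


def _decode_one(raw):
    """Decode one JPEG string into a float32 image scaled to [0, 1]."""
    image = tf.io.decode_jpeg(raw, channels=3)
    image = tf.image.resize(image, [224, 224])  # returns float32
    return image / 255.0


def preprocessing_fn(inputs):
    """Transform-style preprocessing_fn that decodes a batch of images."""
    raw = tf.reshape(inputs[IMAGE_KEY], [-1])  # [batch, 1] -> [batch]
    images = tf.map_fn(_decode_one, raw, fn_output_signature=tf.float32)
    return {"image": images, LABEL_KEY: inputs[LABEL_KEY]}
```

Because the decoding is expressed with TF ops only, the same logic ends up in the exported graph, which is what lets it run at both training and serving time.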
