tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.github.io/tfx/
Apache License 2.0

TFX 1.14 docker image pip broken #6368

Closed IzakMaraisTAL closed 11 months ago

IzakMaraisTAL commented 1 year ago

System information

Describe the current behavior

Pip in the TFX 1.14.0 Docker image uses the wrong python version.

This breaks our workflow of using this image as a base image and adding packages with pip, since the installed packages end up in the wrong environment. It also breaks pip list, which is needed to install compatible dependencies. This workflow is crucial for running custom components on Vertex, and for running standard components such as Transform with additional dependencies such as tensorflow_text.

It also breaks tfx compile --engine=kubeflow|vertex inside the container, since the tfx cli uses subprocess to call out to pip to check that kfp is installed.
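The failure described above comes from calling whichever pip is first on PATH instead of the pip that belongs to the running interpreter. A minimal Python sketch of a check that avoids this follows. It is not the actual TFX CLI code: the helper names pip_command and is_installed are hypothetical, and kfp is used only because it is the package named above.

```python
import sys
from importlib import metadata
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip invocation bound to the interpreter running this code.

    `python -m pip` uses the pip of that interpreter, whereas a bare `pip`
    executable may belong to a different Python installation.
    """
    return [sys.executable, "-m", "pip", *args]


def is_installed(package: str) -> bool:
    """Check for a distribution in the current environment without calling pip."""
    try:
        metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return True


if __name__ == "__main__":
    print(pip_command("show", "kfp"))
    print("kfp installed:", is_installed("kfp"))
```

Both helpers answer for the interpreter that runs them, so they are unaffected by a pip script that points at another Python version.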

Describe the expected behavior

Pip in the Docker container should install into, and list packages from, the default Python environment used in the container.

Standalone code to reproduce the issue

Start by looking at the pip list output. In 1.14 it is missing a lot of dependencies:

docker run --rm -it --entrypoint /bin/bash tensorflow/tfx:1.14.0 -c "pip list"
Package            Version
------------------ -------------
blinker            1.4
cryptography       3.4.8
dbus-python        1.2.18
distro             1.7.0
httplib2           0.20.2
importlib-metadata 4.6.4
jeepney            0.7.1
keyring            23.5.0
launchpadlib       1.10.16
lazr.restfulclient 0.14.4
lazr.uri           1.0.6
more-itertools     8.10.0
oauthlib           3.2.0
pip                22.0.2
PyGObject          3.42.1
PyJWT              2.3.0
pyparsing          2.4.7
python-apt         2.4.0+ubuntu2
python-snappy      0.5.3
SecretStorage      3.3.1
setuptools         59.6.0
six                1.16.0
tensorrt           8.6.1
wadllib            1.3.6
wheel              0.37.1

In 1.13 these were correctly listed:

docker run --rm -it --entrypoint /bin/bash tensorflow/tfx:1.13.0 -c "pip list"
Package                         Version
------------------------------- --------------------
absl-py                         1.4.0
anyio                           3.6.2
apache-beam                     2.46.0
argon2-cffi                     21.3.0
argon2-cffi-bindings            21.2.0
array-record                    0.2.0
arrow                           1.2.3
astunparse                      1.6.3
attrs                           21.4.0
backcall                        0.2.0
beautifulsoup4                  4.12.2
bleach                          6.0.0
cachetools                      4.2.4
certifi                         2019.11.28
cffi                            1.15.1
chardet                         3.0.4
charset-normalizer              3.1.0
click                           8.1.3
cloudpickle                     2.2.1
comm                            0.1.3
crcmod                          1.7
dbus-python                     1.2.16
debugpy                         1.6.7
decorator                       5.1.1
defusedxml                      0.7.1
dill                            0.3.1.1
dm-tree                         0.1.8
docker                          4.4.4
docopt                          0.6.2
etils                           1.2.0
fastavro                        1.7.3
fasteners                       0.18
fastjsonschema                  2.16.3
flatbuffers                     23.3.3
fqdn                            1.5.1
gast                            0.4.0
google-api-core                 1.32.0
google-api-python-client        1.12.11
google-apitools                 0.5.31
google-auth                     1.35.0
google-auth-httplib2            0.1.0
google-auth-oauthlib            0.5.3
google-cloud-aiplatform         1.17.1
google-cloud-bigquery           2.34.4
google-cloud-bigquery-storage   2.16.2
google-cloud-bigtable           1.7.3
google-cloud-core               2.3.2
google-cloud-datastore          1.15.5
google-cloud-dlp                3.9.2
google-cloud-language           1.3.2
google-cloud-pubsub             2.13.11
google-cloud-pubsublite         1.6.0
google-cloud-recommendations-ai 0.7.1
google-cloud-resource-manager   1.6.3
google-cloud-spanner            3.26.0
google-cloud-storage            2.8.0
google-cloud-videointelligence  1.16.3
google-cloud-vision             3.1.4
google-crc32c                   1.5.0
google-pasta                    0.2.0
google-resumable-media          2.5.0
googleapis-common-protos        1.59.0
grpc-google-iam-v1              0.12.6
grpcio                          1.54.0
grpcio-status                   1.48.2
h5py                            3.8.0
hdfs                            2.7.0
httplib2                        0.21.0
idna                            2.8
importlib-metadata              6.6.0
importlib-resources             5.12.0
ipykernel                       6.22.0
ipython                         7.34.0
ipython-genutils                0.2.0
ipywidgets                      7.7.5
isoduration                     20.11.0
jax                             0.4.8
jedi                            0.18.2
Jinja2                          3.1.2
joblib                          1.2.0
jsonpointer                     2.3
jsonschema                      4.17.3
jupyter-client                  8.2.0
jupyter-core                    5.3.0
jupyter-events                  0.6.3
jupyter-server                  2.5.0
jupyter-server-terminals        0.4.4
jupyterlab-pygments             0.2.2
jupyterlab-widgets              1.1.4
keras                           2.12.0
keras-tuner                     1.3.5
kfp-pipeline-spec               0.1.16
kt-legacy                       1.0.5
kubernetes                      12.0.1
libclang                        16.0.0
Markdown                        3.4.3
MarkupSafe                      2.1.2
matplotlib-inline               0.1.6
mistune                         2.0.5
ml-dtypes                       0.1.0
ml-metadata                     1.13.1
ml-pipelines-sdk                1.13.0
mmh                             2.2
nbclassic                       0.5.6
nbclient                        0.7.4
nbconvert                       7.3.1
nbformat                        5.8.0
nest-asyncio                    1.5.6
notebook                        6.5.4
notebook-shim                   0.2.3
numpy                           1.22.4
oauth2client                    4.1.3
oauthlib                        3.2.2
objsize                         0.6.1
opt-einsum                      3.3.0
orjson                          3.8.11
overrides                       6.5.0
packaging                       20.9
pandas                          1.5.3
pandocfilters                   1.5.0
parso                           0.8.3
pexpect                         4.8.0
pickleshare                     0.7.5
pip                             20.0.2
pkgutil-resolve-name            1.3.10
platformdirs                    3.5.0
portpicker                      1.5.2
prometheus-client               0.16.0
promise                         2.3
prompt-toolkit                  3.0.38
proto-plus                      1.22.2
protobuf                        3.20.3
psutil                          5.9.5
ptyprocess                      0.7.0
pyarrow                         6.0.1
pyasn1                          0.5.0
pyasn1-modules                  0.3.0
pycparser                       2.21
pydot                           1.4.2
pyfarmhash                      0.3.2
Pygments                        2.15.1
PyGObject                       3.36.0
pymongo                         3.13.0
pyparsing                       3.0.9
pyrsistent                      0.19.3
python-apt                      2.0.1+ubuntu0.20.4.1
python-dateutil                 2.8.2
python-json-logger              2.0.7
python-snappy                   0.5.3
pytz                            2023.3
PyYAML                          5.4.1
pyzmq                           25.0.2
regex                           2023.5.5
requests                        2.29.0
requests-oauthlib               1.3.1
requests-unixsocket             0.2.0
rfc3339-validator               0.1.4
rfc3986-validator               0.1.1
rsa                             4.9
scipy                           1.10.1
Send2Trash                      1.8.2
setuptools                      45.2.0
six                             1.14.0
sniffio                         1.3.0
soupsieve                       2.4.1
sqlparse                        0.4.4
tensorboard                     2.12.3
tensorboard-data-server         0.7.0
tensorflow                      2.12.0
tensorflow-cloud                0.1.16
tensorflow-data-validation      1.13.0
tensorflow-datasets             4.9.2
tensorflow-estimator            2.12.0
tensorflow-hub                  0.12.0
tensorflow-io                   0.24.0
tensorflow-io-gcs-filesystem    0.24.0
tensorflow-metadata             1.13.1
tensorflow-model-analysis       0.44.0
tensorflow-serving-api          2.12.1
tensorflow-transform            1.13.0
tensorrt                        8.6.0
termcolor                       2.3.0
terminado                       0.17.1
tfx                             1.13.0
tfx-bsl                         1.13.0
tinycss2                        1.2.1
toml                            0.10.2
tornado                         6.3.1
tqdm                            4.65.0
traitlets                       5.9.0
typing-extensions               4.5.0
uri-template                    1.2.0
uritemplate                     3.0.1
urllib3                         1.25.8
wcwidth                         0.2.6
webcolors                       1.13
webencodings                    0.5.1
websocket-client                1.5.1
Werkzeug                        2.3.3
wheel                           0.34.2
widgetsnbextension              3.6.4
wrapt                           1.14.1
zipp                            3.15.0
zstandard                       0.21.0
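
The two listings are long, so it can help to diff them programmatically. The following is a rough Python sketch that parses the default column output of pip list and reports which packages disappeared. The function names are hypothetical, and the embedded samples are shortened excerpts of the listings above, not the full output.

```python
from typing import Dict, Set


def parse_pip_list(output: str) -> Dict[str, str]:
    """Parse the default two-column output of `pip list` into {name: version}."""
    packages = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, the dashed separator,
        # and any row that does not have exactly two columns.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0]] = parts[1]
    return packages


def missing_packages(old: Dict[str, str], new: Dict[str, str]) -> Set[str]:
    """Return the names present in `old` but absent from `new`."""
    return set(old) - set(new)


# Shortened excerpts of the two listings, for illustration only.
LIST_1_13 = """\
Package                         Version
------------------------------- --------------------
apache-beam                     2.46.0
pip                             20.0.2
tensorflow                      2.12.0
tfx                             1.13.0
"""
LIST_1_14 = """\
Package            Version
------------------ -------------
pip                22.0.2
tensorrt           8.6.1
"""

if __name__ == "__main__":
    gone = missing_packages(parse_pip_list(LIST_1_13), parse_pip_list(LIST_1_14))
    print(sorted(gone))
```

Feeding in the full listings would show that tfx, tensorflow and apache-beam are among the packages the 1.14 pip cannot see.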

This happens because the default pip references the wrong Python environment. The default python is 3.8:

docker run --rm -it --entrypoint /bin/bash tensorflow/tfx:1.14.0 -c "python --version"
Python 3.8.18

But pip references 3.10:

docker run --rm -it --entrypoint /bin/bash tensorflow/tfx:1.14.0 -c "pip --version"
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

In the 1.13 image, these are the same:

docker run --rm -it --entrypoint /bin/bash tensorflow/tfx:1.13.0 -c "python --version"
Python 3.8.10
docker run --rm -it --entrypoint /bin/bash tensorflow/tfx:1.13.0 -c "pip --version"
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
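
The comparison above can be automated, for example as a smoke test for a derived image. The sketch below parses the two version strings and checks that they refer to the same Python minor version. The function names are hypothetical; the sample strings are the outputs quoted above.

```python
import re
from typing import Tuple


def python_minor(version_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from `python --version` output."""
    match = re.search(r"Python (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognised python output: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def pip_python_minor(pip_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the '(python X.Y)' suffix of `pip --version`."""
    match = re.search(r"\(python (\d+)\.(\d+)\)", pip_output)
    if match is None:
        raise ValueError(f"unrecognised pip output: {pip_output!r}")
    return int(match.group(1)), int(match.group(2))


def environments_match(python_output: str, pip_output: str) -> bool:
    """True when pip belongs to the same Python minor version as `python`."""
    return python_minor(python_output) == pip_python_minor(pip_output)


if __name__ == "__main__":
    # Outputs reported for tensorflow/tfx:1.14.0
    print(environments_match(
        "Python 3.8.18",
        "pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)",
    ))
    # Outputs reported for tensorflow/tfx:1.13.0
    print(environments_match(
        "Python 3.8.10",
        "pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)",
    ))
```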

Possible Workaround

In our Dockerfile we work around this by patching the pip script to reference the correct python:

FROM tensorflow/tfx:1.14.0
WORKDIR /pipeline

# Fix pip in the tfx:1.14.0 base image to use the correct python environment (3.8, not 3.10)
RUN sed -i 's/python3/python/g' /usr/bin/pip
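
To illustrate what the sed expression does, here is a small Python sketch that applies the same literal substitution to a launcher script. The launcher text is an assumption (a typical pip entry-point script), not the actual contents of /usr/bin/pip in the image, and patch_launcher is a hypothetical helper name.

```python
# Hypothetical contents of /usr/bin/pip; the real file in the image may differ.
LAUNCHER = """#!/usr/bin/python3
import sys
from pip._internal.cli.main import main
if __name__ == "__main__":
    sys.exit(main())
"""


def patch_launcher(text: str) -> str:
    """Mimic sed 's/python3/python/g': replace every literal occurrence."""
    return text.replace("python3", "python")


if __name__ == "__main__":
    patched = patch_launcher(LAUNCHER)
    # The shebang now names `python` instead of `python3`.
    print(patched.splitlines()[0])
```

Because of the g flag, every occurrence of python3 in the file is rewritten, not only the shebang line.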

singhniraj08 commented 1 year ago

@IzakMaraisTAL,

Thank you for reporting this issue. I was able to reproduce the mismatch between the default python and pip environments in the TFX 1.14 and TFX 1.15 dev images.

@roseayeon, the different Python versions in the default environment (py 3.8) and the pip environment (py 3.10) are causing this issue. Please have a look. Thanks

singhniraj08 commented 12 months ago

@IzakMaraisTAL,

Can you try using the latest TFX image? We have rebuilt the Docker image from v1.14.0 and re-uploaded it to Docker Hub with the "latest" tag. Ref: screenshot below

Please try out the latest image and let us know if you face any issues. Thank you!

image

ruan-takealot commented 12 months ago

On a side note... I imagine the image can be made smaller by removing one of the python versions from the image?

$ docker run --rm -it --entrypoint= tensorflow/tfx:latest bash -c '
python_A=$(python -V)
python_B=$(python3 -V)
conda init bash &> /dev/null
. /root/.bashrc
conda activate base &> /dev/null
python_C=$(python -V)
python_D=$(python3 -V)
echo python_A=$python_A
echo python_B=$python_B
echo python_C=$python_C
echo python_D=$python_D
'

result:

python_A=Python 3.8.18
python_B=Python 3.10.12
python_C=Python 3.8.18
python_D=Python 3.10.12

(Notice how both 3.8 and 3.10 are present in the image)

IzakMaraisTAL commented 11 months ago

@singhniraj08, I have tested and can confirm that pip list and pip install work as expected in tensorflow/tfx:latest: dependencies are installed into the same environment used by TFX.

Thanks for the fix!

IzakMaraisTAL commented 11 months ago

> On a side note... I imagine the image can be made smaller by removing one of the python versions from the image?
>
> (Notice how both 3.8 and 3.10 are present in the image)

Since image size is contributing to https://github.com/tensorflow/tfx/issues/6386, this might be worth considering.

singhniraj08 commented 11 months ago

@IzakMaraisTAL,

Thank you for the confirmation and for the feedback on the TFX Docker image size. I will pass this feedback on internally to the team, and we will work on a fix.

Since this issue is fixed for you, I am closing it. Please feel free to reopen it and post your comments if you still have questions. Thank you!

Saoussen-CH commented 9 months ago

Hi, the problem persists with the gcr.io/tfx-oss-public/tfx:1.14.0 image provided in Container Registry, @singhniraj08.

singhniraj08 commented 9 months ago

@Saoussen-CH,

The fix for this issue was introduced in the tensorflow/tfx:latest image and will persist in the tfx:1.14.0 image. Please try using the tensorflow/tfx:latest image and let us know if you face any issues. Thank you!

Saoussen-CH commented 9 months ago

Hey, @singhniraj08

Yes, I know, but the fix is not in the gcr.io/tfx-oss-public/tfx:latest image provided on GCP in Container Registry, which I am guessing was built from the old version of tensorflow/tfx:latest, before the fix.

I am just letting you know, so you can pass the message internally and get it fixed.

Thank you.

singhniraj08 commented 9 months ago

@Saoussen-CH, thank you for bringing this up. I raised PR #6401 with a fix, which is already merged. This should be reflected in the upcoming TFX release image. Meanwhile, if this is blocking you, you can build an image from the Dockerfile by following the instructions shown here.