Open eirikaso opened 3 years ago
I'm now able to run training in the Docker container after switching to tensorflow:2.4.1-gpu as the Dockerfile base. tensorflow:2.4.1 still gets installed when installing the Python requirements, though.
I also had to install opencv using apt, as tensorflow was unable to find the version already installed in the image by pip:
apt-get install -y python-opencv
Something is buggy, but at least I'm able to train now.
Output before installing opencv:
Traceback (most recent call last):
  File "object_detection/model_main_tf2.py", line 32, in <module>
    from object_detection import model_lib_v2
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 29, in <module>
    from object_detection import eval_util
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/eval_util.py", line 36, in <module>
    from object_detection.metrics import lvis_evaluation
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/metrics/lvis_evaluation.py", line 23, in <module>
    from lvis import results as lvis_results
  File "/home/tensorflow/.local/lib/python3.6/site-packages/lvis/__init__.py", line 5, in <module>
    from lvis.vis import LVISVis
  File "/home/tensorflow/.local/lib/python3.6/site-packages/lvis/vis.py", line 1, in <module>
    import cv2
  File "/home/tensorflow/.local/lib/python3.6/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
It turns out that installing python-opencv with apt is not necessary after all. Installing libgl1-mesa-glx is sufficient:
apt-get install libgl1-mesa-glx
Training now runs on the GPU. Both tensorflow and tensorflow-gpu are installed in the container.
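For anyone who wants to check for the missing library before starting a training run, here is a minimal sketch using only the standard library. The helper name `missing_shared_libs` is made up for illustration. `ctypes.util.find_library` relies on tools such as ldconfig being present in the image, so treat a "missing" result as a hint rather than proof.

```python
import ctypes.util


def missing_shared_libs(names):
    """Return the library names that ctypes cannot locate on this system.

    Hypothetical helper: names are given without the "lib" prefix or the
    ".so" suffix, e.g. "GL" for libGL.so.1.
    """
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    # "GL" corresponds to libGL.so.*, the library named in the ImportError above.
    missing = missing_shared_libs(["GL"])
    print("missing:", ", ".join(missing) if missing else "none")
```

If "GL" is reported as missing, installing libgl1-mesa-glx as described above is the fix that worked in this thread.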
Thanks @eirikaso
I have the same issue. In the OD container I ran:
$ pip uninstall tensorflow==2.4.1
to remove the duplicate.
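To see whether a container really has both packages installed, something like the sketch below could be run inside it. The function names are illustrative, and `importlib.metadata` needs Python 3.8 or newer (the python3.6 images mentioned in this thread would need the `importlib_metadata` backport instead, which is an assumption on my part).

```python
from importlib import metadata  # Python 3.8+; assumption: not available on the 3.6 images

# Distribution names that all provide the "tensorflow" import package.
TF_NAMES = {"tensorflow", "tensorflow-gpu", "tensorflow-cpu"}


def tensorflow_variants(pairs):
    """Pick the TensorFlow distributions out of (name, version) pairs."""
    found = {}
    for name, version in pairs:
        if not name:
            continue
        # pip treats "_" and "-" as equivalent in distribution names.
        key = name.lower().replace("_", "-")
        if key in TF_NAMES:
            found[key] = version
    return found


def installed_pairs():
    """Yield (name, version) for every distribution visible to this interpreter."""
    for dist in metadata.distributions():
        yield dist.metadata.get("Name"), dist.version


if __name__ == "__main__":
    variants = tensorflow_variants(installed_pairs())
    for name, version in sorted(variants.items()):
        print(name, version)
    if len(variants) > 1:
        print("More than one TensorFlow distribution is installed.")
```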
Hey @eirikaso! I recently ran into the same issue - quite annoying as it unnecessarily increases the size of the container and lengthens the build time.
The problem is that the object detection package doesn't recognise tensorflow-gpu (the package installed in the base Docker image) as a valid version of tensorflow, so it attempts to install it. Seems like the official Docker image has a GPU specific version of tensorflow, a property I thought was only true with TF 1.0.
A workaround is to create a symbolic link that tricks pip into thinking tensorflow is already installed, by adding these lines to your Dockerfile:
WORKDIR /usr/local/lib/python3.6/dist-packages
RUN ln -s tensorflow_gpu-* tensorflow-$(ls -d1 tensorflow_gpu* | sed 's/tensorflow_gpu-\(.*\)/\1/')
Then, when you come to upgrading your base version of tensorflow, simply change the base image and rebuild the container without worrying about having two versions of tensorflow installed.
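In case the sed expression is hard to read, here is the same renaming logic as a small Python sketch. The directory name `tensorflow_gpu-2.4.1.dist-info` is an assumed example of what `ls -d1 tensorflow_gpu*` might print in the base image. I have not checked the actual contents of dist-packages.

```python
import re


def symlink_name(gpu_entry):
    """Mirror the sed call: drop the "tensorflow_gpu-" prefix, then prepend "tensorflow-"."""
    suffix = re.sub(r"tensorflow_gpu-(.*)", r"\1", gpu_entry)
    return "tensorflow-" + suffix


if __name__ == "__main__":
    # Assumed example entry; the real name depends on the base image.
    print(symlink_name("tensorflow_gpu-2.4.1.dist-info"))
```

The link therefore carries the same version suffix as the tensorflow-gpu metadata directory, which is what lets pip treat tensorflow as already installed at that version.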
Here's my complete Dockerfile (works for me!):
## Custom Dockerfile for the Tensorflow Object Detection API ##
FROM tensorflow/tensorflow:2.4.1-gpu
RUN python -c "import tensorflow as tf; print(f'Tensorflow version: {tf.__version__}')"
ARG DEBIAN_FRONTEND=noninteractive
# Install apt-get dependencies
RUN apt-get update && apt-get install -y \
python3-tk \
libgl1-mesa-glx && rm -rf /var/lib/apt/lists/*
# Name the symlink with the suffix from tensorflow-gpu (see question 65098672: stackoverflow.com)
WORKDIR /usr/local/lib/python3.6/dist-packages
RUN ln -s tensorflow_gpu-* tensorflow-$(ls -d1 tensorflow_gpu* | sed 's/tensorflow_gpu-\(.*\)/\1/')
# Install protobuf
RUN curl -L -O https://github.com/protocolbuffers/protobuf/releases/download/v3.11.4/protoc-3.11.4-linux-x86_64.zip && \
unzip protoc-3.11.4-linux-x86_64.zip && \
cp bin/protoc /usr/local/bin && \
rm -r protoc-3.11.4-linux-x86_64.zip bin/
# Copy our local version of models into the image
WORKDIR /home/tf
COPY . /home/tf/models
# Compile the protocol buffers for Python
RUN (cd /home/tf/models/research/ && protoc object_detection/protos/*.proto --python_out=.)
# Install the Object Detection API
WORKDIR /home/tf/models/research/
RUN cp object_detection/packages/tf2/setup.py .
RUN python -m pip install --upgrade pip
RUN python -m pip install .
# Confirm tensorflow hasn't been reinstalled
RUN python -c "import tensorflow as tf; print(f'Tensorflow version: {tf.__version__}')"
# Add models to our python path
ENV PYTHONPATH="/home/tf/models:$PYTHONPATH"
This is very helpful. I had the same issue just from importing tensorflow inside Docker (undefined symbol: _ZN10tensorflow8OpKernel11TraceStringEPNS_15OpKernelContextEb), and I see that both tensorflow 2.6.0 and tensorflow-gpu 2.4.1 are installed (I changed the base image in the Dockerfile to 2.4.1). Where does it say to install 2.6.0? Does it just pick the latest tensorflow release?
The latest version seems to be installed as a dependency when the different Python packages are installed from a requirements.txt file. Since versions are not specified in this file, I think one of the packages depends on tensorflow, so pip installs the latest release.
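A quick way to spot which entries leave the version open is to filter a requirements list for lines without an exact `==` pin. This is only a rough sketch. The requirement strings in the example are illustrative, not copied from the actual file, and a real check would also have to follow transitive dependencies.

```python
def unpinned(requirements):
    """Return requirement strings that carry no exact "==" version pin."""
    result = []
    for line in requirements:
        # Drop trailing comments and surrounding whitespace.
        req = line.split("#", 1)[0].strip()
        if req and "==" not in req:
            result.append(req)
    return result


if __name__ == "__main__":
    # Hypothetical requirement lines, for illustration only.
    example = ["tf-models-official", "tensorflow==2.5.0", "# a comment", "pillow>=7.0"]
    print(unpinned(example))
```

Any package in that list can pull in whatever release is newest at build time, which matches the behaviour described above.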
As for the workaround, I was able to align the tensorflow-gpu and tensorflow versions by rewriting the Dockerfile as shown below. The object_detection library should be versioned for each TensorFlow version as well as the tf_models_official library, IMO.
FROM tensorflow/tensorflow:2.5.0-gpu
ARG DEBIAN_FRONTEND=noninteractive
# Install apt dependencies
RUN apt-get update && apt-get install -y \
git \
gpg-agent \
python3-cairocffi \
protobuf-compiler \
python3-pil \
python3-lxml \
python3-tk \
wget
# Add new user to avoid running as root
RUN useradd -ms /bin/bash tensorflow
USER tensorflow
WORKDIR /home/tensorflow
# Clone Object Detection API
RUN git clone https://github.com/tensorflow/models/ /home/tensorflow/models/
# Workaround: If you use TF 2.2.x, uncomment the line below.
# WORKDIR /home/tensorflow/models/
# RUN git checkout 03a6d6c8e79b426231a4d5ba0cf45be9afc8bad5
# Workaround: If you use TF 2.3.x, uncomment the line below.
# WORKDIR /home/tensorflow/models/
# RUN git checkout cf82a72480a41a62b4bbe0f1378d319f0d6f5d5c
# Compile protobuf configs
RUN (cd /home/tensorflow/models/research/ && protoc object_detection/protos/*.proto --python_out=.)
WORKDIR /home/tensorflow/models/research/
RUN cp object_detection/packages/tf2/setup.py ./
ENV PATH="/home/tensorflow/.local/bin:${PATH}"
# Workaround (For Tensorflow < 2.5.1): Remove tf-models-official dependency from object_detection, will install it manually.
RUN sed -i -e 's/^.*tf-models-official.*$//g' ./setup.py
RUN python -m pip install -U pip
# Workaround: Lock tensorflow and corresponding tf-models-official versions.
RUN python -m pip install tensorflow==2.5.0 tensorflow-text==2.5.0 tf-models-official==2.5.0
RUN python -m pip install .
ENV TF_CPP_MIN_LOG_LEVEL 3
The changes are as follows.
RUN python -m pip install -U pip
+
+ # Workaround: install tensorflow and corresponding tf-models-official versions.
+ RUN python -m pip install tensorflow==2.5.0 tensorflow-text==2.5.0 tf-models-official==2.5.0
RUN python -m pip install .
ENV TF_CPP_MIN_LOG_LEVEL 3
tf-models-official never pins the tensorflow version, and the tensorflow/tensorflow:x.x.x-gpu image does not include the tensorflow package by default. Therefore, if no tensorflow version is specified, the latest version will be installed.
ENV PATH="/home/tensorflow/.local/bin:${PATH}"
+ # Workaround (For tensorflow < 2.5.1): Remove tf-models-official dependency from object_detection, will install it manually.
+ RUN sed -i -e 's/^.*tf-models-official.*$//g' ./setup.py
RUN python -m pip install -U pip
# Clone Object Detection API
RUN git clone https://github.com/tensorflow/models/ /home/tensorflow/models/
+
+ # Workaround: If you use TF 2.2.x, uncomment the line below.
+ # WORKDIR /home/tensorflow/models/
+ # RUN git checkout 03a6d6c8e79b426231a4d5ba0cf45be9afc8bad5
+
+ # Workaround: If you use TF 2.3.x, uncomment the line below.
+ # WORKDIR /home/tensorflow/models/
+ # RUN git checkout cf82a72480a41a62b4bbe0f1378d319f0d6f5d5c
# Compile protobuf configs
$ pip list
...
object-detection 0.1
...
tensorboard 2.5.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.5.0
tensorflow-addons 0.14.0
tensorflow-datasets 4.4.0
tensorflow-estimator 2.5.0rc0
tensorflow-hub 0.12.0
tensorflow-metadata 1.2.0
tensorflow-model-optimization 0.6.0
tensorflow-text 2.5.0
termcolor 1.1.0
text-unidecode 1.3
tf-models-official 2.5.0
...
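To make that check repeatable, the `pip list` output can be parsed and the versions compared. Below is a minimal sketch. The function names are made up, and the sample text only repeats three of the rows shown above.

```python
def parse_pip_list(text):
    """Map package name to version for every two-column line of `pip list` output."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def minor_series(version):
    """Reduce "2.5.0" to "2.5" so patch releases still count as aligned."""
    return ".".join(version.split(".")[:2])


def aligned(versions, names):
    """True if all named packages share one major.minor series (KeyError if one is absent)."""
    return len({minor_series(versions[name]) for name in names}) == 1


SAMPLE = """\
tensorflow 2.5.0
tensorflow-text 2.5.0
tf-models-official 2.5.0
"""

if __name__ == "__main__":
    names = ["tensorflow", "tensorflow-text", "tf-models-official"]
    print(aligned(parse_pip_list(SAMPLE), names))
```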
Hello, @Niccari's solution works fine on my side! Thanks for your help. The last thing I need is to freeze the version of the cloned tensorflow/models repository, so that updates on the master branch cannot break the installation. That is, in this line:
RUN git clone https://github.com/tensorflow/models/ /home/tensorflow/models/
The best thing would be something like:
RUN git clone --branch v2.5.0 --depth 1 https://github.com/tensorflow/models/ /home/tensorflow/models/
BUT the research directory under models has been removed from the released versions (see the last commit, "Removing research/community models", on version 2.5.0 or 2.4.0). I tried to recover it from a previous commit on these branches, but that contains a really old version of the research directory, around 15 months old, and of course this old version of object detection does not support TF2 (see for example this commit).
Is there a way to achieve this?
Thanks a lot.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/238922e98dd0e8254b5c0921b241a1f5a151782f/research/object_detection/dockerfiles/tf2/Dockerfile
2. Describe the bug
Some of the Python requirements installed from "/models/research/object_detection/packages/tf2/setup.py" lead to the installation of the latest tensorflow version (2.4.1). Since the Dockerfile builds from the tensorflow/tensorflow:2.2.0-gpu image, I now have tensorflow:2.2.0-gpu AND tensorflow:2.4.1 installed.
When I try to train on a network I get the following output:
python object_detection/model_main_tf2.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --model_dir=${MODEL_DIR} --alsologtostderr
3. Steps to reproduce
Clone the "models" repository and install using docker. Follow installation instructions here: https://github.com/tensorflow/models/blob/238922e98dd0e8254b5c0921b241a1f5a151782f/research/object_detection/g3doc/tf2.md
Start training on a network: https://github.com/tensorflow/models/blob/238922e98dd0e8254b5c0921b241a1f5a151782f/research/object_detection/g3doc/tf2_training_and_evaluation.md
4. Expected behavior
I want to use the tensorflow:2.2.0-gpu or tensorflow:X.X.X-gpu. I do not want another tensorflow version to get installed during building as this screws up the environment.
5. Additional context
After the Python requirements from the https://github.com/tensorflow/models/blob/238922e98dd0e8254b5c0921b241a1f5a151782f/research/object_detection/packages/tf2/setup.py file are installed while building the Dockerfile, I get the following output indicating that all tensorflow-related packages have been updated to version 2.4 as well.
6. System information