remicres / otbtf

Deep learning with otb (mirror of https://forgemia.inra.fr/orfeo-toolbox/otbtf)
Apache License 2.0

Cannot register 2 metrics with same name #28

Closed daspk04 closed 1 year ago

daspk04 commented 4 years ago

Hello @remicres ,

I was trying to import OTB and TensorFlow in Python. It looks like the two cannot be imported or used at the same time: I have to use either OTB or TensorFlow separately. Is it because OTB uses the same library that TensorFlow uses?

As I understand it, I can create separate Python programs for the OTB-related tasks and the TensorFlow-related tasks and run them separately. Or should I import TensorFlow and call the OTB applications via the command line (with the subprocess module)? (I haven't tested this one, though.)

Any suggestions?

remicres commented 4 years ago

This reminds me of something. I encountered this issue in the early versions of otbtf. I never really found the cause, but I did find a workaround at the time.

OTB needs an environment variable for the applications path (i.e. the installed/lib/otb/applications/* library files, .so under Linux / .dll under Windows I guess). In the early days of OTB, I believe the environment variable used was ITK_AUTOLOAD_PATH, but this later changed to OTB_APPLICATION_PATH. I noticed that when I used ITK_AUTOLOAD_PATH, an error like yours happened, but after switching to OTB_APPLICATION_PATH only, the error went away. I just checked with otbtf1.X: python -c "import tensorflow; import otbApplication" works fine.
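A minimal sketch of that check, assuming the two variables named above are the only ones of interest. The helper name check_otb_env is made up for illustration and is not part of OTB or otbtf:

```python
import os


def check_otb_env(env=None):
    """Return a list of warnings about the OTB-related environment variables.

    Hypothetical helper: it only looks at ITK_AUTOLOAD_PATH and
    OTB_APPLICATION_PATH, the two variables mentioned in this thread.
    """
    env = os.environ if env is None else env
    warnings = []
    if env.get("ITK_AUTOLOAD_PATH"):
        # In this thread, the legacy variable coincided with the import error.
        warnings.append("ITK_AUTOLOAD_PATH is set; try unsetting it")
    if not env.get("OTB_APPLICATION_PATH"):
        # Without it, the imports may work but no OTB application is found.
        warnings.append("OTB_APPLICATION_PATH is not set; OTB apps will not be found")
    return warnings


if __name__ == "__main__":
    for message in check_otb_env():
        print(message)
```

Running it before importing anything gives a quick view of which of the two variables the current shell exports.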

I think that this error happens in otbtf2.X. Can you confirm that? I bet that if you unset OTB_APPLICATION_PATH, you will be able to import both tf and otb (...but that's useless because you won't be able to use the OTB apps). I don't know what is happening: it's as if tf tries to initialize some component twice.

remicres commented 4 years ago

I just gave it a try inside otbtf2.0 and I can't reproduce your issue.

root@61276408e13f:/home/otbuser# python -c "import otbApplication; import tensorflow"
2020-07-17 09:58:05.757528: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
root@61276408e13f:/home/otbuser#

Which version are you using?

daspk04 commented 4 years ago

I have tried this on otbtf1.7:gpu as well as otbtf2.0:gpu. python -c "import tensorflow; import otbApplication" works for me as well. The problem is that when I call otb modules or tensorflow modules after the import, it fails. Once any otb module has been called, I cannot import TensorFlow, and vice versa.

Can you try these?

python -c "import otbApplication; print(otbApplication.Registry_GetAvailableApplications()); import tensorflow; print(tensorflow.__version__)"

python -c "import tensorflow; import otbApplication; print(otbApplication.Registry_GetAvailableApplications())"

remicres commented 4 years ago

Damn, I can reproduce this bug. Indeed, in projects where I use both, I don't import tf and otb in the same .py file. I think that otbtf1.X is also impacted.

remicres commented 4 years ago

According to this tensorflow issue, building TF with --config=monolithic could fix it. Here are the steps I tried:

pip install 'setuptools>=41.0.0'
pip install 'numpy<1.19.0'
bazel clean
bazel build -c opt --copt=-march=native --copt=-mfpmath=both //tensorflow:libtensorflow_framework.so //tensorflow:libtensorflow_cc.so //tensorflow/tools/pip_package:build_pip_package --noincompatible_do_not_split_linking_cmdline --config=monolithic

The error message is now different... but it still doesn't work.

The following:

python -c "import otbApplication; print(otbApplication.Registry_GetAvailableApplications()); import tensorflow; print(tensorflow.__version__)"

Throws:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 101, in <module>
    from tensorflow_core import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/__init__.py", line 46, in <module>
    from . _api.v2 import compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/__init__.py", line 32, in <module>
    from . import compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/v1/__init__.py", line 29, in <module>
    from tensorflow._api.v2.compat.v1 import app
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/__init__.py", line 32, in <module>
    from . import compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/v1/__init__.py", line 49, in <module>
    from tensorflow._api.v2.compat.v1 import lite
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/lite/__init__.py", line 11, in <module>
    from . import experimental
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/lite/experimental/__init__.py", line 10, in <module>
    from . import nn
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/lite/experimental/nn/__init__.py", line 10, in <module>
    from tensorflow.lite.python.lite import TFLiteLSTMCell
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/lite.py", line 34, in <module>
    from tensorflow.lite.experimental.microfrontend.python.ops import audio_microfrontend_op  # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/experimental/microfrontend/python/ops/audio_microfrontend_op.py", line 30, in <module>
    resource_loader.get_path_to_datafile("_audio_microfrontend_op.so"))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/load_library.py", line 57, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.AlreadyExistsError: Op with name _Arg

And:

python -c "import tensorflow; import otbApplication; print(otbApplication.Registry_GetAvailableApplications())"

Results in:

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:118] File already exists in database: google/protobuf/any.proto
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1367] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 

lifeiteng commented 3 years ago

Any progress? I hit the same error: https://github.com/deepmind/reverb/pull/24

remicres commented 3 years ago

Hi @lifeiteng, unfortunately not yet.

lifeiteng commented 3 years ago

> Hi @lifeiteng , unfortunately not yet

I have fixed it. Check the link options.

remicres commented 3 years ago

Many thanks!

I am not sure I fully understand your fix though... I don't see what you do with PYTHON_LIB_PATH. Could you please give a bit more detail?

lifeiteng commented 3 years ago

> Many thanks!
>
> I am not sure to fully understand your fix though... I don't see what you do with the PYTHON_LIB_PATH, could you please detail a bit?

I will push the code ASAP.

lifeiteng commented 3 years ago

@remicres https://github.com/deepmind/reverb/pull/24/commits/1463007c84fc0a783b04e4f2bec3c7d4e1bdcabf FYI

vidlb commented 3 years ago

@remicres @Pratyush1991 I found something interesting: on my system I have /opt/otb and /opt/tensorflow. I can import otb (which I built against libtensorflow_cc) without my TF env variables (PATH and LD_LIBRARY_PATH).

I tried the test command:

import otbApplication ; print(otbApplication.Registry_GetAvailableApplications())
2021-01-18 20:09:44.499243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

So even if I don't import tensorflow, it seems the TF lib is loaded anyway by OTB when I access any of the OTBTF apps (but not with the other apps). So when we import both otbApplication and tensorflow, it is loaded twice in memory, hence the protobuf error.
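One way to check this observation on Linux is to list the shared objects mapped into the current process. This is only a sketch: it assumes /proc/self/maps is available, and the helper name loaded_libraries is made up for illustration:

```python
import os


def loaded_libraries(keyword, maps_text=None):
    """Return the sorted paths of mapped files whose basename contains keyword.

    Hypothetical helper. By default it reads /proc/self/maps (Linux only),
    which holds one line per memory mapping of the current process:
    address, permissions, offset, device, inode and, optionally, a path.
    """
    if maps_text is None:
        with open("/proc/self/maps") as maps_file:
            maps_text = maps_file.read()
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Anonymous mappings have no path, so they yield only five fields.
        if len(fields) == 6:
            path = fields[5].strip()
            if keyword in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    # Call this before and after touching an OTBTF application to compare.
    for path in loaded_libraries("tensorflow"):
        print(path)
```

If libtensorflow_cc shows up after the OTB call, before any import tensorflow, that would be consistent with the explanation above.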

remicres commented 3 years ago

Yes, it is something like that. I guess that the C++ tensorflow classes used in the otb applications "trigger" something that is done twice if we also import tensorflow in python. I did not quite understand the fix that @lifeiteng made in deepmind/reverb, but it looks like it is a matter of how the tf libs are linked...

vidlb commented 3 years ago

The error looks the same but I don't think the problem is. I believe their issue was that some executable in the project was linked to tensorflow when it didn't need to be. I guess they fixed it just by removing the unnecessary links. But since you need those apps linked to tf anyway, maybe there's no solution: we just can't import both tf and the OTBTF apps in the same process...

vidlb commented 3 years ago

It would probably require modifying the TF core to avoid re-loading the lib if it was already loaded by OTB in the same python process...

remicres commented 3 years ago

For now, we can't import both, but that's a bit sad. It is as if you couldn't import otbApplication and gdal in the same python code :sob:

remicres commented 3 years ago

I am reading this. Do we have a way to know which version of protobuf is used from the OTB applications, and from the import tensorflow in python?

vidlb commented 3 years ago

I also saw it yesterday. At first I thought that could be the problem, if OTB was already using another protobuf version (because of OpenCV, maybe?). But since we can import both tensorflow and any other otbApp (one that isn't linked to tensorflow) without the protobuf error, I really think it is just because tf is loaded twice!

For the record TF2.4 is built with protobuf 3.14

vidlb commented 3 years ago

Maybe https://github.com/tensorflow/tensorflow/issues/22810 is related: they're talking about building plugins with libtensorflow_cc. They mentioned static linking, it seems hard to build, and I guess that's not an option with the OTB module architecture...

remicres commented 3 years ago

> They mentioned static linking, it seems hard to build, and I guess that's not an option with the OTB module architecture...

I don't think that it is a limitation, but this would require some cmake/c++ magic.

vidlb commented 3 years ago

Do you think it could work? There's probably no way to be sure without trying... But the error there doesn't involve protobuf.

Also yesterday I tried something: after I realized libtensorflow_framework.so is already installed in the wheel, I thought the cause could be that there are two libs and OTB was not compiled against the same framework lib file. So in a -dev container I tried to recompile with the file from the wheel. Nope, same error. I guess that if the files are the same, it doesn't matter where they are located. I don't understand this linking thing very well.

But regarding the docker build, maybe we can remove this target from BZL_TARGETS and use the one installed in site-packages/, since it seems to be built by default with the build_pip_package config.

vidlb commented 3 years ago

Maybe the problem is that you just can't load the tensorflow C and C++ libraries in the same python process? I don't know how it works in memory when otbApplication is loaded, but libtensorflow.so is loaded with tf from python, and OTB will try to load libtensorflow_cc.so. Maybe bazel somehow treats both libs the same way? In that case the "File already exists in database" message would be the result of protobuf trying to register tensorflow twice from different libs, and/or just because both are named tensorflow in memory: even with a _cc suffix, the namespace is still tensorflow...

vidlb commented 3 years ago

The best "quick and dirty" fix for python scripts could involve subprocess or multiprocessing.

vidlb commented 3 years ago

@Pratyush1991 I don't know if that will help you a lot, but in fact you can import both tensorflow and all the otb applications + PatchesSelection, PatchesExtraction, LabelImageSampleSelection and DensePolygonClassStatistics in the same python script without the protobuf error. It should occur only when using the TensorFlowModel and ClassifierFromDeepFeatures applications. And you can't list all the apps, because that will load the tf framework. But since you're probably going to write your image and save your model to disk anyway, it should work with a python script where you run the things related to tf models in a subprocess, or you'll need 2 scripts. You just can't import tensorflow if you're also going to use it via otbApplication in the same process.
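A rough sketch of the subprocess approach, assuming it is acceptable to exchange results through stdout or through files on disk. The helper name run_isolated and the snippet constant are made up for illustration:

```python
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run a Python snippet in a fresh interpreter and return its stdout.

    Hypothetical helper: the child process has its own memory space, so the
    libraries it loads never meet the ones loaded by the calling process.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit
        timeout=timeout,
    )
    return completed.stdout


# Assumed OTB-side snippet, built from the test command used in this thread.
OTB_SNIPPET = (
    "import otbApplication\n"
    "print(otbApplication.Registry_GetAvailableApplications())\n"
)

if __name__ == "__main__":
    # The parent is free to import tensorflow; the OTB call lives in the child.
    # Replace the string below with OTB_SNIPPET on a machine where OTB is installed.
    print(run_isolated("print('hello from a child interpreter')"), end="")
```

Anything larger than a few printed values is probably better passed through files, which fits the remark above that images and models are written to disk anyway.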

remicres commented 3 years ago

Nearly same issue here

vidlb commented 3 years ago

Wow, such a dead SO thread... Well, I really think this comment is a good explanation. And it doesn't look like something you can work around without patching the tf code... But since TF is really low level, it may just be a bad idea to load it twice in memory.

remicres commented 1 year ago

This 3-year-old issue will be fixed in the otbtf 4.0.0 release.

vidlb commented 1 year ago

So... what kind of black magic did you invoke to make this work?

remicres commented 1 year ago

TBH I don't know if it's the linking, which I did a bit differently, or the move to TensorFlow 2.12.

remicres commented 1 year ago

closed with r4.0.0