running tensorflow executor replicas under the same strategy

nick-konovalchuk commented 2 years ago

Link to my sandbox: https://github.com/bottledmind/jina-issues

Describe the feature

If you create several TensorFlow models under the same strategy, their combined GPU memory usage is significantly lower (app2.py in the sandbox). Unfortunately, creating, say, 5 replicas of an executor, each holding the same TensorFlow model, results in 5x the GPU memory consumption (app.py in the sandbox).

Your proposal

If running the replicas under the same strategy isn't possible at the moment, it would be a nice feature to have.

Environment

Screenshots

Screenshots of watch nvidia-smi output for app.py and app2.py.

nick-konovalchuk commented 2 years ago

(tensorflow:latest-gpu container)

- jina 3.3.15
- docarray 0.13.7
- jina-proto 0.1.8
- jina-vcs-tag (unset)
- protobuf 3.19.4
- proto-backend cpp
- grpcio 1.43.0
- pyyaml 6.0
- python 3.8.10
- platform Linux
- platform-release 4.15.0-173-generic
- platform-version #182-Ubuntu SMP Fri Mar 18 15:53:46 UTC 2022
- architecture x86_64
- processor x86_64
- uid 2485377892355
- session-id 32f4bd9e-cc94-11ec-9762-0242ac110003
- uptime 2022-05-05T16:55:42.277999
- ci-vendor (unset)
* JINA_DEFAULT_HOST (unset)
* JINA_DEFAULT_TIMEOUT_CTRL (unset)
* JINA_DEFAULT_WORKSPACE_BASE /root/.jina/executor-workspace
* JINA_DEPLOYMENT_NAME (unset)
* JINA_DISABLE_UVLOOP (unset)
* JINA_FULL_CLI (unset)
* JINA_GATEWAY_IMAGE (unset)
* JINA_GRPC_RECV_BYTES (unset)
* JINA_GRPC_SEND_BYTES (unset)
* JINA_HUBBLE_REGISTRY (unset)
* JINA_HUB_CACHE_DIR (unset)
* JINA_HUB_NO_IMAGE_REBUILD (unset)
* JINA_HUB_ROOT (unset)
* JINA_LOG_CONFIG (unset)
* JINA_LOG_LEVEL (unset)
* JINA_LOG_NO_COLOR (unset)
* JINA_MP_START_METHOD (unset)
* JINA_RANDOM_PORT_MAX (unset)
* JINA_RANDOM_PORT_MIN (unset)
* JINA_VCS_VERSION (unset)

JoanFM commented 2 years ago

But these models live in the same process, whereas replicas would live in different processes, or even in different containers or machines.

I really doubt this feature can be implemented.
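A minimal sketch of the process-boundary point above, using only the Python standard library. It assumes a hypothetical load_model helper standing in for building a TensorFlow model; no TensorFlow or Jina APIs are used or implied. Inside one process, repeated loads can hand back one shared object, but each replica process ends up building its own copy, which is why N replicas in N processes cost roughly N times the memory.

```python
import multiprocessing as mp
import os

# Hypothetical per-process model cache. It is keyed by PID because a model's
# GPU state belongs to the process that created it, so a forked child must
# not reuse the entry it inherited from its parent.
_MODEL_CACHE = {}


def load_model(name):
    """Hypothetical stand-in for building a TensorFlow model."""
    key = (os.getpid(), name)
    if key not in _MODEL_CACHE:
        _MODEL_CACHE[key] = {"name": name, "owner_pid": os.getpid()}
    return _MODEL_CACHE[key]


def replicas_in_one_process(n):
    """n 'replicas' in one process: return how many distinct model objects exist."""
    models = [load_model("encoder") for _ in range(n)]
    return len({id(m) for m in models})


def _replica_worker(queue):
    model = load_model("encoder")
    # True when this replica process had to build its own copy of the model.
    queue.put(model["owner_pid"] == os.getpid())


def replicas_in_separate_processes(n):
    """n replicas in n processes: return how many model copies were built."""
    ctx = mp.get_context("fork")  # POSIX-only start method
    queue = ctx.Queue()
    procs = [ctx.Process(target=_replica_worker, args=(queue,)) for _ in range(n)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    built_own_copy = [queue.get() for _ in range(n)]
    for p in procs:
        p.join()
    return sum(built_own_copy)


if __name__ == "__main__":
    print(replicas_in_one_process(5))
    print(replicas_in_separate_processes(5))
```

The in-process case reports a single shared object, while the multi-process case reports one copy per replica; nothing in one process's memory is visible to another without explicit inter-process mechanisms.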

jina-ai / serve

running tensorflow executor replicas under the same strategy #4755