jina-ai / jina

☁️ Build multimodal AI applications with cloud-native stack
https://docs.jina.ai
Apache License 2.0

Share files between different Executors #6015

Closed mvsoom closed 9 months ago

mvsoom commented 1 year ago

Can different Executors (possibly replicated) access the same files?

I am thinking of a design where one Executor (A) streams data continuously to a client... where A's internal state is updated by checking files being created by other Executors (B, C, ...), which are queried by other clients at random times and run in parallel with A. These Executors (A, B, C, ...) therefore need access to the same filesystem.

Is this possible?

JoanFM commented 1 year ago

This may be locally doable, but it poses some questions on how to deploy in the cloud.

Could you give more details about the system you want to design? If it is not possible to share them publicly, you can also find us on discord (https://discord.gg/AE7FCFZp) where you can also share this privately.

mvsoom commented 1 year ago

Hi Joan, thanks for the fast reply!

Here are some details about the art project. This might be a bit too much, especially on a Friday, so do let me know if it's too complicated and I will simplify further. I apologize.


[Image: IMG-2926]

Client side

The client is an autonomous webcam equipped with a camera, microphone and display. It has the unique ID WEBCAM-1. It communicates with three Executors on JCloud:

  1. OpenFlamingo Executor: client sends webcam stills (images) to be processed by this Executor. Whenever the processing is done (the request returns), the client promptly sends a new webcam still.
  2. Whisper Executor: same as 1. but with recorded audio from the client instead of webcam stills.
  3. LLaMA Executor: the client receives a continuous stream of characters from this Executor per https://docs.jina.ai/tutorials/llm-serve. These characters are written to a display mounted on the autonomous webcam.

These three communication channels run independently and async.
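
A minimal sketch of these three independent channels, assuming an asyncio-based client. The functions `send_still`, `send_audio` and `stream_characters` are hypothetical stand-ins that only simulate latency; the real Jina client calls are not shown in this thread.

```python
import asyncio


async def send_still(i: int) -> str:
    """Hypothetical stand-in: send one webcam still, wait for the reply."""
    await asyncio.sleep(0.01)  # simulated processing time
    return "OK"


async def send_audio(i: int) -> str:
    """Hypothetical stand-in: send one audio recording, wait for the reply."""
    await asyncio.sleep(0.015)  # simulated processing time
    return "OK"


async def stream_characters(text: str):
    """Hypothetical stand-in for the character stream (async generator)."""
    for ch in text:
        await asyncio.sleep(0.001)
        yield ch


async def request_loop(send, n: int, log: list, tag: str) -> None:
    # Send the next item as soon as the previous request returns.
    for i in range(n):
        reply = await send(i)
        log.append((tag, i, reply))


async def display_loop(text: str, display: list) -> None:
    # Append each streamed character to the (simulated) display.
    async for ch in stream_characters(text):
        display.append(ch)


async def run_client(n: int = 3, text: str = "hello"):
    log, display = [], []
    # The three channels run concurrently; none waits for the others.
    await asyncio.gather(
        request_loop(send_still, n, log, "image"),
        request_loop(send_audio, n, log, "audio"),
        display_loop(text, display),
    )
    return log, "".join(display)


if __name__ == "__main__":
    log, shown = asyncio.run(run_client())
    print(shown, len(log))
```

In a real client the loops would run indefinitely rather than for `n` iterations; the bound is only there to keep the sketch finite.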

Server side

The three Executors run as separate services with their own GPUs allocated.

  1. OpenFlamingo Executor: receives an image from the client. Performs image captioning and writes the image description to a shared folder, /WEBCAM-1/pictures/image-description-1691154147.text, where 1691154147 is a timestamp from time.time(). Returns "OK" to the client when processing is done, so a new image can be processed and its description written to /WEBCAM-1/pictures/image-description-1691154152.text. Etc.
  2. Whisper Executor: same as 1. but transcribes audio and writes the transcription to /WEBCAM-1/audio/audio-transcription-1691154149.text.
  3. LLaMA Executor: predicts new characters based on a prompt and streams them one by one to the client. And here comes the crucial part: whenever a new file appears in /WEBCAM-1/pictures/ or /WEBCAM-1/audio/, it reads the file, updates the prompt with that information, and keeps streaming with the newly updated prompt.

These three Executors run independently and async.
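
The file-based hand-off described in the list above can be sketched with the standard library alone. This is only an illustration under assumptions: `write_description` and `PromptUpdater` are hypothetical names, the prompt update is plain string concatenation, and the write-then-rename step assumes the shared folder behaves like a local POSIX filesystem, which would need to be verified for a network filesystem.

```python
import os
import tempfile
import time
from pathlib import Path


def write_description(folder: Path, prefix: str, text: str, timestamp=None) -> Path:
    """Write one description file, e.g. image-description-1691154147.text."""
    folder.mkdir(parents=True, exist_ok=True)
    ts = int(time.time() if timestamp is None else timestamp)
    target = folder / f"{prefix}-{ts}.text"
    # Write to a temporary file first, then rename, so that a reader
    # never picks up a half-written description.
    fd, tmp_name = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp_name, target)
    return target


class PromptUpdater:
    """Polls the shared folders and appends new descriptions to the prompt."""

    def __init__(self, folders, prompt=""):
        self.folders = [Path(f) for f in folders]
        self.prompt = prompt
        self.seen = set()

    def poll(self) -> int:
        """Read files not seen before; return how many were added."""
        new_files = []
        for folder in self.folders:
            if folder.is_dir():
                new_files += [p for p in folder.glob("*.text") if p not in self.seen]
        # Process in timestamp order; the timestamp is the last "-" field.
        new_files.sort(key=lambda p: int(p.stem.rsplit("-", 1)[1]))
        for path in new_files:
            self.prompt += "\n" + path.read_text(encoding="utf-8")
            self.seen.add(path)
        return len(new_files)
```

The streaming Executor would call `poll()` between generated characters (or every few characters) and continue generating from `prompt`. Note that second-resolution timestamps mean two files written within the same second by the same Executor would share a name, and the later one would overwrite the earlier.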

The complete picture

This setup tries to limit latency by avoiding blocking flows and running everything completely async. The display of the client WEBCAM-1 streams characters continuously. These characters incorporate information about what is happening in front of the webcam through the textual descriptions of images and audio, which are written to the shared folder at variable intervals.

This system also works for multiple clients WEBCAM-1, WEBCAM-2, WEBCAM-3, ... provided the above Executors can replicate themselves.

JoanFM commented 1 year ago

Hello,

I think this is doable. You can check the documentation about persistence in JCloud in this section:

https://docs.jina.ai/concepts/jcloud/configuration/#storage

It seems like a really nice project you are building.

mvsoom commented 1 year ago

That's great to hear!

The linked docs mention "If your Executor needs to share data with other Executors and retain data persistency, consider using efs". So efs would work. How exactly can different Executors (which are not replicas) access each other's workspaces? If this turns out to be impossible, I might build another independent Executor which acts as a storehouse and takes care of these shared files.
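
A rough sketch of what such a storehouse could look like, written as a plain Python class rather than an actual Jina Executor. The class name `StoreHouse` and its methods `put` and `get_since` are hypothetical, and entries are kept in memory only, so nothing here is persisted.

```python
import threading
import time
from collections import defaultdict


class StoreHouse:
    """In-memory shared store, keyed by client ID such as "WEBCAM-1"."""

    def __init__(self):
        self._lock = threading.Lock()
        # client_id -> list of (timestamp, kind, text)
        self._entries = defaultdict(list)

    def put(self, client_id, kind, text, timestamp=None):
        """Store one description or transcription; return its timestamp."""
        ts = time.time() if timestamp is None else timestamp
        with self._lock:
            self._entries[client_id].append((ts, kind, text))
        return ts

    def get_since(self, client_id, since=0.0):
        """Return entries strictly newer than `since`, oldest first."""
        with self._lock:
            items = [e for e in self._entries.get(client_id, []) if e[0] > since]
        return sorted(items)
```

The captioning and transcription Executors would call `put`, and the streaming Executor would call `get_since` with the timestamp of the last entry it has already folded into its prompt.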

jina-bot commented 10 months ago

@jina-ai/product This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days