pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com

[QUESTION] How to deploy a Pathway Airbyte streaming ETL microservice to Google Cloud Run? #53

Open vikas-velora opened 1 month ago

vikas-velora commented 1 month ago

Hi,

Is there a way to deploy a Pathway Airbyte streaming ETL microservice to Google Cloud Run? If so, how should we go about it?

Thanks.

dxtrous commented 1 month ago

Hi @vikas-velora, apologies for the slow turnaround on your question. Our team is verifying whether this is possible.

As a general rule, we advocate deployment from source (in this spirit: https://cloud.google.com/run/docs/deploying-source-code), and will provide the easiest recipe that works in this direction. The intended experience is something like this one with Render: https://pathway.com/developers/user-guide/deployment/render-deploy/.

vikas-velora commented 1 month ago

Got it, thanks. Will await your response.

vikas-velora commented 1 month ago

Adrian,

One question: if we want to dynamically deploy a service based on user inputs, would it be better to deploy on Google Cloud Functions? Right now, Pathway works from a YAML config. We want to parameterize it and take inputs such as an OAuth token and a GitHub repo name from users.

Thanks, Vikas
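
One way to parameterize the YAML config from user inputs can be sketched as follows. This is a minimal sketch, not Pathway's own mechanism: the template keys (`credentials`, `access_token`, `repositories`) and the helper names `render_config` and `write_config` are hypothetical and must be checked against the actual Airbyte GitHub connector spec. Values are quoted with `json.dumps`, relying on JSON strings being valid YAML double-quoted scalars, so that user input cannot break the file structure.

```python
import json
from string import Template

# Hypothetical template: the keys below only approximate an Airbyte GitHub
# source config and are assumptions, not taken from the thread.
CONFIG_TEMPLATE = Template("""\
source:
  docker_image: "airbyte/source-github"
  config:
    credentials:
      access_token: $access_token
    repositories:
      - $repository
  streams: commits
""")


def render_config(access_token: str, repository: str) -> str:
    """Fill the template with user-supplied values, quoted and escaped."""
    # json.dumps produces a double-quoted, escaped string that YAML accepts.
    return CONFIG_TEMPLATE.substitute(
        access_token=json.dumps(access_token),
        repository=json.dumps(repository),
    )


def write_config(path: str, access_token: str, repository: str) -> None:
    """Write the rendered config to the path the pipeline will read."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_config(access_token, repository))
```

The service would call `write_config` per user request before starting the pipeline. In a real deployment the token should come from a secret store rather than being written to disk in plain text.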

zxqfd555-pw commented 1 month ago

Hi Vikas,

As a small technical heads-up, there is a way to dockerize the Airbyte connector code for local tests. Specifically, you need to install Docker in your Dockerfile as follows:

RUN apt update && apt install docker.io -y

Then run the container with two volumes mounted, as follows:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp <your_image_name>

The first volume is required to enable DinD, and the second is needed because /tmp is currently used to store the temporary artifacts of the Airbyte connector. Note that this setup is not easy to deploy (and, I suppose, impossible to deploy on Google Cloud), because it requires giving the container access to the Docker socket.
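
For repeated local test runs, the same invocation could be scripted. The following is a minimal sketch using Python's standard `subprocess` module; the helper names `docker_run_command` and `run_locally` are hypothetical, and the image name is whatever you tagged your build with.

```python
import subprocess


def docker_run_command(image: str) -> list[str]:
    """Build the `docker run` argument list with the two volume mounts."""
    return [
        "docker", "run",
        # Expose the host's Docker socket so the connector can start containers.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # Share /tmp, where the connector stores its temporary artifacts.
        "-v", "/tmp:/tmp",
        image,
    ]


def run_locally(image: str) -> None:
    """Run the image and raise CalledProcessError if the container fails."""
    subprocess.run(docker_run_command(image), check=True)
```

Passing the arguments as a list avoids shell quoting issues if the image name contains unusual characters.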

I also tried using the docker:dind image as a base, but it is unusable in our case: docker:dind is based on Alpine Linux, which Pathway does not support yet. So I think we need a different approach: running the Airbyte connector in GCP without depending on Docker. This would have to be implemented in the Pathway framework itself.

To wrap up, the way forward is to run the Airbyte connector in GCP, a feature that must be added to Pathway. I am currently checking this possibility and will get back to you today or within a few days.

vikas-velora commented 1 month ago

Thanks so much, @zxqfd555-pw. We tried multiple ways and were unable to deploy, so at least this confirms that the problem was not a gap in our knowledge 😊. We will wait for your update.

zxqfd555-pw commented 1 month ago

Hi Vikas!

A quick heads-up: we can eliminate the need for the DinD technique for Airbyte connectors by introducing a mode where they run as GCP jobs. I am in the process of implementing it, and we expect to release the corresponding update next week.

vikas-velora commented 1 month ago

Thanks for the update.

zxqfd555-pw commented 2 weeks ago

Hi Vikas!

Please note that you can now run Airbyte data extraction jobs on Google Cloud Run, which eliminates the need for DinD. Please refer to the Airbyte connector docs (https://pathway.com/developers/api-docs/pathway-io/airbyte) for details.
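
A rough sketch of what the remote mode might look like in code is given below. It assumes that `pw.io.airbyte.read` accepts `execution_type`, `service_user_credentials_file`, `gcp_region`, and `gcp_job_name` as described in the Airbyte connector docs; these parameter names are not stated in this thread and should be verified there before use. The helpers `remote_airbyte_kwargs` and `read_commits`, the `commits` stream, and the region value are illustrative assumptions.

```python
from typing import Optional


def remote_airbyte_kwargs(
    credentials_file: str, region: str, job_name: Optional[str] = None
) -> dict:
    """Collect the keyword arguments assumed to select remote execution."""
    kwargs = {
        # Assumption: "remote" makes the connector run as a Cloud Run job.
        "execution_type": "remote",
        # Assumption: path to a GCP service account JSON key file.
        "service_user_credentials_file": credentials_file,
        "gcp_region": region,
    }
    if job_name is not None:
        kwargs["gcp_job_name"] = job_name
    return kwargs


def read_commits(config_path: str, credentials_file: str):
    """Create a Pathway table from the hypothetical 'commits' stream."""
    # Imported lazily so the helper above can be used without Pathway installed.
    import pathway as pw

    return pw.io.airbyte.read(
        config_path,
        ["commits"],
        **remote_airbyte_kwargs(credentials_file, "europe-west1"),
    )
```

With this mode, the Pathway service itself no longer needs access to a Docker socket, which is what blocked the earlier approach.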

vikas-velora commented 2 weeks ago

Thanks Sergey.
