starlake-ai / starlake

Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
http://starlake.ai/
Apache License 2.0

[FEATURE] - Add GCP PubSub datasource/sink to read/write data (the way the Apache Kafka datasource is already handled) #961

Open · jwazne opened 1 month ago

jwazne commented 1 month ago

Description

More and more projects for the BPCE customer need to consume messages from, or write messages to, GCP PubSub. There are already two different projects that need to consume messages from GCP PubSub and send them to an HTTP endpoint or a Kafka topic.

Solution proposition

Starlake already provides an easy way to consume messages from Kafka and send them to an HTTP endpoint. One option would be to mirror the existing Kafka implementation in a new implementation for GCP PubSub.

The existing flow:

(Consumer) KAFKA <--> Starlake <--> HTTP ENDPOINT / KAFKA (Writer)

The desired flow:

(Consumer) GCP PubSub <--> Starlake <--> HTTP ENDPOINT / KAFKA / GCP PubSub (Writer)
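To make the desired flow concrete, here is a minimal Scala sketch of one possible shape for it: a shared source/sink abstraction that an existing Kafka connector and a new GCP PubSub connector could both implement, so that any source can feed any writer. All names here (`Message`, `MessageSource`, `MessageSink`, `Pipeline.runOnce`) are hypothetical and are not part of Starlake's API; in-memory stand-ins replace the real brokers and HTTP endpoint so the sketch runs with the Scala standard library alone.

```scala
// Hypothetical sketch: none of these types exist in Starlake today.
final case class Message(key: Option[String], payload: String)

// A source yields batches of messages; Kafka and GCP PubSub would both implement it.
trait MessageSource {
  def name: String
  def poll(): Seq[Message]
}

// A sink (writer) delivers messages and reports how many it wrote.
trait MessageSink {
  def name: String
  def write(messages: Seq[Message]): Int
}

// In-memory stand-in for a real consumer: returns all its data on the first poll.
final class InMemorySource(val name: String, data: Seq[Message]) extends MessageSource {
  private var remaining: Seq[Message] = data
  def poll(): Seq[Message] = {
    val batch = remaining
    remaining = Seq.empty
    batch
  }
}

// In-memory stand-in for an HTTP, Kafka or PubSub writer: records what it receives.
final class RecordingSink(val name: String) extends MessageSink {
  val received = scala.collection.mutable.ArrayBuffer.empty[Message]
  def write(messages: Seq[Message]): Int = {
    received ++= messages
    messages.size
  }
}

object Pipeline {
  // Polls one batch from the source and writes it to every configured sink,
  // returning the number of messages each sink accepted.
  def runOnce(source: MessageSource, sinks: Seq[MessageSink]): Map[String, Int] = {
    val batch = source.poll()
    sinks.map(s => s.name -> s.write(batch)).toMap
  }
}
```

For example, an in-memory source named `pubsub` could be wired to two recording sinks named `http` and `kafka` with `Pipeline.runOnce(pubsub, Seq(http, kafka))`. Under these assumptions, supporting GCP PubSub as a consumer or writer would mean adding one more `MessageSource` or `MessageSink` implementation rather than copying the Kafka pipeline end to end.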

Considered alternatives

Without a native Starlake consumer/writer for GCP PubSub, we currently use the following chain to meet our projects' needs:

(Consumer) GCP PubSub <--> Custom Scala code application <--> HTTP ENDPOINT (Writer) <--> Starlake based application <--> KAFKA (Writer)

As a consequence, two separate projects are needed to consume GCP PubSub messages and write them to Kafka.

Additional context

Scala source code that consumes messages from GCP PubSub and sends them to an HTTP endpoint is available to share on request. Scala source code that listens on an HTTP endpoint and sends the messages to Kafka is also available to share on request.
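The consume-and-forward step could look roughly like the following Scala sketch. It is an illustration under stated assumptions, not the custom application mentioned in this issue. In the Google Cloud Pub/Sub Java client, a subscriber callback receives each message together with an `AckReplyConsumer` exposing `ack()` and `nack()`; messages that are nacked, or not acked before their ack deadline, are redelivered. The sketch models that with hypothetical `PulledMessage`, `AckHandle` and `HttpPoster` types so it runs without the client library, and it acknowledges a message only after the HTTP endpoint answers with a 2xx status.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins: a real connector would receive a PubsubMessage and an
// AckReplyConsumer from the Google Cloud Pub/Sub Java client instead.
final case class PulledMessage(id: String, data: String)

trait AckHandle {
  def ack(): Unit
  def nack(): Unit
}

// Hypothetical HTTP client abstraction: posts a body and returns the status code.
trait HttpPoster {
  def post(body: String): Int
}

object PubSubToHttp {
  // Forwards one message to the HTTP endpoint. Acks only on a 2xx response;
  // on any other status or a non-fatal exception, nacks so the message is redelivered.
  def handle(msg: PulledMessage, reply: AckHandle, http: HttpPoster): Boolean = {
    val delivered =
      try {
        val status = http.post(msg.data)
        status >= 200 && status < 300
      } catch {
        case NonFatal(_) => false
      }
    if (delivered) reply.ack() else reply.nack()
    delivered
  }
}
```

Acking only after a successful POST gives at-least-once delivery: the endpoint may occasionally receive the same message more than once, so a real connector would likely need the receiver to tolerate duplicates, for example by deduplicating on the Pub/Sub message ID.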

Life cycle

Possible implementation

I can provide any additional context, source code, and support needed to develop the feature.

Complexity estimation

T-Shirt Size: M

hayssams commented 1 month ago

Thank you for this great suggestion.

I will draft a design document and submit it to you; then we can work together on an implementation.