utdemir / distributed-dataset

A distributed data processing framework in Haskell.
BSD 3-Clause "New" or "Revised" License

Google Cloud Functions backend #21

Open yaman opened 4 years ago

yaman commented 4 years ago

Would it be easy to build a Google Cloud Functions backend? (I haven't looked at how you did it for AWS Lambda, though.)

utdemir commented 4 years ago

Yes, it shouldn't be that hard; we should just be able to take distributed-dataset-aws-lambda and rewrite the API calls for GCP.

Probably the hardest part would be defining the infrastructure to run the functions (defining the functions themselves, uploading the code, fetching the results, etc.).

Are you keen to work on this, or is this a feature request? I would be happy to help as much as I can if you want to implement it.

yaman commented 4 years ago

I am keen to work on this, though I consider myself a newbie when it comes to Haskell (I have a couple of web services running in production, but they're trivial compared to what you have implemented here). I can't promise I won't get lost along the way.

I have been looking at the serverless runtimes available for Haskell. AWS Lambda, being open to custom runtimes, puts itself ahead of the competition. But we are on Google Cloud / Google Kubernetes...

As you said, handling the API calls to GCP would be easier than defining the infrastructure to run the functions, so I can at least start working on the API calls. That said, I get excited thinking about the infrastructure challenges of running functions on Google Cloud Functions.

So, I can pretty much handle the API calls... My question is: do you have any initial thoughts on how to run functions on Google Cloud Functions, considering it only supports Node (v8-v10), Python (3), and Go?

utdemir commented 4 years ago

I'm glad to hear that!

Actually, at the time I was implementing distributed-dataset-aws-lambda, AWS Lambda didn't support custom runtimes; so instead I used the Python runtime with a small wrapper written in Python to invoke the Haskell binary. You can see this here.

I am assuming we could do a similar trick with GCP: write a small Python or JS wrapper and package it together with the Haskell executable.
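A minimal sketch of such a wrapper, assuming the Haskell worker reads its input on stdin and writes its result to stdout. The executable name, handler signature, and base64 framing here are illustrative assumptions, not the actual distributed-dataset protocol:

```python
import base64
import subprocess

# Hypothetical name of the packaged Haskell worker binary.
EXECUTABLE = "./distributed-dataset-worker"

def run_worker(payload: bytes, executable: str = EXECUTABLE) -> bytes:
    """Feed the payload to the bundled binary on stdin; return its stdout."""
    proc = subprocess.run(
        [executable],
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return proc.stdout

def handler(request: dict, executable: str = EXECUTABLE) -> dict:
    """Illustrative Cloud Function entry point: decode the incoming
    payload, run the worker, and re-encode the result."""
    payload = base64.b64decode(request["payload"])
    result = run_worker(payload, executable)
    return {"result": base64.b64encode(result).decode("ascii")}
```

The base64 framing keeps the payload safe to pass through JSON; the real wrapper would follow whatever envelope distributed-dataset's driver expects.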

There is not much to do other than the API calls; once you figure out how to create a Cloud Function from scratch, invoke it, and return the result to the Haskell program, making distributed-dataset use it would be trivial.
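For the API-call side, the Cloud Functions v1 REST API roughly follows a create/upload/call shape. A sketch of how the request URLs and payloads could be built (project, region, and function names are placeholders; the field names follow the v1 `CloudFunction` resource as I recall it, so double-check them against the API reference):

```python
API_ROOT = "https://cloudfunctions.googleapis.com/v1"

def function_name(project: str, region: str, name: str) -> str:
    """Fully qualified resource name used by the v1 API."""
    return f"projects/{project}/locations/{region}/functions/{name}"

def create_request(project: str, region: str, name: str,
                   upload_url: str) -> tuple:
    """(url, json_body) for creating an HTTP-triggered function whose
    source archive was uploaded via functions:generateUploadUrl."""
    parent = f"projects/{project}/locations/{region}"
    body = {
        "name": function_name(project, region, name),
        "entryPoint": "handler",   # the Python wrapper's entry point
        "runtime": "python37",     # the wrapper's runtime, not Haskell
        "sourceUploadUrl": upload_url,
        "httpsTrigger": {},
    }
    return (f"{API_ROOT}/{parent}/functions", body)

def call_request(project: str, region: str, name: str, data: str) -> tuple:
    """(url, json_body) for synchronously invoking the function."""
    full = function_name(project, region, name)
    return (f"{API_ROOT}/{full}:call", {"data": data})
```

The driver would POST these with an OAuth bearer token, poll the create operation until it completes, then use the call endpoint to invoke the function and read the result back.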

yaman commented 4 years ago

Looking at Archive.hs, I need a bit more intuition about how serverless works on AWS Lambda to understand the concepts. I will pick it up in no time and test the same method with something fairly trivial.

In the meantime, I started looking at the GCP Functions API. Creating a function through the API (and also through the Google Cloud Functions UI) requires the source code as an archive (or the source code itself when creating the function from the UI). I am not sure whether this is how it works on AWS Lambda, but what GCP does right now is take the source code and compile it itself with the selected runtime (Node 8/10, Python, Go).
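Since the API wants the source as an archive, packaging would presumably mean zipping the Python wrapper together with the Haskell executable, much as the Lambda backend does in Archive.hs. A minimal sketch; the file names (`main.py`, `worker`) are assumptions:

```python
import io
import zipfile

def make_source_archive(wrapper_source: bytes, worker_binary: bytes) -> bytes:
    """Build an in-memory zip containing the Python wrapper (main.py)
    and the Haskell worker binary, in the archive shape Cloud Functions
    expects for an uploaded source."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("main.py", wrapper_source)
        # Mark the worker as executable (0o755) so the wrapper can
        # invoke it at runtime; zip stores unix modes in external_attr.
        info = zipfile.ZipInfo("worker")
        info.external_attr = 0o755 << 16
        zf.writestr(info, worker_binary)
    return buf.getvalue()
```

The resulting bytes would then be PUT to the signed URL returned by `functions:generateUploadUrl` before creating the function.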

I am looking at AWS Lambda right now to see if I can find another way to do this on GCP.

utdemir commented 4 years ago

Good luck :). We do a similar thing, archiving the code before sending it to AWS Lambda. You will see it once you start digging into the code.

In case you have any questions, I created a Gitter channel to chat; you should be invited. Feel free to ask if something is blocking you.

yaman commented 4 years ago

Great. I wrote a pretty long "hello" on Gitter; let's continue discussing the details there and keep this issue for the technical decisions we agree on.