tremor-rs / tremor-runtime

Main Tremor Project Rust Codebase
https://www.tremor.rs
Apache License 2.0

gRPC support #790

Open Licenser opened 3 years ago

Licenser commented 3 years ago

Describe the problem you are trying to solve

Many cloud-native projects, including Kubernetes, OpenTelemetry, Istio, Thanos, and Prometheus, as well as many of the services and APIs in the GCP suite, are normatively specified in gRPC and Protocol Buffers, with backward compatibility via JSON and HTTP/1.1.

Tremor has first-class support for JSON in its type system. In an environment where Protocol Buffers are ever-present and growing in popularity, extending that type system to natively support Protocol Buffers and other ordinal type systems and serialization formats is a natural evolution for Tremor. It would reduce the burden on Tremor application developers and operators.

Describe the solution you'd like

gRPC is a modern protocol that allows remote calls and interactions over the network. It is a de facto standard in the cloud-native world, where it has in many cases displaced REST as the way to interact with a service's API.

gRPC uses HTTP/2 as a protocol to enable multiplexing, asynchronous communications, and long-lasting connections between client and server.

gRPC also uses Protocol Buffers as its encoding format. The more compact binary format reduces wire size as well as encoding and decoding times.
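To make the wire-size point concrete, here is a minimal sketch of how Protocol Buffers encodes a single integer field using base-128 varints. It is hand-written against the Rust standard library purely for illustration; the function names are made up for this example, and a real integration would most likely rely on an existing protobuf library instead.

```rust
/// Wire type 0 marks a varint-encoded field in the Protocol Buffers wire format.
const WIRE_TYPE_VARINT: u64 = 0;

/// Append `value` to `out` as a base-128 varint (7 payload bits per byte,
/// high bit set on every byte except the last).
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint from the start of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends before the varint does.
fn decode_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encode one unsigned integer field: a key (field number and wire type)
/// followed by the value, both as varints.
fn encode_uint_field(field_number: u32, value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    let key = (u64::from(field_number) << 3) | WIRE_TYPE_VARINT;
    encode_varint(key, &mut out);
    encode_varint(value, &mut out);
    out
}
```

With this sketch, `encode_uint_field(1, 150)` yields the three bytes `08 96 01`, while the comparable JSON text `{"a":150}` takes nine bytes.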

From this there are a few requirements:

Tremor needs to be able to encode and decode Protocol Buffers based on a given .proto specification.

To be used as a client, Tremor needs to have an HTTP/2 client connector.

To be used as a server, Tremor needs to have an HTTP/2 server connector.
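As an illustration of what sits between an HTTP/2 connector and a Protocol Buffer codec: gRPC places each encoded message in the HTTP/2 body behind a 5-byte prefix, a one-byte compressed flag followed by a four-byte big-endian length. The sketch below shows only that framing, using the standard library; the function names are hypothetical and not part of Tremor.

```rust
/// Wrap an already-encoded protobuf message in gRPC's length-prefixed frame:
/// 1 byte compressed flag, 4 bytes big-endian payload length, then the payload.
/// Returns `None` if the payload does not fit into a 32-bit length.
fn frame_grpc_message(payload: &[u8], compressed: bool) -> Option<Vec<u8>> {
    if payload.len() > u32::MAX as usize {
        return None;
    }
    let len = payload.len() as u32;
    let mut frame = Vec::with_capacity(5 + payload.len());
    frame.push(if compressed { 1 } else { 0 });
    frame.extend_from_slice(&len.to_be_bytes());
    frame.extend_from_slice(payload);
    Some(frame)
}

/// Split one frame back into its compressed flag and payload.
/// Returns `None` for a truncated frame or an invalid flag byte.
fn unframe_grpc_message(frame: &[u8]) -> Option<(bool, &[u8])> {
    if frame.len() < 5 {
        return None;
    }
    let compressed = match frame[0] {
        0 => false,
        1 => true,
        _ => return None,
    };
    let len = u32::from_be_bytes([frame[1], frame[2], frame[3], frame[4]]) as usize;
    let end = 5usize.checked_add(len)?;
    let payload = frame.get(5..end)?;
    Some((compressed, payload))
}
```

Keeping the framing separate from both the HTTP/2 transport and the message codec matches the three requirements above, where each piece is listed on its own.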

Notes

Suggested initial target:

Enhancements to initial target:

NamikoToriyama commented 3 years ago

@Licenser Hi! I'm interested in working on this project as a part of GSoC.

I have some questions about this issue.

HTTP/2 linked source (server)

HTTP/2 linked sink (client)

According to this link, the plan is to use Rust for coding. Is it correct to implement the HTTP/2 server in Rust? And is it correct to think that the HTTP/2 server here is like a gRPC server?

Finally, I'd appreciate it if you can point me to where I can start to get a better understanding of how things work.

wy2249 commented 3 years ago

Hey, @Licenser

I am also interested in this project! I am currently working as an SDE intern, writing gRPC and protobuf code in Go for communication between microservices. Excited to try it out in Rust!

With respect to the target for code generation, I tried searching with keywords like codecs and PDK, but did not find much info to help me really understand how it works. May I ask what PDK and WASM stand for? Sincerely appreciate if we can have more info about code generation.

Licenser commented 3 years ago

Welcome @NamikoToriyama, welcome @wy2249,

first of all, thank you both so much for your interest in the project :)!

Let me try to answer the questions.

According to this link, we plan to use rust for coding. Is it correct to implement an HTTP/2 server with rust?

Yes, the implementation will be in Rust. Tremor uses a model where we have different connector sources (or onramps) that allow data to get into the system, and sinks (or offramps) that allow data to be sent out of the system.

The term linked in this context denotes a source or sink that works bidirectionally. Taking the HTTP/2 server as an example, an HTTP/2 request in itself is a data ingest event (so this behaves as a source), but the request expects a reply, and that reply is a data egress event (so it also behaves like a sink). Since those two events are related, or linked, we refer to those sources and sinks as linked sources and sinks.
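One way to picture the linked idea is as a source that also accepts the reply belonging to an event it produced earlier. The sketch below is only a toy model under that assumption: `Event`, `Source`, `LinkedSource`, `FakeHttp2Server`, and `run_once` are hypothetical names invented for this example and do not reflect Tremor's actual traits.

```rust
use std::collections::VecDeque;

/// Hypothetical event type; `request_id` ties a reply to its request.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    request_id: u64,
    payload: String,
}

/// A plain source only brings data into the system.
trait Source {
    fn pull(&mut self) -> Option<Event>;
}

/// A linked source additionally takes the reply for an earlier event,
/// so data also leaves the system through it.
trait LinkedSource: Source {
    fn reply(&mut self, response: Event);
}

/// Toy stand-in for an HTTP/2 server: queued requests in, recorded replies out.
struct FakeHttp2Server {
    pending: VecDeque<Event>,
    sent: Vec<Event>,
}

impl FakeHttp2Server {
    fn new(requests: Vec<Event>) -> Self {
        FakeHttp2Server {
            pending: requests.into_iter().collect(),
            sent: Vec::new(),
        }
    }
}

impl Source for FakeHttp2Server {
    fn pull(&mut self) -> Option<Event> {
        self.pending.pop_front()
    }
}

impl LinkedSource for FakeHttp2Server {
    fn reply(&mut self, response: Event) {
        self.sent.push(response);
    }
}

/// Handle at most one request: pull it (ingest), compute a response, and
/// hand the response back to the same component (egress).
/// Returns `false` when no request was waiting.
fn run_once<S, F>(source: &mut S, handle: F) -> bool
where
    S: LinkedSource,
    F: Fn(&Event) -> String,
{
    match source.pull() {
        Some(request) => {
            let response = Event {
                request_id: request.request_id,
                payload: handle(&request),
            };
            source.reply(response);
            true
        }
        None => false,
    }
}
```

The `request_id` field is what makes the two events "linked" in this model: the reply carries the identifier of the request that caused it.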

And is it correct to think that the http2 server here is like a gRPC server?

Yes and no. One of the concepts of Tremor is that we try, as much as possible, to build reusable components, so for supporting gRPC the goal would be to support the components of gRPC individually. An HTTP/2 server source is part of a gRPC server, but it has uses in other places. So the goal would be to implement an HTTP/2 server on which a gRPC server can be built.

May I ask what PDK and WASM stand for?

I'm really sorry we forgot to spell those out, I'll fix that in the issue too, thanks for pointing it out!

PDK stands for Plugin Development Kit, a way to dynamically load extensions into tremor. That way it would be possible to load sources, sinks, codecs or other supporting mechanisms into tremor without recompiling it, making the life of operators easier.

We have a related GSoC issue about this; the two tasks can be related, but do not have to be.

Wasm stands for WebAssembly. It was originally developed to improve the performance of runtimes in browsers but has since become an interesting target runtime for executing untrusted code inside of applications. Along with plugins, this would be a way to load new logic into Tremor without completely recompiling it.

One of the challenges with gRPC is that protobufs need to be compiled, which makes it non-trivial for dynamic systems like Tremor to support them. The ability to load them without recompiling the entire runtime would make gRPC a lot more usable. Ideally, we don't want to force users of Tremor to learn Rust just to use it for gRPC, so dynamically loading parts of that would be nice. Wasm and the PDK would each give a different possible approach to this.

Sincerely appreciate if we can have more info about code generation.

Absolutely! Since Tremor has an internal data representation, when handling gRPC we will need to map the protobuf spec for the gRPC service to Tremor's representation and map the response back to the protobuf encoding. This can be done manually, but that is a lot of work for users and operators to undertake to implement a gRPC service, especially when a machine-readable format like protobufs already exists.

Code generation could help with this in a few forms, depending on how other parts are approached. It could generate codecs for different protobuf files. It could also generate plugins for the PDK, or provide code that can then be compiled to Wasm. It could also generate functions for tremor script to make it easier to access and mutate the data of a given protobuf file.
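To give a feel for the mapping that would otherwise be written by hand, here is a small sketch of the kind of code a generator might emit for one message. Everything in it is an assumption made for illustration: `Value` is a simplified, JSON-like stand-in rather than Tremor's real value type, and `HelloRequest` with its `name` and `count` fields is a hypothetical message, not taken from any real .proto file.

```rust
use std::collections::BTreeMap;

/// Simplified, JSON-like stand-in for an internal data representation.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    U64(u64),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical message, as if described by
/// `message HelloRequest { string name = 1; uint32 count = 2; }`.
#[derive(Debug, Clone, PartialEq)]
struct HelloRequest {
    name: String,
    count: u32,
}

impl HelloRequest {
    /// Map the typed message to the generic representation.
    fn to_value(&self) -> Value {
        let mut fields = BTreeMap::new();
        fields.insert("name".to_string(), Value::String(self.name.clone()));
        fields.insert("count".to_string(), Value::U64(u64::from(self.count)));
        Value::Object(fields)
    }

    /// Map the generic representation back to the typed message.
    /// Returns `None` if a field is missing, has the wrong type,
    /// or does not fit into the declared integer width.
    fn from_value(value: &Value) -> Option<HelloRequest> {
        let fields = match value {
            Value::Object(fields) => fields,
            _ => return None,
        };
        let name = match fields.get("name")? {
            Value::String(s) => s.clone(),
            _ => return None,
        };
        let count = match fields.get("count")? {
            Value::U64(n) if *n <= u64::from(u32::MAX) => *n as u32,
            _ => return None,
        };
        Some(HelloRequest { name, count })
    }
}
```

Writing this pair of functions for every message in every service is the repetitive work that generating code from the .proto file would remove.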

--

I hope that answers the questions, if not feel free to reach out!

Let me use the chance to invite you to our community discord, you're more than welcome to ask questions there too, and to get to know the community a bit as well.

NamikoToriyama commented 3 years ago

Hi! @Licenser

I realized that there are still a lot of things I don't understand well enough to write a proposal, so let me ask some questions.

Q1: Is this the document that talks about JSON and HTTP/1.1 in Tremor? Am I correct in creating a gRPC endpoint for this?

And about the HTTP/2 linked source (server) and the HTTP/2 linked sink (client): I realized that I've built something like the HTTP/2 server as an intern, but I don't have much experience with the client side. Is the client similar to a GET in a REST API?

Q2: Istio is listed in Protobuf codecs for at least 3 gRPC services. Do we need to build a k8s environment or do we already have a k8s environment? I understand that there is a Docker environment.

Q3: About Enhancements to initial target. Usually, when a new Protocol Buffer value is added, the pb file is regenerated using commands such as protoc. Here, do you mean to dynamically recompile the proto file by using wasm? I don't know if my understanding is correct because it was too cool.

I may not have understood the answer even if it has already been answered. Thank you.

Licenser commented 3 years ago

Hi @NamikoToriyama, thanks for the interest :D! Let me go through the questions, and please feel free to ask for clarification if I don't manage to answer your questions fully.

A1

Not entirely. We're not so much concerned about Tremor's own API. Turning that into a gRPC endpoint would be a fairly boring task too, and we really don't want to bore a mentee :).

Tremor is an event processing system that receives and sends data to other systems. What we're aiming for is allowing these other systems to reach tremor using gRPC.

Let's take Prometheus as an example: we'd want to be able to scrape a Prometheus endpoint (act as a client/onramp/source) and be scraped by Prometheus (act as a server/offramp/sink). The goal, however, is not to implement a specific protocol like Prometheus but to provide a facility through which any gRPC specification can be plugged into Tremor.

The naming of client/server/sink/source is a bit confusing, my apologies, so let me try to break this down.

Tremor has two concepts in this regard:

The notion 'linked' here means that we allow the system to react to the response. Taking REST as an example, the linked REST sink is a sink that can not only make an HTTP call (this is where the data leaves the system) but also provide the corresponding response to the system (this is where data enters the system again, just as new data). So we call it linked, as it in a way behaves as both a sink and (a linked) source.

The notion of server and client is more specific to the protocol, here HTTP/2, where an HTTP/2 client is a source or sink that initiates a request, whereas the HTTP/2 server would be the one that receives requests.

Those are two different facets of the same thing. An HTTP/2 client can exist as both a source and a sink. For Prometheus, for example, which follows a pull model, it would be a source; for, say, Google Cloud Storage (something @jigyasak05 is working on right now) it would be a sink.

A2

We currently don't have a k8s environment set up, no. That said, the three services mentioned in there are examples, not requirements; any three services work. The goal of that criterion is to provide some (at least three) working examples of the gRPC connectors so there is something for people to get started with. It would be perfectly fine to pick any three other gRPC services.

A3

For example, yes. We see a few possibilities for achieving this, but ultimately we will want the mentee to pick what they think is the right approach in their proposal.

There are two aspects to this, one is an easy upgrade path if a protocol buffer changes. The other is the ease of adding new endpoints.

This comes back to A1/A2: the goal is not to implement a specific gRPC endpoint; the goal is to add a facility that allows users of Tremor to quickly and easily add any gRPC endpoint they want as a source or sink. So the ideal outcome here is a facility to easily create the needed code and, ideally, load it without having to recompile the entire Tremor binary (even though recompiling is a valid first step/solution and would be a success in the sense of the project). Wasm and plugins (see the other GSoC project) are two options we came up with for this, but there might be alternatives we've not thought about.

NamikoToriyama commented 3 years ago

@Licenser I understand Tremor better now thanks to your explanation :) Let me ask you a few more questions.

Q1

I guess you want to make the source (server, onramp) and the sink (client, offramp) gRPC compatible. So code generation is needed to automatically generate gRPC for the sink, and something like SDK or Wasm is used to automatically add fields to gRPC when the source is changed, right?

Q2

I was confused about "Protobuf codecs for at least 3 gRPC services" because of Istio. However, since OpenTelemetry is a data analysis tool and Prometheus is a metrics visualization tool, does this mean that we can create an example that connects these services using the Tremor gRPC service?

Thanks for answering.

Licenser commented 3 years ago

Absolutely, glad it helped :D

A1

Basically yes, we want to make a source that can support different proto definitions (or add a way to easily make new sources from definitions). We already have a number of different sources (we still call them onramps in the docs), which are all artisanal handcrafted sources. The goal of the GSoC project is to make that easier for gRPC-based services, to the degree that in most cases a user just needs to provide a .proto file and then magic happens :).

SDK and/or Wasm are two of the more advanced ways to handle this that we thought of; both would make it easier for users to not have to compile everything into a huge executable.

A2

Yes! The goal of that part of the task is simply to have a little demo of how nice and easy it is to create sources and sinks from gRPC specs, and to have it act as a walkthrough and documentation for others who want to create sinks/sources from gRPC specs.

The three examples are just some suggestions, you're totally free to pick your favorite gRPC based services here, it's very low pressure. The three were just suggested based on 'the first thing that came to mind' basis. But you're very right, the original text read as if those specific three were expected, thank you for calling that out! It wasn't the intention and we updated the ticket to reflect it :)