trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Project Swift #22271

Open wendigo opened 3 weeks ago

wendigo commented 3 weeks ago

Trino has had its protocol since its inception in 2012. Both the client and cluster protocols are REST-oriented, use JSON as the only serialization format, and use HTTP/1.1 as the transport layer. While in 2012 the client and server protocols were good enough for the majority of use cases, the amount of data that clients want to retrieve efficiently from a Trino cluster has since increased significantly.
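For context, the current client protocol is a polling loop: after submitting a statement, the client keeps fetching the `nextUri` found in each JSON response until a response arrives without one. A minimal sketch of that loop follows. The `fetch` callable and the in-memory pages are hypothetical stand-ins for real HTTP GET requests, not part of any Trino client library.

```python
import json
from typing import Callable, Iterator


def iter_rows(first_page: dict, fetch: Callable[[str], str]) -> Iterator[list]:
    """Yield rows page by page, following nextUri until it is absent."""
    page = first_page
    while True:
        # A page may carry no rows at all while the query is still starting.
        yield from page.get("data", [])
        next_uri = page.get("nextUri")
        if next_uri is None:
            return
        # Every page costs one HTTP round trip plus one JSON decode.
        page = json.loads(fetch(next_uri))


# Hypothetical in-memory stand-in for the server side of the exchange.
FAKE_PAGES = {
    "/page/1": json.dumps({"data": [[3], [4]], "nextUri": "/page/2"}),
    "/page/2": json.dumps({"data": [[5]]}),
}


def fake_fetch(uri: str) -> str:
    return FAKE_PAGES[uri]


first = {"data": [[1], [2]], "nextUri": "/page/1"}
rows = list(iter_rows(first, fake_fetch))
```

Each iteration pays for a request and a full JSON parse, which is the per-page overhead that grows with the amount of data retrieved.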

We are starting Project Swift with the goal of improving the existing Trino protocols, both for client communication and for server-to-server communication.

Introducing a v2 protocol is not a goal of this project.

Tasks

### Client protocol improvements
- [ ] #21793
- [ ] #22227
### Server protocol improvements
- [ ] #21793
- [ ] https://github.com/trinodb/trino/pull/22249
- [ ] https://github.com/trinodb/trino/issues/6552
- [ ] https://github.com/airlift/airlift/pull/1183
- [ ] https://github.com/airlift/airlift/pull/1158
- [ ] https://github.com/airlift/airlift/pull/1161
- [ ] https://github.com/trinodb/trino/pull/22457

sajjoseph commented 3 weeks ago

Wonderful. Thanks for this initiative, even though I would be thrilled if we ever saw a green light for a V2 protocol. How about the following?

  1. Add nextURI to the HTTP header response
  2. Add partialCancelUri to the HTTP header response
  3. targetResultSize enhancement
  4. Add cluster identifier as a request parameter

wendigo commented 3 weeks ago

@sajjoseph can you elaborate on the use cases for each of the points?

himanshpal commented 3 weeks ago

In a world where Arrow and Arrow Flight are the new standards and are increasingly being adopted by many databases, do we ever plan to invest in Arrow and integrate it into Trino?

I know that a couple of years ago the Netflix team did a PoC for integrating Arrow into Trino, but it was never completed.

wendigo commented 2 weeks ago

@himanshpal we are not considering the introduction of an entirely new protocol (like Arrow Flight) at the moment. We are thinking about other serialization formats for client-server communication, and Arrow is one of the candidates.

losipiuk commented 2 weeks ago

cc: @losipiuk

mosabua commented 1 week ago

@himanshpal just to clarify what @wendigo mentioned: we are considering Arrow as one of the candidates, but in its current form its type system has significant limitations, so it cannot cover all data from Trino and its richer type system. We might end up in a situation where Arrow can be used with limitations in place and another format is used for full support. However, the Arrow project is advancing, and we are still quite a way from even starting on a V2 protocol. There is a lot of room to improve the current protocol, and that is our focus in Project Swift.

mosabua commented 1 week ago

@wendigo I think some of the ideas from @sajjoseph are related to Trino Gateway and other tools being able to redirect more easily by using info in the HTTP headers rather than having to parse the response. I recall us talking about that in some Trino Gateway dev syncs as well, so maybe @oneonestar @vishalya @willmostly @Chaho12 have a better memory than me and can detail this more.
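A rough sketch of the difference this would make for a routing layer, assuming the next URI were mirrored into a response header. The header name `X-Trino-Next-Uri` and the example URI are hypothetical and chosen only for illustration; neither is defined by the current protocol or by this thread.

```python
import json
from typing import Mapping, Optional

# Hypothetical header name, used only for this illustration.
NEXT_URI_HEADER = "X-Trino-Next-Uri"


def next_uri_from_body(body: bytes) -> Optional[str]:
    """Today: decode the whole JSON payload, data included, to read one field."""
    return json.loads(body).get("nextUri")


def next_uri_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """With the proposal: read the link from a header and leave the body untouched."""
    # A real proxy would match header names case-insensitively.
    return headers.get(NEXT_URI_HEADER)


# Hypothetical response as a proxy would see it.
uri = "http://cluster-a.example/next/1"
body = json.dumps({"id": "q1", "nextUri": uri, "data": [[1], [2]]}).encode()
headers = {NEXT_URI_HEADER: uri}

from_body = next_uri_from_body(body)
from_headers = next_uri_from_headers(headers)
```

Both paths yield the same URI, but the header path does not depend on the size or format of the response body, which matters if the body stops being JSON.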

wendigo commented 1 week ago

@mosabua I recall it.

mosabua commented 2 days ago

Some very interesting numbers from a user were reported in https://github.com/trinodb/trino/issues/22303, related to changing targetResultSize. This could be a great quick win. Maybe it's worth changing the current default to more than 16MB for starters, and maybe figuring out some way to adjust it automatically.
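As a back-of-the-envelope illustration of why a larger target could help: if every response page were filled up to the target size (a simplifying assumption, since the server may return smaller pages), the number of client round trips would shrink in proportion. The result size and target sizes below are made-up inputs, not measurements from the linked issue.

```python
MIB = 1024 * 1024


def round_trips(total_bytes: int, target_result_size: int) -> int:
    """Pages needed if each page is filled to the target (ceiling division)."""
    return -(-total_bytes // target_result_size)


total = 1024 * MIB  # hypothetical 1 GiB result set

trips_16 = round_trips(total, 16 * MIB)    # 64 pages
trips_128 = round_trips(total, 128 * MIB)  # 8 pages
```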