trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0

Subquery cache roadmap #22114

Open · sopel39 opened this issue 1 month ago

sopel39 commented 1 month ago

https://github.com/trinodb/trino/pull/21888 introduces the subquery cache feature to the Trino engine. However, there are many follow-up items to improve performance and the cache hit rate.

Here are the roadmap items:

### Tasks
- [ ] https://github.com/trinodb/trino/issues/22116
- [ ] https://github.com/trinodb/trino/issues/22117
- [ ] https://github.com/trinodb/trino/issues/22118
- [ ] https://github.com/trinodb/trino/issues/22119
- [ ] https://github.com/trinodb/trino/issues/22120
- [ ] https://github.com/trinodb/trino/issues/22121
- [ ] https://github.com/trinodb/trino/issues/22122
- [ ] https://github.com/trinodb/trino/issues/22165
- [ ] https://github.com/trinodb/trino/issues/22167
- [ ] https://github.com/trinodb/trino/issues/22168

chenjian2664 commented 1 month ago

@sopel39 I am interested in contributing to this topic, but I am not very familiar with the tasks mentioned above. Could you guide me on how to get started and suggest which task or tasks might be suitable to start with?

osscm commented 1 month ago

Thanks a lot, @sopel39!

We discussed the implementation and approach a couple of times on the old issue.

As discussed, we were also looking into the same problem and would be more than happy to contribute. Please share your thoughts on where I could help.

Maybe https://github.com/trinodb/trino/issues/22116 or https://github.com/trinodb/trino/issues/22165, or whichever task you think is best.

Thanks.

sopel39 commented 1 month ago

Hi @osscm

Keep in mind that it will probably take some time to land this PR; I'm extracting smaller PRs at the moment. However, I think we should also make progress on the improvements.

https://github.com/trinodb/trino/issues/22116 and https://github.com/trinodb/trino/issues/22165 are both important. In particular, https://github.com/trinodb/trino/issues/22116 will improve the cache hit rate for string partition types, which are fairly common.

However, I would start with something simpler, like https://github.com/trinodb/trino/issues/22120 or https://github.com/trinodb/trino/issues/22121, to get familiar with the concepts (the code can still change during review).

sug-ghosh commented 1 month ago

Hi @sopel39

I am interested in contributing to this issue and am going through the code and implementation to understand it. Could you please advise which sub-task I could take up?

sopel39 commented 3 weeks ago

@sug-ghosh I think we need to sync. Please ping me on Slack.

hackeryang commented 1 week ago

Relevant PR: https://github.com/trinodb/trino/pull/21888