Open backkem opened 3 years ago
@backkem hi, do you mean that TiDB should support Apache Arrow Flight as a new client protocol to replace the MySQL protocol?
And would you be interested in developing this feature with us?
Hi @winkyao. Yes, but I would add it in addition to the MySQL protocol. The MySQL protocol has broad support in existing tooling, therefore I would not phase it out. However, if you're writing a new service and want increased performance, or any of the other benefits, you could switch over to the Apache Arrow Flight protocol. The Flight connector could also be more 'native' if the data is stored in the Apache Arrow format. I read it may already closely resemble it.
I'd love to help you build this, but I'll have to find the time, especially if I'm to land it on my own. It may also be a good idea to reach out to the Arrow community; for example, I'm not sure whether they have a standard for querying yet.
@backkem Thanks for your suggestion. We will try to reach out to the Arrow community and find a way to cooperate with them.
@zz-jason Could you please take a look at these designs?
@backkem Thank you for your suggestion!
In TiDB, we use Chunk, which is a columnar data structure like RecordBatch in Apache Arrow, to store data for vectorized query execution in the TiDB SQL layer. The TiKV Coprocessor also uses a similar data structure and vectorized execution scheme.
After reading the blog about Arrow Flight and the proposal about Arrow Flight SQL, maybe we could:
From the Implementation Status in Apache Arrow, it seems we first need to support Flight RPC for Go.
The first implementations of Flight SQL are shipping in Arrow 7.0.0: article. This doesn't include a Go port yet, though.
Linking the Go Flight SQL package and server implementation example.
One major difference between the current MySQL connector and Arrow Flight is that the former is connection based and the latter uses a more stateless request/response design (gRPC).
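To make that distinction concrete, here is a minimal Go sketch using only the standard library. The names `conn`, `request`, and `handleStateless` are hypothetical and exist only for illustration; they are not TiDB or Arrow Flight APIs. The sketch assumes the only piece of state is the current database.

```go
package main

import (
	"fmt"
	"strings"
)

// conn models a connection-based protocol (hypothetical type): state such
// as the current database lives on the server and persists across statements.
type conn struct {
	currentDB string
}

// exec runs one statement against whatever state earlier statements left behind.
func (c *conn) exec(stmt string) string {
	if strings.HasPrefix(stmt, "USE ") {
		c.currentDB = strings.TrimPrefix(stmt, "USE ")
		return "OK"
	}
	return fmt.Sprintf("[%s] %s", c.currentDB, stmt)
}

// request models a stateless call (hypothetical type): every request
// carries all the context the server needs to answer it.
type request struct {
	DB    string
	Query string
}

// handleStateless answers a request without consulting any stored state.
func handleStateless(r request) string {
	return fmt.Sprintf("[%s] %s", r.DB, r.Query)
}

func main() {
	// Connection-based: two round trips, the second depends on the first.
	c := &conn{}
	c.exec("USE test")
	fmt.Println(c.exec("SELECT 1"))

	// Stateless: one self-contained request.
	fmt.Println(handleStateless(request{DB: "test", Query: "SELECT 1"}))
}
```

Both paths yield the same result here, but only the stateless one can be served by any node without knowing what the client sent earlier.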
Looking at the code, this may mean it makes more sense not to use the current session implementation and instead create a separate implementation that uses the Parser / Compiler / Executor directly. That being said, the RecordSet is already closely inspired by Apache Arrow.
Looking into it more, both the Compiler and Executor have a significant dependency on the sessionctx. Either this dependency would have to be unraveled, or an ephemeral session could be created.
I created a very basic POC for this. You can find the code here:
Feature Request
Describe the feature you'd like: Hi all, I'm wondering if there would be interest in supporting an Apache Arrow Flight connector. This transport can enable faster data retrieval and higher throughput by reducing (de)serialization and data copying. It also gives you nice strict typing on query results. Further down the line, it may also allow distributing query processing without relaying the query results through a coordinator node.
Describe alternatives you've considered: None
Teachability, Documentation, Adoption, Migration Strategy: Apache Arrow Flight is a well documented protocol with implementations across many languages. Naturally, this would be an additive feature, next to the existing MySQL/ODBC Connector.