Open jaystarshot opened 1 month ago
I'm not aware of anyone planning to do this but it seems like an interesting project.
I see. What about Arrow C++? If Arrow C++ is already supported, wrapping to Velox formats shouldn't be that difficult.
We integrate with PyArrow (which is based on Arrow C++) via the Arrow C Data Interface, so the same approach could be used with Arrow C++ / Velox.
Yes, I expect the tricky part would not be converting the data (since Velox and we both speak the C Data Interface) but building a C++ Velox plugin and aligning the various scan methods. Unfortunately, the last I heard, Velox had planned on dropping Substrait support, so the plugin may also need custom logic to convert Velox expressions to Substrait expressions if it is to support filter pushdown. That said, since the linked issue is still open, it seems support hasn't been removed yet.
Not sure if filter pushdown into scans is a concern for ML use cases. For example, https://www.youtube.com/watch?v=bISBNVtXZ6M mentions that Nimble doesn't yet have filter pushdown.
That's a good point. Filter pushdown is most effective with clustered indices and that hasn't yet been a major use case for us either.
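To illustrate why clustering matters here, a toy sketch (not this project's scan logic): it stands in for a clustered index with per-chunk min/max statistics over sorted data, and counts how many chunks a range filter would still have to read. The function name `chunks_to_scan`, the chunk size, and the data are all made up for the example.

```python
import random


def chunks_to_scan(values, chunk_size, lo, hi):
    """Count chunks whose [min, max] range overlaps the filter lo <= x <= hi."""
    scanned = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # A chunk can be skipped only if its value range misses the filter.
        if min(chunk) <= hi and max(chunk) >= lo:
            scanned += 1
    return scanned


rng = random.Random(0)
unclustered = [rng.randrange(1000) for _ in range(10_000)]
clustered = sorted(unclustered)  # same rows, ordered by the filter column

unclustered_scans = chunks_to_scan(unclustered, 100, 100, 109)
clustered_scans = chunks_to_scan(clustered, 100, 100, 109)
print(f"unclustered: {unclustered_scans} chunks, clustered: {clustered_scans} chunks")
```

With randomly ordered rows, almost every chunk spans the filter range, so pushing the filter down skips very little. Once the rows are ordered by the filter column, only a handful of chunks overlap it.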
Velox framework for vectorized processing - https://github.com/facebookincubator/velox