backkem closed this 1 week ago.
Hey @backkem, that's a good question.
We abide by the TableProvider API set out by DataFusion, which doesn't take the ORDER BY clause into account:
https://github.com/splitgraph/seafowl/blob/fdd7c4996f9f06385e1d3c398f19c51929ea6c41/datafusion_remote_tables/src/provider.rs#L120-L126
Sorting itself is handled by DataFusion further down the data processing pipeline (i.e. once the data has been fetched) by a plan node above the scanning node in the plan AST.
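To make this division of labour concrete, here is a minimal sketch in Python, not DataFusion's actual Rust API. The names scan and execute_query and the in-memory REMOTE_ROWS are hypothetical. The scan receives pushed-down filters and an optional limit but no sort order, mirroring the TableProvider API described above, so the ORDER BY is applied by a separate step above the scan once the rows have been fetched.

```python
from typing import Callable, Optional

Row = dict
Predicate = Callable[[Row], bool]

# Hypothetical remote table, in whatever order the remote source returns rows.
REMOTE_ROWS: list[Row] = [
    {"id": 3, "price": 30},
    {"id": 1, "price": 10},
    {"id": 4, "price": 40},
    {"id": 2, "price": 20},
]


def scan(filters: list[Predicate], limit: Optional[int]) -> list[Row]:
    """Hypothetical scan node: applies pushed-down filters and an optional limit.

    It is given no sort order, so rows come back in the remote source's order.
    """
    rows = [r for r in REMOTE_ROWS if all(f(r) for f in filters)]
    return rows if limit is None else rows[:limit]


def execute_query(filters: list[Predicate], sort_key: str, limit: Optional[int]) -> list[Row]:
    """Plan shape for: SELECT ... WHERE <filters> ORDER BY <sort_key> LIMIT <limit>."""
    # The scan only receives the filters; the limit is not handed to it
    # because a sort sits between the scan and the limit.
    scanned = scan(filters, limit=None)
    # Sort node above the scan, applied after the data has been fetched.
    ordered = sorted(scanned, key=lambda r: r[sort_key])
    # Limit node above the sort.
    return ordered if limit is None else ordered[:limit]
```

For example, execute_query([lambda r: r["price"] > 10], "id", 2) returns the rows with id 2 and 3, whereas calling scan with the same filter and a limit of 2 returns ids 3 and 4, simply because those arrive first from the remote source.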
While filtering and sorting commute in principle, a limit doesn't commute with sorting. DataFusion handles this by carefully deciding when to push the limit down into the scan (which is why it's an Option<usize>), though I don't remember exactly where that happens.
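The commutativity argument can be checked with plain Python lists standing in for a table; the helper names below are hypothetical and only illustrate the reasoning. Filtering before or after sorting yields the same rows, but taking a limit before sorting generally returns a different result than sorting first, which is why a limit cannot simply be handed to the scan when an ORDER BY sits between them.

```python
def filter_rows(rows, pred):
    # Keep only the rows matching the predicate, preserving input order.
    return [r for r in rows if pred(r)]


def sort_rows(rows):
    # Stand-in for an ORDER BY on the row values.
    return sorted(rows)


def limit_rows(rows, n):
    # Stand-in for LIMIT n: take the first n rows in their current order.
    return rows[:n]


# Rows in the order a remote source happens to return them.
rows = [5, 1, 4, 2, 3]
is_even = lambda x: x % 2 == 0

# Filtering and sorting commute: both orders yield [2, 4].
filter_then_sort = sort_rows(filter_rows(rows, is_even))
sort_then_filter = filter_rows(sort_rows(rows), is_even)

# Limit and sorting do not commute.
sort_then_limit = limit_rows(sort_rows(rows), 2)  # [1, 2]: the two smallest values
limit_then_sort = sort_rows(limit_rows(rows, 2))  # [1, 5]: whatever arrived first, then sorted
```

The last line is the failure mode raised in the question: applying the limit inside the scan, before the sort, returns an arbitrary subset of the matching rows.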
Thank you for the feedback. I'll try to find some time to look into the directions mentioned in apache/arrow-datafusion#7871.
Closing as this was answered. FYI: We created datafusion-contrib/datafusion-federation to explore the full query federation use-case.
I was wondering: does the datafusion_remote_tables filter push-down not support sorting? It seems that applying filters and limits in the absence of a sort order could lead to unexpected results. I'd be happy to help address this if this is indeed the case.