Open wjones127 opened 5 months ago
I'm interested in improving DataFusion extensibility for Lance.
I think a good approach to this would be starting to design some logical plan nodes implementing UserDefinedLogicalNode.
From there, we can create a `create_plan_v2()` method that creates a logical plan instead of the `ExecutionPlan`. Later, we can add a physical planner and other pieces. This would keep the existing planning intact until we are ready to switch things over.
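To make the split concrete, here is a minimal, dependency-free sketch of the idea: one function builds only a logical plan, and a separate planner lowers it to a physical plan. Everything in it is hypothetical. The `LogicalPlan` and `PhysicalPlan` enums, their variants, and `plan_physical` are illustration-only stand-ins, not Lance's or DataFusion's types, and a real version would implement DataFusion's `UserDefinedLogicalNode` rather than a hand-rolled enum.

```rust
// Hypothetical stand-ins: these are NOT Lance's or DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    /// Read the given columns from the dataset.
    Scan { columns: Vec<String> },
    /// Keep the `k` nearest rows of `input`, measured on `column`.
    Knn { input: Box<LogicalPlan>, column: String, k: usize },
}

#[derive(Debug, Clone, PartialEq)]
enum PhysicalPlan {
    ScanExec { columns: Vec<String> },
    KnnExec { input: Box<PhysicalPlan>, column: String, k: usize },
}

/// Stand-in for a `create_plan_v2()`-style method: it only describes
/// *what* to compute and never touches execution details.
fn create_plan_v2(columns: &[&str], nearest: Option<(&str, usize)>) -> LogicalPlan {
    let scan = LogicalPlan::Scan {
        columns: columns.iter().map(|c| c.to_string()).collect(),
    };
    match nearest {
        Some((column, k)) => LogicalPlan::Knn {
            input: Box::new(scan),
            column: column.to_string(),
            k,
        },
        None => scan,
    }
}

/// Stand-in for a physical planner: lowers each logical node to an
/// executable node, recursing into inputs.
fn plan_physical(plan: &LogicalPlan) -> PhysicalPlan {
    match plan {
        LogicalPlan::Scan { columns } => PhysicalPlan::ScanExec {
            columns: columns.clone(),
        },
        LogicalPlan::Knn { input, column, k } => PhysicalPlan::KnnExec {
            input: Box::new(plan_physical(input)),
            column: column.clone(),
            k: *k,
        },
    }
}

fn main() {
    let logical = create_plan_v2(&["id", "vector"], Some(("vector", 10)));
    println!("{:#?}", logical);
    println!("{:#?}", plan_physical(&logical));
}
```

Because the logical plan is plain data, it can be asserted on in unit tests or rewritten by outside code before any execution node exists, which is the testability and extensibility benefit being discussed.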
> I'm interested in improving DataFusion extensibility for Lance.
Is there a particular goal you have in mind that you want to work towards?
> Is there a particular goal you have in mind that you want to work towards?
None yet. I would like to know how Lance is built on DataFusion and what areas could be improved.
> I think a good approach to this would be starting to design some logical plan nodes implementing UserDefinedLogicalNode.
I could probably start with this: create a LogicalPlan for KNN search.
@jayzhan211 If you want a smaller issue to get started with, this might be a better one: https://github.com/lancedb/lance/issues/1927 It will also have a more immediate payoff.
The code in dataset/scanner.rs has gotten extremely complicated, to the point where it is hard to test. Before we make any improvements, we need to refactor it to be easier to test and extend.
In addition, outside codebases may wish to extend Lance's capabilities by modifying or composing plans. For example, in LanceDB, we'll want to add a separate WAL that needs to be queried during KNN queries and scans.
Tasks
- […] `Scanner::create_plan()` in terms of that.
- […] `_rowid` and `_distance` less awkward to deal with, so that `select([]).with_row_id()` is just `select("_rowid")`. We should reserve that column name.
- […] `_distance` unless `select *` or `select _distance`.
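
One possible reading of the last two tasks, sketched as a standalone function: `_rowid` is requested like any other column, and `_distance` only shows up when the projection is `select *` or names it explicitly. The `Projection` enum and `output_columns` function are hypothetical and exist only for this illustration. Whether `select *` should also include `_rowid` is not stated in the issue, so this sketch assumes it does not.

```rust
// Reserved column names mentioned in the tasks above.
const ROW_ID: &str = "_rowid";
const DISTANCE: &str = "_distance";

/// Hypothetical projection type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Projection {
    /// `select *`
    All,
    /// `select a, b, ...`, possibly naming `_rowid` or `_distance`.
    Columns(Vec<String>),
}

/// Decide which columns a query returns.
/// Assumption: `select *` adds `_distance` only for KNN queries and never
/// adds `_rowid`; an explicit column list is returned exactly as requested.
/// A real implementation would also reject `_distance` on non-KNN queries.
fn output_columns(schema: &[&str], projection: &Projection, is_knn: bool) -> Vec<String> {
    match projection {
        Projection::All => {
            let mut out: Vec<String> = schema.iter().map(|c| c.to_string()).collect();
            if is_knn {
                out.push(DISTANCE.to_string());
            }
            out
        }
        Projection::Columns(cols) => cols.clone(),
    }
}

fn main() {
    let schema = ["id", "vector"];
    // The equivalent of `select("_rowid")`: no separate `with_row_id()` call.
    let only_row_id = Projection::Columns(vec![ROW_ID.to_string()]);
    println!("{:?}", output_columns(&schema, &only_row_id, true));
    // `select *` on a KNN query includes `_distance`.
    println!("{:?}", output_columns(&schema, &Projection::All, true));
    // An explicit list without `_distance` does not return it.
    let id_only = Projection::Columns(vec!["id".to_string()]);
    println!("{:?}", output_columns(&schema, &id_only, true));
}
```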