matzemathics closed this 3 months ago
Don't we already handle this case in the filter operation? https://github.com/knowsys/nemo/blob/ac0cf3518bba990f1e69b60d167a5b40580791c8/nemo-physical/src/tabular/operations/filter.rs#L102
Possibly already handled (see above)
I did benchmark this, and the results speak for themselves:
@prefix dev: <file:///dev/>.
@import works :- json {resource="works.json"}.
items(?i, ?author_name) :-
    works(_, "items", ?a), works(?a, ?i, ?x),
    works(?x, "title", ?title_array),
    works(?title_array, 0, ?title_id),
    works(?title_id, value, ?title),
    works(?x, "author", ?author_array),
    works(?author_array, 0, ?author_id),
    works(?author_id, "family", ?author),
    works(?author, value, ?author_name).
@export items :- csv { resource="" }.
Benchmark 1: ./nmo-main authors.rls
  Time (mean ± σ):     13.659 s ±  2.245 s    [User: 13.643 s, System: 0.012 s]
  Range (min … max):   10.542 s … 16.577 s    10 runs

Benchmark 2: ./nmo-const-join authors.rls
  Time (mean ± σ):      3.997 s ±  0.599 s    [User: 3.985 s, System: 0.011 s]
  Range (min … max):    2.831 s …  4.370 s    10 runs
The point is that filtering happens too late in the pipeline, so the optimisation in the filter code only helps for computed variables and remains inefficient for variables that can be joined on.
This change should greatly improve performance on reasonably large tables, because it increases join selectivity.
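To illustrate why the position of the constant check matters, here is a minimal sketch in Rust. It is not Nemo's actual code: the `Row` type, the nested-loop join, and the function names `join_then_filter` and `filter_then_join` are all hypothetical and only model a triple table like `works(subject, key, value)`. Both functions compute the same result, but the first materialises every joined row before checking the constant key, while the second restricts the right-hand table by the constant before joining.

```rust
// Hypothetical, simplified model of a triple table such as works(subject, key, value).
type Row = (u32, &'static str, u32);

/// Join first, filter afterwards: every pair with left.value == right.subject
/// is materialised before the constant `key` is checked.
/// Returns the result and the number of intermediate rows produced by the join.
fn join_then_filter(left: &[Row], right: &[Row], key: &str) -> (Vec<(u32, u32)>, usize) {
    let mut intermediate = Vec::new();
    for l in left {
        for r in right {
            if l.2 == r.0 {
                intermediate.push((l.0, r.1, r.2));
            }
        }
    }
    let produced = intermediate.len();
    let result = intermediate
        .into_iter()
        .filter(|(_, k, _)| *k == key)
        .map(|(s, _, v)| (s, v))
        .collect();
    (result, produced)
}

/// Constant applied as part of the join: the right-hand table is restricted
/// to rows with the constant `key` first, so fewer rows take part in the join.
/// Returns the result and the number of rows produced by the join.
fn filter_then_join(left: &[Row], right: &[Row], key: &str) -> (Vec<(u32, u32)>, usize) {
    let restricted: Vec<&Row> = right.iter().filter(|r| r.1 == key).collect();
    let mut result = Vec::new();
    for l in left {
        for r in &restricted {
            if l.2 == r.0 {
                result.push((l.0, r.2));
            }
        }
    }
    let produced = result.len();
    (result, produced)
}

/// Made-up sample data: two works pointing at author objects 10 and 20.
fn sample_tables() -> (Vec<Row>, Vec<Row>) {
    let left = vec![(1, "author", 10), (2, "author", 20)];
    let right = vec![
        (10, "family", 100),
        (10, "given", 101),
        (10, "suffix", 102),
        (20, "family", 200),
        (20, "given", 201),
    ];
    (left, right)
}
```

Calling both functions on `sample_tables()` with the key `"family"` gives identical results, while `join_then_filter` reports more intermediate rows than `filter_then_join`. On real tables the gap grows with the number of keys per object, which is the selectivity effect described above.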