ARF1 opened this pull request 9 years ago.
Hi @ARF1,
Sorry no one ever got back to you before! :( We used to handle this the same way as the `InOperatorTransformer`, but with larger `in` statements it broke numexpr (too many ORs), so we had to implement this workaround. In a short email discussion, Francesc Alted suggested that the best thing to do would be to add `in`/`not in` functionality to numexpr. But that needs some heavy C coding (not my personal forte, and my programmers are also quite overloaded at the moment). Still, it's on the to-do list, as it will greatly improve filtering (you would be able to push everything directly to numexpr). The factorization part is a very good idea; I'll see how to automate that based on filter behaviour.
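The rewrite discussed here expands an `in` list into a chain of `|` comparisons. It can be sketched with Python's standard `ast` module. This is a minimal illustration under stated assumptions, not bquery's code:

- `InOperatorRewriter` and `rewrite_in` are hypothetical names.
- Only the single-operator form `<column> in [<literals>]` is handled.
- `not in` is expanded via De Morgan into `!=` terms joined by `&`. This is an assumption about what "similarly transformed" means.
- `ast.unparse` needs Python 3.9+.

The generated expression gains one comparison per list element, which shows how large `in` statements turn into long OR chains.

```python
import ast


class InOperatorRewriter(ast.NodeTransformer):
    """Hypothetical sketch (not bquery's class): expand membership tests.

    col in [a, b]      ->  (col == a) | (col == b)
    col not in [a, b]  ->  (col != a) & (col != b)   (De Morgan)
    """

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        # Only the simple form `<expr> in/not in <list or tuple literal>` is handled.
        if len(node.ops) != 1 or not isinstance(node.comparators[0], (ast.List, ast.Tuple)):
            return node
        if isinstance(node.ops[0], ast.In):
            cmp_op, join_op = ast.Eq, ast.BitOr
        elif isinstance(node.ops[0], ast.NotIn):
            cmp_op, join_op = ast.NotEq, ast.BitAnd
        else:
            return node
        values = node.comparators[0].elts
        if not values:
            return node  # empty list: left untouched in this sketch
        # One comparison per list element, chained left to right.
        result: ast.expr = ast.Compare(left=node.left, ops=[cmp_op()], comparators=[values[0]])
        for value in values[1:]:
            term = ast.Compare(left=node.left, ops=[cmp_op()], comparators=[value])
            result = ast.BinOp(left=result, op=join_op(), right=term)
        return result


def rewrite_in(expr: str) -> str:
    """Parse a filter expression, expand in/not in, and return it as source text."""
    tree = InOperatorRewriter().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree).body)


print(rewrite_in("my_col in ['ABC', 'DEF']"))  # (my_col == 'ABC') | (my_col == 'DEF')
```

Working on the syntax tree rather than on strings keeps nested expressions and operator precedence intact.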
Based on visualfabriq/bquery#27 but can be rebased on master.
This introduces the infrastructure for plug-in query transformers. Included are three sample query transformers:
- `InOperatorTransformer`: `my_col in ['ABC', 'DEF']` is transformed into `(my_col == 'ABC') | (my_col == 'DEF')`. The operation `not in` is similarly transformed.
- `TrivialBooleanExpressionsOptimizer`: `(my_col == 'ABC') & (False)` is transformed into `False` (limited usefulness without an intelligent query optimizer).
- `CachedFactorOptimizer`: converts comparisons containing columns with cached factors into comparisons using the factor instead. (Naive implementation, currently only useful for edge cases.)

By default this PR does not change the behaviour or dependencies of bquery. Query transformers have to be explicitly enabled by configuring them, e.g.:
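The constant folding attributed to `TrivialBooleanExpressionsOptimizer` follows the usual Boolean identities:

- `x & False` and `x | True` collapse to a constant.
- `x & True` and `x | False` reduce to `x`.

Below is a minimal sketch using Python's `ast` module. It assumes filter expressions use the same `&`/`|` syntax as the examples in the list. `TrivialBooleanFolder` and `fold` are hypothetical names, and this is not the PR's implementation. `ast.unparse` needs Python 3.9+.

```python
import ast


class TrivialBooleanFolder(ast.NodeTransformer):
    """Hypothetical sketch: fold `&` / `|` expressions with literal True/False operands."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold sub-expressions first (bottom-up)
        if not isinstance(node.op, (ast.BitAnd, ast.BitOr)):
            return node
        # True absorbs `|`, False absorbs `&`; the other Boolean constant is neutral.
        absorbing = isinstance(node.op, ast.BitOr)
        for const, other in ((node.left, node.right), (node.right, node.left)):
            if isinstance(const, ast.Constant) and isinstance(const.value, bool):
                if const.value is absorbing:
                    return ast.Constant(value=absorbing)  # x & False -> False, x | True -> True
                return other  # x & True -> x, x | False -> x
        return node


def fold(expr: str) -> str:
    """Parse a filter expression, fold trivial Boolean terms, and return source text."""
    tree = TrivialBooleanFolder().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree).body)


print(fold("(my_col == 'ABC') & (False)"))  # False
print(fold("(my_col == 'ABC') | (False)"))  # my_col == 'ABC'
```

Folding bottom-up lets a constant produced deep inside an expression propagate upwards. For example, `((a == 1) & False) | (b == 2)` reduces to `b == 2`.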
For convenience, a shortcut is provided for these (currently) most useful transformers with `transformers.standard_transformers`:

The overhead for queries is negligible for reasonably sized databases. For the query `db["my_col=='AB1234567890'"]`, bquery requires 362 ms without query transformers and 367 ms with all query transformers configured (including `CachedFactorOptimizer`).

With a non-compressed database, the `CachedFactorOptimizer` shows some minor positive effects: 547 ms vs. 296 ms.
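The idea behind `CachedFactorOptimizer` is to compare small integer codes instead of the original values. It can be sketched as follows. This is a hedged illustration, not the PR's implementation:

- `factorize`, `FactorComparisonRewriter`, `rewrite_with_factors`, the `factors` mapping and the `<column>_factor` column naming are all hypothetical.
- Only `<column> == <literal>` and `<column> != <literal>` comparisons are rewritten.

A literal that never occurs in the factor makes the comparison constant. A Boolean folding step could then simplify that constant further.

```python
import ast
from typing import Dict, List, Sequence, Tuple


def factorize(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label to an integer code in first-seen order."""
    labels: Dict[str, int] = {}
    codes = [labels.setdefault(v, len(labels)) for v in values]
    return codes, labels


class FactorComparisonRewriter(ast.NodeTransformer):
    """Hypothetical sketch: `col == 'ABC'` -> `col_factor == <code>` for factorized columns."""

    def __init__(self, factors: Dict[str, Dict[str, int]]):
        self.factors = factors  # column name -> {label: integer code}

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        if not (len(node.ops) == 1
                and isinstance(node.ops[0], (ast.Eq, ast.NotEq))
                and isinstance(node.left, ast.Name)
                and node.left.id in self.factors
                and isinstance(node.comparators[0], ast.Constant)):
            return node
        labels = self.factors[node.left.id]
        label = node.comparators[0].value
        is_eq = isinstance(node.ops[0], ast.Eq)
        if label not in labels:
            # The label never occurs in the column, so the result is constant.
            return ast.Constant(value=not is_eq)
        factor_col = ast.Name(id=node.left.id + "_factor", ctx=ast.Load())
        return ast.Compare(left=factor_col, ops=[node.ops[0]],
                           comparators=[ast.Constant(value=labels[label])])


def rewrite_with_factors(expr: str, factors: Dict[str, Dict[str, int]]) -> str:
    """Rewrite comparisons on factorized columns and return source text."""
    tree = FactorComparisonRewriter(factors).visit(ast.parse(expr, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree).body)


codes, labels = factorize(["ABC", "DEF", "ABC"])  # codes == [0, 1, 0]
print(rewrite_with_factors("my_col == 'DEF'", {"my_col": labels}))  # my_col_factor == 1
```

Integer comparisons on the codes avoid string comparisons on every row. Whether that pays off in practice depends on how the factor is stored and cached, which this sketch does not model.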