Performance optimization

`DataFrame.query` is very slow here and can easily be replaced with plain boolean indexing. Here is the code that causes the bottleneck:

def _eval_rule_perf(self, rule, X, y):
    detected_index = list(X.query(rule).index)

Profiling results:

1141.451 _eval_rule_perf  skrules/skope_rules.py:614
         └─ 1140.967 query  pandas/core/frame.py:3316

An example of an improved version:

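tmp = X
for part_rule in rule.split(' and '):
    # Binary features only: a '>' condition keeps rows where the feature is 1,
    # any other condition keeps the remaining rows.
    feature = part_rule.split()[0]
    if '>' in part_rule:
        tmp = tmp[tmp[feature] == 1]
    else:
        tmp = tmp[tmp[feature] != 1]
detected_index = list(tmp.index)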

Note that this code only handles binary features; it should be changed to a more generic version.
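
One possible shape for a more generic version is sketched below. It is only a sketch under stated assumptions, not the skope-rules implementation: it assumes every rule is a conjunction of conditions joined by ' and ', that each condition reads "<feature> <operator> <number>", and that no feature name contains ' and '. The helper name `detected_index_for_rule` and the `_OPS` table are hypothetical.

```python
import operator

import numpy as np
import pandas as pd

# Comparison operators assumed to appear in rules (an assumption, not taken from skope-rules).
_OPS = {
    '<=': operator.le,
    '>=': operator.ge,
    '<': operator.lt,
    '>': operator.gt,
    '==': operator.eq,
    '!=': operator.ne,
}


def detected_index_for_rule(rule, X):
    """Return the index labels of the rows of X that satisfy every condition in rule."""
    mask = np.ones(len(X), dtype=bool)
    for part_rule in rule.split(' and '):
        # Split from the right so that only the last two tokens are treated as
        # operator and threshold; everything before them is the feature name.
        feature, op, threshold = part_rule.strip().rsplit(maxsplit=2)
        mask &= _OPS[op](X[feature].to_numpy(), float(threshold))
    return list(X.index[mask])
```

On a small frame this should select the same rows as `X.query(rule)` for rules in the assumed format, so `query` can serve as a reference when testing the replacement.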

Profiling results with the improved version:

8.658 <listcomp>  skrules/skope_rules.py:357
         └─ 8.609 _eval_rule_perf  skrules/skope_rules.py:614
            └─ 6.739 __getitem__  pandas/core/frame.py:2987

scikit-learn-contrib / skope-rules

Performance optimization #52
