Open iharshulhan opened 3 years ago
Pandas querying is very slow and can easily be replaced with traditional boolean indexing. Here is the code that causes the bottleneck:
def _eval_rule_perf(self, rule, X, y):
    detected_index = list(X.query(rule).index)
Profiling results:
1141.451 _eval_rule_perf  skrules/skope_rules.py:614
└─ 1140.967 query  pandas/core/frame.py:3316
An example of an improved version:
tmp = X
for part_rule in rule.split('and '):
    part_rule = part_rule.strip()
    column = part_rule.split()[0]
    # Binary features only: '>' keeps rows where the feature is 1,
    # any other comparison keeps rows where it is not 1.
    if '>' in part_rule:
        tmp = tmp[tmp[column] == 1]
    else:
        tmp = tmp[tmp[column] != 1]
Note that this code only handles the binary case; it should be changed to a more generic version.
Profiling results:
8.658 <listcomp>  skrules/skope_rules.py:357
└─ 8.609 _eval_rule_perf  skrules/skope_rules.py:614
   └─ 6.739 __getitem__  pandas/core/frame.py:2987
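A more generic version could look roughly like the sketch below. It is not benchmarked and is not part of skope-rules. The names rule_mask and detected_index are hypothetical. It assumes each rule is a conjunction of terms joined by " and ", that every term has the form "column operator number", and that column names contain no whitespace.

```python
import operator
import re

import pandas as pd

# Comparison operators this sketch understands.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}
# One term: column name, operator, numeric threshold (two-character operators listed first).
_TERM = re.compile(r"^\s*(\S+)\s*(<=|>=|==|!=|<|>)\s*(\S+)\s*$")


def rule_mask(X, rule):
    """Return a boolean Series marking the rows of X that satisfy the rule."""
    mask = pd.Series(True, index=X.index)
    for term in rule.split(" and "):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"Cannot parse rule term: {term!r}")
        column, op, threshold = match.groups()
        # Combine the per-term masks instead of re-slicing the frame each time.
        mask = mask & _OPS[op](X[column], float(threshold))
    return mask


def detected_index(X, rule):
    """Index labels of the rows matched by the rule, as a plain list."""
    return list(X.index[rule_mask(X, rule)])
```

For rules that fit these assumptions, detected_index(X, rule) should return the same labels as list(X.query(rule).index), so it could stand in for the query call inside _eval_rule_perf. Rules with column names containing spaces, or with "or" clauses, would need a real parser.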