scikit-learn-contrib / skope-rules

machine learning with logical rules in Python
http://skope-rules.readthedocs.io

SyntaxError: Python keyword not valid identifier in numexpr query #21

Open saurabhdaalia opened 5 years ago

saurabhdaalia commented 5 years ago

When I add feature names to the SkopeRules model, I encounter this error.

Some of the feature names are:

data__blocked_bugs_number
data__ever_affected=False
data__ever_affected=True
data__has_crash_signature=False
data__has_crash_signature=True
data__has_github_url=False
data__has_github_url=True
data__has_str=irrelevant
data__has_str=no

Traceback (most recent call last):
  File "run.py", line 55, in <module>
    model.train()
  File "C:\Users\Saurabh Daalia\Desktop\bugbug\bugbug\model.py", line 101, in train
    self.skope_clf.fit(X_train, y_train)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 350, in fit
    for r in set(rules_from_tree)]
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 350, in <listcomp>
    for r in set(rules_from_tree)]
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 600, in _eval_rule_perf
    detected_index = list(X.query(rule).index)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3088, in query
    res = self.eval(expr, **kwargs)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3203, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\eval.py", line 294, in eval
    truediv=truediv)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 749, in __init__
    self.terms = self.parse()
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 766, in parse
    return self._visitor.visit(self.expr)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 327, in visit
    raise e
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 321, in visit
    node = ast.fix_missing_locations(ast.parse(clean))
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
SyntaxError: Python keyword not valid identifier in numexpr query

ngoix commented 5 years ago

Is it because you put = in your feature names?

saurabhdaalia commented 5 years ago

I see, I think that might be the issue. But what is causing this issue? Is there any workaround for it?

ngoix commented 5 years ago

The variable names are parsed to build the rules, which is what causes your bug. I don't see an easy workaround. You really shouldn't put = in your feature names...

marco-c commented 5 years ago

You really shouldn't put = in your feature names...

Feature names are strings, so it seems like a limitation to restrict what they can contain (everything else in the scikit-learn world doesn't care about it). Maybe it should be allowed, or at least documented somewhere?

ngoix commented 5 years ago

You are right, this should be documented. Feel free to open a PR for that or for fixing the syntax error :)
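
Until the limitation is documented or fixed, one possible workaround is to rename the features before handing them to the model, so that every name is a plain identifier. The sketch below is only an illustration: `sanitize_feature_names` is a hypothetical helper, not part of skope-rules, and it assumes that replacing every non-alphanumeric character with an underscore is acceptable for your data.

```python
import keyword
import re


def sanitize_feature_names(names):
    """Map each original feature name to an identifier-safe name.

    Hypothetical helper (not part of skope-rules): characters such as '='
    are replaced with '_' so the name can appear in a query expression.
    """
    mapping = {}
    used = set()
    for name in names:
        # Replace anything that is not an ASCII letter, digit or underscore.
        safe = re.sub(r"\W", "_", name, flags=re.ASCII)
        # Identifiers cannot be empty, start with a digit, or be a keyword.
        if not safe or safe[0].isdigit() or keyword.iskeyword(safe):
            safe = "f_" + safe
        # Keep names unique if two originals collapse to the same string.
        base, n = safe, 1
        while safe in used:
            n += 1
            safe = f"{base}_{n}"
        used.add(safe)
        mapping[name] = safe
    return mapping


mapping = sanitize_feature_names([
    "data__blocked_bugs_number",
    "data__ever_affected=False",
    "data__ever_affected=True",
])
for original, safe in mapping.items():
    print(original, "->", safe)
```

The sanitized names (`list(mapping.values())`) would then be passed as the feature names, and the inverse of `mapping` can be used to translate the learned rules back to the original names when displaying them.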

ghost commented 4 years ago

Guys, I get a similar error when I run the command below. If I remove the pipe and keep only one condition, it works.

SyntaxError: Python keyword not valid identifier in numexpr query

Error is --- train_outliers = train.query('age_z > 3 | age_z < ‐3')

vedal commented 4 years ago

This happened to me as well. The problem was that I kept holding down the Alt key while typing the character that follows the pipe symbol. I run into this frequently, because typing a pipe requires me to hold Alt.

osdiego commented 4 years ago

Happened to me too. Does anyone know how to fix it?! Thanks xD

CCNOAI commented 4 years ago

@osdiego Did you copy and paste from another document? The "-3" is not being read correctly by the query function. Try deleting the minus sign and retyping it. Let me know if this works.
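
A quick way to check this diagnosis is to look for non-ASCII characters in the query string. The `age_z` query quoted earlier in this thread does contain a Unicode hyphen (U+2010) where an ASCII minus is expected. The sketch below uses only the standard library. `find_non_ascii`, `normalize_query` and the `LOOKALIKES` table are hypothetical names for illustration, and the table is an assumed, incomplete list of look-alike characters.

```python
import unicodedata

# Assumed table of look-alike characters; extend it as needed.
LOOKALIKES = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2212": "-",  # MINUS SIGN
    "\u00a0": " ",  # NO-BREAK SPACE
}


def find_non_ascii(expr):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(expr)
        if ord(ch) > 127
    ]


def normalize_query(expr):
    """Replace known look-alike characters with their ASCII equivalents."""
    return expr.translate(str.maketrans(LOOKALIKES))


# The query from this thread, with the Unicode hyphen written as an escape.
query = "age_z > 3 | age_z < \u20103"
print(find_non_ascii(query))
print(normalize_query(query))
```

If `find_non_ascii` returns anything for a query you typed by hand, retyping that character (or running the string through `normalize_query`) is worth trying before looking for other causes.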

osdiego commented 4 years ago

@CCNOAI I'm doing something like: (importance >= 0 | importance = -7). I need to filter on a condition like that. Is there no way to do it?
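
One likely cause here is the single `=` in `importance = -7`: in an expression, equality is written `==`, and `=` is assignment. The traceback at the top of this thread shows pandas handing the query to `ast.parse`, so a rough check is to parse the expression the same way. This is a sketch under assumptions: `is_valid_expression` is a hypothetical helper, and the `|` is written as `or` because pandas appears to rewrite `|` and `&` into `or` and `and` before parsing.

```python
import ast


def is_valid_expression(expr):
    """Return True if expr parses as a single Python expression."""
    try:
        ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    return True


# Single '=' is assignment and cannot appear inside a boolean expression.
broken = "importance >= 0 or importance = -7"
# '==' is the equality comparison.
fixed = "importance >= 0 or importance == -7"

print(is_valid_expression(broken))
print(is_valid_expression(fixed))
```

With pandas, the corrected filter would then be something like `df.query("importance >= 0 | importance == -7")`, where `df` stands for your own DataFrame.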