pydata / numexpr

Fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more
https://numexpr.readthedocs.io/en/latest/user_guide.html
MIT License

[BUG]: Sanitizing regex does not exclude string literals #468

Closed taldcroft closed 9 months ago

taldcroft commented 10 months ago

Commit 4b2d89cf introduces a regression: any expression that includes a string literal containing one of the newly forbidden characters is rejected. This breaks our production code when we upgrade numexpr to 2.8.7.

Example:

>>> import numexpr as ne
>>> ne.__version__
'2.8.7'
>>> import numpy as np

>>> x = np.array(['a', 'b'], dtype=bytes)
>>> ne.evaluate("x == 'b'")
array([False,  True])

>>> ne.evaluate("x == 'b:'")
Traceback (most recent call last):
  Cell In[6], line 1
    ne.evaluate("x == 'b:'")
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:975 in evaluate
    raise e
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:872 in validate
    _names_cache[expr_key] = getExprNames(ex, context, sanitize=sanitize)
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:721 in getExprNames
    ex = stringToExpression(text, {}, context, sanitize)
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:281 in stringToExpression
    raise ValueError(f'Expression {s} has forbidden control characters.')
ValueError: Expression x == 'b:' has forbidden control characters.

27rabbitlt commented 9 months ago

This could be fixed by first replacing the content within quotes before trying to match the blacklist. I will fix this and add some tests.
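A minimal sketch of that idea, for illustration only. The pattern `_FORBIDDEN` and the helper `check_expression` are hypothetical names, and the blacklist here is an assumed stand-in, not numexpr's actual regex. The point is the order of operations: blank out quoted literals first, then run the blacklist check on what remains.

```python
import re

# Assumed stand-in blacklist (NOT numexpr's real pattern): rejects ';', ':',
# '[' and double underscores anywhere in the expression.
_FORBIDDEN = re.compile(r"[;:\[]|__")

# Matches a single- or double-quoted literal, allowing backslash escapes.
_STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\"")


def check_expression(expr: str) -> None:
    """Raise ValueError if forbidden characters appear outside string literals."""
    # Replace every quoted literal with an empty literal so that its contents
    # cannot trigger the blacklist.
    stripped = _STRING_LITERAL.sub("''", expr)
    if _FORBIDDEN.search(stripped):
        raise ValueError(f"Expression {expr} has forbidden control characters.")


check_expression("x == 'b:'")        # passes: ':' is inside a literal
try:
    check_expression("x == 'b'; y")  # rejected: ';' is outside any literal
except ValueError as err:
    print(err)
```

Under these assumptions the original failing expression `x == 'b:'` is accepted, while forbidden characters outside of quotes are still caught.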

taldcroft commented 9 months ago

Thanks, looking forward to the next release! Looks like this can be closed now?

27rabbitlt commented 9 months ago

Yes ^-^
