pydata / numexpr

Fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more
https://numexpr.readthedocs.io/en/latest/user_guide.html

[BUG]: Sanitizing regex does not exclude string literals #468

Closed taldcroft closed 10 months ago

taldcroft commented 11 months ago

4b2d89cf introduces a regression when an expression includes a string literal with any of the new forbidden characters. This is breaking our production code when we upgrade numexpr to 2.8.7.

Example:

>>> import numexpr as ne
>>> ne.__version__
'2.8.7'
>>> import numpy as np

>>> x = np.array(['a', 'b'], dtype=bytes)
>>> ne.evaluate("x == 'b'")
array([False,  True])

>>> ne.evaluate("x == 'b:'")
Traceback (most recent call last):
  Cell In[6], line 1
    ne.evaluate("x == 'b:'")
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:975 in evaluate
    raise e
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:872 in validate
    _names_cache[expr_key] = getExprNames(ex, context, sanitize=sanitize)
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:721 in getExprNames
    ex = stringToExpression(text, {}, context, sanitize)
  File ~/miniconda3/envs/numexpr/lib/python3.10/site-packages/numexpr/necompiler.py:281 in stringToExpression
    raise ValueError(f'Expression {s} has forbidden control characters.')
ValueError: Expression x == 'b:' has forbidden control characters.

27rabbitlt commented 10 months ago

This could be fixed by first replacing the contents of quoted string literals, and only then matching the expression against the blacklist. I will fix this and add some tests.
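
A minimal sketch of that idea, for illustration only. The names `FORBIDDEN_RE`, `STRING_LITERAL_RE`, `strip_string_literals` and `has_forbidden` are hypothetical, and the forbidden-character pattern is a simplified stand-in, not the actual regex used in `necompiler.py`:

```python
import re

# Hypothetical, simplified stand-in for the sanitizer's blacklist.
# The real pattern in numexpr is different; this one only flags
# ';', ':', '[' and dunder-style names.
FORBIDDEN_RE = re.compile(r"[;:\[]|__")

# Matches a single- or double-quoted literal, allowing backslash escapes.
STRING_LITERAL_RE = re.compile(r"""(["'])(?:\\.|(?!\1).)*\1""")


def strip_string_literals(expr: str) -> str:
    """Replace every quoted literal with an empty literal."""
    return STRING_LITERAL_RE.sub("''", expr)


def has_forbidden(expr: str) -> bool:
    """Check the blacklist only against the code outside string literals."""
    return FORBIDDEN_RE.search(strip_string_literals(expr)) is not None


print(has_forbidden("x == 'b:'"))  # the colon sits inside a literal
print(has_forbidden("x[0]; y"))    # forbidden characters in the code itself
```

With this ordering, the expression from the report is no longer rejected, while forbidden characters outside quotes are still caught.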

taldcroft commented 10 months ago

Thanks, looking forward to the next release! Looks like this can be closed now?

27rabbitlt commented 10 months ago

Yes ^-^
