isaacg1 / pyth

Pyth, an extremely concise language. Try it here:
https://pyth.herokuapp.com/
MIT License
263 stars 57 forks source link

Regex substitution fails online. #153

Closed jakobkogler closed 8 years ago

jakobkogler commented 9 years ago

:"aa1b2c3den4""(\w)\d""\\1" doesn't run in the online compiler. Offline works.

isaacg1 commented 8 years ago

The issue seems to be associated with any use of \ in a regex substitution in safe mode. The errors can be reproduced offline with the -s flag.

isaacg1 commented 8 years ago

Not worth fixing, but I would accept a pull request that fixed this.

randomdude999 commented 5 years ago

Did a bit of digging. Looks like the C version of re (_sre.c in CPython) tries to import the Python version of re due to needing to compile the regex replacement string (relevant source). That import happening seems inevitable (re.sub always calls _sre_SRE_Pattern_sub_impl, which always calls pattern_subx, which tries to import re if ever the replacement string contains a single \). A workaround would be to, instead of deleting __import__ entirely, replace it with something that passes through attempts to import re but blocks everything else. This workaround would make it easy to allow passing through imports of other modules, should similar issues arise in the future (and there are 45 calls to PyImport_Import or similar inside builtin C modules (including but not limited to datetime, decimal, json and operator), so I wouldn't be surprised if these issues do reappear).

...Actually, if the replacement is a function, it will not be compiled (obviously), so another workaround would be to make sure re.sub is always called with a function as the replacement, but that would mean you'd need to parse the replacement string yourself, and i'm not sure that will be less work than the other workaround. Though the fact that _sre.c calls back into the Python module to compile the regex means that you could just call the same method that _sre.c calls to parse the replacement, but then it'll be implementation dependent.

All in all, I think the first workaround is probably the best here. Shall I make a pull request for it?