sloria / TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
https://textblob.readthedocs.io/
MIT License
9.02k stars, 1.13k forks

fix SyntaxWarning problem new in 3.12 #418

Closed. smontanaro closed this 4 months ago.

smontanaro commented 1 year ago

I've been experimenting with the nogil build of the 3.12 alpha. It raises SyntaxWarning for string literals containing invalid escape sequences, i.e. backslashes that have no meaning in a regular (non-raw) string. textblob._text has a couple of regular expressions containing \. that aren't in raw strings. This PR solves the problem one way, by replacing \. with [.]. The alternative would be to convert those patterns to raw strings. I find a [...] character class more readable than a backslash. YMMV.
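A minimal sketch (not code from the PR) of the behaviour described above: compiling a non-raw literal that contains \. emits a warning (SyntaxWarning on 3.12+, DeprecationWarning on 3.6 through 3.11), while the raw-string and character-class spellings compile silently and give the regex engine equivalent patterns. The helper name `escape_warnings` is hypothetical, introduced only for illustration.

```python
import re
import warnings


def escape_warnings(source):
    """Hypothetical helper: compile a Python expression and return
    (category name, message) for every warning raised while parsing it."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        compile(source, "<example>", "eval")
    return [(w.category.__name__, str(w.message)) for w in caught]


# Each argument below is Python *source code* for a string literal.
print(escape_warnings(r'"\."'))   # plain literal "\."  -> invalid escape warning
print(escape_warnings(r'r"\."'))  # raw literal r"\."   -> no warnings
print(escape_warnings('"[.]"'))   # character class     -> no warnings

# Both fixed spellings give the regex engine a pattern for one literal dot.
for pattern in (r"\.", "[.]"):
    assert re.fullmatch(pattern, ".")
    assert not re.fullmatch(pattern, "x")
```

Either fix silences the warning; the choice between them is about readability, as the PR description notes.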

matteospanio commented 11 months ago

I'm also seeing this error when running Python 3.11.4.

From the re module documentation:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a SyntaxWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
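A short sketch of the points in the quoted documentation, using only the standard `re` module: a raw string keeps the backslash, a regular literal interprets it, and matching one literal backslash needs a four-character regular literal or a two-character raw one. The sample path string is made up for illustration.

```python
import re

# r"\n" keeps both characters; "\n" is a single newline character.
assert r"\n" == "\\n" and len(r"\n") == 2
assert len("\n") == 1 and "\n" == chr(10)

# Illustrative input: the text C:\temp, containing one backslash.
text = "C:\\temp"

# The regex engine must see the two-character pattern \\ to match one backslash.
# As a regular literal that is "\\\\"; as a raw literal it is r"\\".
assert "\\\\" == r"\\"
assert re.search("\\\\", text).group() == "\\"
assert re.search(r"\\", text).group() == "\\"
```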

So I would suggest using raw strings for patterns that contain backslashes, as the Python standard library documentation recommends.

Anyway, since it's an annoying error, I think it should be fixed ASAP. @sloria

smontanaro commented 4 months ago

Thanks. I guess I'll just use my fork.

sloria commented 4 months ago

Sorry for the long delay. My time to work on textblob has been very limited these past few years, and it's taken a while to get to even small PRs like this because of all the yak-shaving needed to merge them (migrating off of travis.yml, supporting modern Python, etc.). I finally finished these chores in #426, which also addresses the SyntaxWarning by using raw strings, as suggested by @sterliakov.

Thank you both for your work on this!

sterliakov commented 4 months ago

Wow, @sloria, thank you for your work; that PR looks like a solid effort!