jamescasbon / PyVCF

A Variant Call Format reader for Python.
http://pyvcf.readthedocs.org/en/latest/index.html

Fix invalid escape sequences in regex strings #340

Open DavidCain opened 2 years ago

DavidCain commented 2 years ago

Summary

This commit fixes the deprecation warnings raised when a string literal contains a backslash that does not start a valid escape sequence. It will help keep this library usable with newer versions of Python.

The values of the string literals do not change (for current versions of Python):

>>> r'[\[\]]' == '[\[\]]'
True

Examples

$ python -Wd -c 'print("\d")'
DeprecationWarning: invalid escape sequence \d
$ python -W error -c 'print("\d")'
SyntaxError: invalid escape sequence \d

Explanation

For an explanation of the problem (and the recommended solution), see: https://docs.python.org/3/library/re.html

Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'.
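As a rough illustration of the point above, the following sketch compiles two one-line snippets and records any warnings the compiler emits. The helper name `escape_warnings` and the sample pattern are hypothetical and not part of PyVCF. The warning category depends on the interpreter: `DeprecationWarning` on older Python 3 releases and `SyntaxWarning` from Python 3.12 onward.

```python
import re
import warnings


def escape_warnings(source):
    """Compile `source` and return the names of any warning categories raised.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


# Source text with a non-raw literal: backslash-d is not a string escape,
# so the compiler warns about it.
plain = escape_warnings('pattern = "\\d+"')

# Source text with a raw literal: the backslash is left alone, no warning.
raw = escape_warnings('pattern = r"\\d+"')

print("non-raw literal:", plain)
print("raw literal:    ", raw)

# The raw literal and the explicitly escaped literal are the same string,
# so switching to raw strings does not change the regex.
same_value = r"\d+" == "\\d+"
matches = re.fullmatch(r"\d+", "340") is not None
```

Only the non-raw snippet should produce a warning, while both spellings yield the same pattern string.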

How to keep these errors out of source code

I haven't made any such changes in this commit, but there are a few ways to make sure that new invalid escape sequences are not introduced: