python / cpython

Remove deprecated `sre_*` modules #105456

Open sobolevn opened 1 year ago

sobolevn commented 1 year ago

Feature or enhancement

sre_* modules like sre_constants, sre_compile, and sre_parse were deprecated in 3.11 in https://github.com/python/cpython/commit/1be3260a90f16aae334d993aecf7b70426f98013

Our regular deprecation policy is that deprecated things can be removed in the N + 2 release, which here is 3.13.

Pitch

Let's remove them if there are no objections.

I guess it is safe to remove them for several reasons:

  1. https://bugs.python.org/issue47152 clearly states that they were undocumented
  2. There are now re._parser, re._constants, and re._compiler modules that are used instead
  3. They were listed as deprecated in the "what's new" and the warning was pretty clear

The only argument against removing them:

  1. The deprecation warning never says when they will be removed:
    >>> import sre_compile
    <stdin>:1: DeprecationWarning: module 'sre_compile' is deprecated

I will send a PR once we settle it.
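For anyone who wants to check the exact wording on their own interpreter, here is a minimal sketch that imports a module fresh and records any DeprecationWarning it emits. The helper name `import_warnings` is made up for illustration, and the result depends on the Python version: the module may not warn on older releases, or may be missing on a release where it has been removed.

```python
import importlib
import sys
import warnings


def import_warnings(name):
    """Import `name` fresh and return the DeprecationWarning messages it emits.

    Returns None if the module cannot be imported at all
    (for example, on a release where it has been removed).
    """
    # Drop any cached copy so the module body (and its warning) runs again.
    sys.modules.pop(name, None)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            importlib.import_module(name)
        except ImportError:
            return None
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


if __name__ == "__main__":
    for module_name in ("sre_compile", "sre_constants", "sre_parse"):
        print(module_name, import_warnings(module_name))
```

On a release that still ships the deprecated shims, the message is expected to name the module but not a removal version, which is the point raised above.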

@vstinner @serhiy-storchaka what's your opinion?

hugovk commented 1 year ago

Do they have much use in the top 5k PyPI packages?

For example: https://dev.to/hugovk/how-to-search-5000-python-projects-31gk

sobolevn commented 1 year ago

They are still used, yes. But many results here are from mypy / typeshed / jedi usages, which are safe. There are some string usages, like in isort, pipreqs, ruff, and in one pytest plugin, which are also safe: for example, pytest does not recognise sre_* modules as stdlib ones.

Note that some results are from vendored dependencies that would be quite hard to update.

Full results of cpython/search_pypi_top.py -q . "sre_(compile|constants|parse)" > results.txt are attached: results.txt

vstinner commented 1 year ago

It's interesting that exrex already uses re._parser. The project describes itself as "Irregular methods for regular expressions".

./exrex-0.11.0.tar.gz: exrex-0.11.0/exrex.py: import re._parser as sre_parse
./exrex-0.11.0.tar.gz: exrex-0.11.0/exrex.py: from re import sre_parse

Another project is using the private re._parser sub-module (people love to abuse the private API instead of asking to make what they need public):

./rstr-3.2.1.tar.gz: rstr-3.2.1/rstr/xeger.py: import re._parser as sre_parse  # type: ignore[import]
./rstr-3.2.1.tar.gz: rstr-3.2.1/rstr/xeger.py: import sre_parse  # type: ignore[no-redef]
./rstr-3.2.1.tar.gz: rstr-3.2.1/rstr/xeger.py: parsed = sre_parse.parse(pattern)
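The grep hits above suggest a compatibility import: prefer the private `re._parser` and fall back to `sre_parse` when it is not available. The search output does not show the surrounding control flow, so the try/except below is an assumption about how such projects structure it, not a copy of their code. Both modules are private or deprecated, so this is a sketch of the pattern under discussion rather than a recommendation.

```python
try:
    # Python 3.11+: the parser lives in a private submodule of `re`.
    import re._parser as sre_parse
except ImportError:
    # Older Pythons: only the top-level (now deprecated) module exists.
    import sre_parse

# parse() returns a SubPattern: a sequence of (opcode, argument) pairs.
parsed = sre_parse.parse(r"a+\d")

for opcode, argument in parsed:
    print(opcode, argument)
```

Code written this way keeps working if the `sre_*` shims are removed, but it still depends on an undocumented API that can change without notice.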

Another example:

./catboost-1.2.tar.gz: catboost-1.2/catboost_all_src/contrib/python/hypothesis/py3/hypothesis/strategies/_internal/regex.py: import re._parser as sre_parse

pyparsing was using sre_constants, but that is no longer the case:

Version 3.0.8 - April, 2022
---------------------------
(...)
- Removed imports of deprecated `sre_constants` module for catching
  exceptions when compiling regular expressions. PR submitted by
  Serhiy Storchaka, thank you.

There are vendored copies of pyparsing which still use sre_constants.
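The pyparsing changelog entry above mentions catching exceptions raised when compiling regular expressions. The public `re.error` exception can be used for that without importing `sre_constants`. A minimal sketch follows; the helper name `try_compile` is hypothetical and is not taken from pyparsing.

```python
import re


def try_compile(pattern):
    """Return a compiled pattern, or None if `pattern` is not a valid regex."""
    try:
        return re.compile(pattern)
    except re.error:
        # re.error is the documented, public exception for invalid patterns,
        # so there is no need to reach for sre_constants.error.
        return None


if __name__ == "__main__":
    print(try_compile(r"a+"))  # a compiled pattern object
    print(try_compile(r"("))   # None: unterminated group
```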

The coverage test suite explicitly ignores the deprecation in tests/conftest.py:

    warnings.filterwarnings(
        "ignore",
        category=DeprecationWarning,
        message=r"module 'sre_constants' is deprecated",
    )

Similar example:

./pydantic_factories-1.17.3.tar.gz: pydantic_factories-1.17.3/CHANGELOG.md: - fix deprecation warning for `sre_parse` on Python 3.11.
./pydantic_factories-1.17.3.tar.gz: pydantic_factories-1.17.3/pydantic_factories/value_generators/regex.py: from sre_parse import SubPattern, parse  # pylint: disable=deprecated-module

or:

./trio-0.22.0.tar.gz: trio-0.22.0/trio/tests/test_exports.py: "ignore:module 'sre_constants' is deprecated:DeprecationWarning",

vstinner commented 1 year ago

I would prefer that people don't use private APIs. But as long as it's technically possible, people just continue to do that.

> Do they have much use in the top 5k PyPI packages?

At least, it's unclear to me whether it's OK to remove these modules right now. I would say that it's OK since we respected the PEP 387 deprecation period. But @serhiy-storchaka may have a different opinion.

serhiy-storchaka commented 1 year ago

Initially I was not going to keep the old modules. But it was shown that they are used in some third-party projects, so it would be nicer of us to keep them for a time. Now we can remove them at any time. But it would be even nicer if we first added a public API for the things used in third-party code. That is not always feasible: for example, re._constants is inherently an implementation detail. But I was going to add a public API for compiling replacement strings.