python / cpython

The Python programming language
https://www.python.org
Other
63.54k stars 30.44k forks source link

Add structured version info for compression modules #117404

Open serhiy-storchaka opened 7 months ago

serhiy-storchaka commented 7 months ago

Feature or enhancement

The zlib module has two attributes that denote the zlib library version:

But the version string is not comfortable to use. You need to convert it to a sequence of numbers for comparison, this is boring and errorprone. It would be better to provide a structural version, like sys.version_info.

Other compression modules do not provide versions at all. The bzlib library has function BZ2_bzlibVersion() which returns the version string. The lzma version has a number of constants and functions (LZMA_VERSION_STRING, LZMA_VERSION, lzma_version_string(), lzma_version_number()) which return static and runtime versions as a string or packed into a number.

115989 by @ivq initially included addition of lzma.LZMA_VERSION and lzma.LZMA_RUNTIME_VERSION for version strings (clearly inspired by zlib). @gpshead renamed them to lzma.LZMA_HEADER_VERSION_STRING and lzma.LZMA_VERSION_STRING and added lzma.LZMA_HEADER_VERSION and lzma.LZMA_VERSION for versions packed into a number.

I propose to add attributes for version strings and version named tuples for all compression modules. I think that named tuples are more useful than packed numbers. I created a patch for this, but I am not sure about names. Should we use "RUNTIME" or "HEADER" prefixes? Should we add "_STRING" or "_info" suffix? Do we need the version of the library that was used for building the module or the version of the library actually loaded by the interpreter is enough?

Linked PRs

gpshead commented 7 months ago

Thanks for looking at the larger picture. On that lzma PR in particular, the reason I went for "LZMA_VERSION" and "LZMA_HEADER_VERSION" is that the only one most people are going to ever find reason to use is the runtime version so it should be the shorter name. the header version generally isn't important, we expose it because we can.

Same with string versions, having a properly consumable tuple or numeric version is useful. The strings are a mere convenience. We could do without them but making them available doesn't hurt.

I also think we should reconsider the naming: don't include the name prefix on the module member name. It's in the lzma or zlib module, it doesn't need a LZMA or ZLIB or similar prefix on the name. (It should also be excluded from __all__)

For existing names that Modules already have, there is no reason to deprecate the existing ones. But adding new ones should use the brief names making the structured version info the simplest one type and get at. lzma.version for example.

Priority:

Everything else is a bonus feature.

serhiy-storchaka commented 7 months ago

Thank you for your feedback Gregory.

Some information and reasoning:

  1. In the curses module we use ncurses_version. The ncurses_ prefix is importand, because ncurses is just one of many implementations of curses. Other implementations have different versions, and AFAIK they do not provide version programmatically. This is perhaps a lesser issue for zlib, bzlib and lzma libraries, the alternative implementations, if exists, most likely provide a version compatible with the main implementation. But zlib-ng provides also its own version which can be exposed with the zlibng_ prefix.
  2. They are needed to be in __all__ to show them in the module help() output. This is yet one reason to use unique prefixes. Unless we reuse __version__ for this (which for now should be a string). But it may make more difficult for users to discover this feature.
  3. The string representation can contain additional information not exposed in the named tuple. See sys.version and sys.version_info for example. I tried to use a hybrid approach in Tkinter: info_patchlevel() returns a named tuple, but its string representation was a version string -- so you can use it in both purposes. But it was only possible because the named tuple contained all information of version string. In other cases we need to add more fields (maybe hidden).
serhiy-storchaka commented 7 months ago

The sqlite3 module exposes version, version_info, sqlite_version and sqlite_version_info. The former two are not the Sqlite version, they are the version of the pysqlite3 module when it was developed outside of the stdlib. Currently they are deprecated, because they do not contain useful information and can only cause confusion.

gpshead commented 7 months ago

98567 regarding zlib version when linked to zlib-ng is an example of why structured version info is useful. And perhaps why the string version can also be useful: it allows alternative implementations to identify themselves.

gpshead commented 7 months ago

I do not think CPython should be in the business of proactively identifying alternative implementations. There are infinite possibilities there and we'd always be chasing down ones to add specific ID logic for. We should not waste effort on that unsolvable task. ie we should never have a .zlibng_version attribute.

We do not want users to have to use hasattr to try and find attributes that may or may not exist based on some choice someone made at build time.

When alternative implementations are used, it is up to them to provide "equivalent to" version numbers. They likely already do for the purposes of being drop in replacement compatible at the C level.

gpshead commented 7 months ago

They are needed to be in all to show them in the module help() output.

I don't really care if they are in __all__ or not, so if desired, fine. Anyone using from module import * has shot themselves in the foot already. =)