python / cpython

The Python programming language
https://www.python.org

add way to detect bsddb version #36914

Closed 91e69f45-91d9-4b12-87db-a02908296c81 closed 22 years ago

91e69f45-91d9-4b12-87db-a02908296c81 commented 22 years ago
BPO 584409
Nosy @loewis, @smontanaro, @warsaw

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['extension-modules']
title = 'add way to detect bsddb version'
updated_at =
user = 'https://bugs.python.org/phr'
```

bugs.python.org fields:
```python
activity =
actor = 'skip.montanaro'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Extension Modules']
creation =
creator = 'phr'
dependencies = []
files = []
hgrepos = []
issue_num = 584409
keywords = []
message_count = 17.0
messages = ['11635', '11636', '11637', '11638', '11639', '11640', '11641', '11642', '11643', '11644', '11645', '11646', '11647', '11648', '11649', '11650', '11651']
nosy_count = 6.0
nosy_names = ['nobody', 'loewis', 'skip.montanaro', 'barry', 'nnorwitz', 'phr']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue584409'
versions = []
```

91e69f45-91d9-4b12-87db-a02908296c81 commented 22 years ago

The bsddb module docs say that some Python configurations use Berkeley db 1.85 and others use the incompatible 2.0. Maybe by now there are later versions as well. There's no way listed for a Python script to know which version of bsddb is running underneath! That's not so great, since the versions don't interoperate and don't support the same operations.

Proposed fix: please add a new function to the module, bsddb.db_version(). This would return a constant string like "1.85" or "2.0", built at Python configuration time.

smontanaro commented 22 years ago

Logged In: YES user_id=44345

This is an interesting idea, but one that I think is less useful than you might believe. The bsddb module exposes the same API based on the 1.85 C API regardless of what version of Berkeley DB you link with. (I have linked it with versions 1.85 through 4.something.) I've been using the bsddb module since its inclusion in Python and have never actually cared what version of the underlying C API the module was linked with. Someone programming to the C API *would* care about version differences, because the C API has grown richer over the years. The bsddb module code just hasn't ever used any new functionality. Note that the pybsddb3 module does use the new functionality in the version 3 and 4 APIs.

What changes on you between versions are the file formats, and you should only care about that at the point where you upgrade from one version of Berkeley DB to another. (Generally, you realize this when you start getting errors trying to open old databases.) Sleepycat provides command line tools to help you convert from one file version to another, so once you realize your file formats have changed, you wind up poking around your disk looking for old format Berkeley DB files, run the tools on them, then go back to more interesting things, like writing stable sorts. ;-)

91e69f45-91d9-4b12-87db-a02908296c81 commented 22 years ago

Logged In: YES user_id=72053

OK, it looks like both the docs and Skip's note are a bit unclear. When you say only the 1.85 API is exposed, does that mean the 1.85 file format is also used either way? In particular, if Python is linked with Berkeley DB 2.0 and I create a db with it, will that db interoperate with another application that's linked to Berkeley DB 1.85?

If it won't interoperate, then it's definitely worthwhile to add some kind of call to the Python bsddb module to let Python scripts find out which file format they're dealing with.

Also, I didn't realize only the 1.85 API was supported. I hope pybsddb3 can become part of the standard Python distribution, since I'd like to use Sleepycat's transaction features from Python scripts.

smontanaro commented 22 years ago

Logged In: YES user_id=44345

Sorry for the lack of clarity. What I should have said is that the code which implements the bsddb extension module only calls the 1.85-compatible C API exposed when you configure the Berkeley DB code using the --enable-compat185 flag. All the wonders and mysteries of the later parts of the API are lost on the bsddb code.

There are two levels of compatibility: the API level and the file format level. All that users of the bsddb module should care about is file format compatibility, and handling that is a one-time problem dealt with using tools provided by Sleepycat as part of their distribution.

The topic of including bsddb3 in the standard distribution has been discussed before. For one example, see:

http://mail.python.org/pipermail/python-dev/2002-January/019261.html

I think the main stumbling block to incorporation is that it only works with versions 3 and 4 of the Berkeley DB library. There is a more recent thread that currently escapes my feeble attempts to find it.

61337411-43fc-4a9c-b8d5-4060aede66d0 commented 22 years ago

Logged In: YES user_id=21627

I also believe that this problem should be fixed by importing pybsddb3.

On this issue itself: it turns out to be impossible to find out, programmatically, what version of Sleepycat DB you are running if all you have is the compatibility API: neither the compile-time nor the run-time version information is available. Furthermore, you cannot include both new and old headers, since they conflict. So given the current code base, this problem cannot be solved.

3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

Logged In: NO

How can it be "impossible" to find out? The build script for the bsddb module can check what version is being linked, and include a string reachable from Python.

At worst, there could be a routine added to the module that actually creates a database, then examines the db file and figures out from the bytes inside which version it is.

Paul

smontanaro commented 22 years ago

Logged In: YES user_id=44345

I agree, if it's wanted badly enough, we can figure out what version was linked with the module code. The "define macros at configure time" idea is possible. The "create a database and peek at it" idea won't work though. There are library version numbers and file versions. They don't always change in sync.
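One way the "define macros at configure time" idea could be approached is to have the build script read the Berkeley DB header as plain text rather than include it, which sidesteps the conflict between old and new headers. The following is only a sketch under stated assumptions: the helper name `db_version_from_header` is hypothetical, it assumes the header defines `DB_VERSION_MAJOR` and `DB_VERSION_MINOR` as plain integer macros, and it assumes a 1.85 header defines neither.

```python
import re

# Matches lines such as "#define DB_VERSION_MAJOR 4" (spaces or tabs allowed).
_VERSION_DEFINE = re.compile(
    r"^\s*#\s*define\s+(DB_VERSION_(?:MAJOR|MINOR))\s+(\d+)", re.MULTILINE
)


def db_version_from_header(header_text):
    """Return a string like "4.0" built from the header's version macros.

    Hypothetical helper: returns None when the macros are absent, which is
    assumed (not verified) to be the case for a 1.85 header.
    """
    found = {name: int(value) for name, value in _VERSION_DEFINE.findall(header_text)}
    if "DB_VERSION_MAJOR" not in found or "DB_VERSION_MINOR" not in found:
        return None
    return "%d.%d" % (found["DB_VERSION_MAJOR"], found["DB_VERSION_MINOR"])
```

A build script could pass the result to the compiler as a macro of its own, and the module could then expose that string to Python.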

Like I said before, I'm skeptical a Python script would really need to know what version of the underlying library was linked with bsddbmodule.o. Can you motivate things with a use case?

Skip

warsaw commented 22 years ago

Logged In: YES user_id=12800

It's useful if for no other reason than to figure out which bugs you need to work around \<wink>.

BTW, PyBSDDB does give you the ability to find out both the version of the wrapper you've got and the version of the underlying library:

```python
>>> import bsddb3
>>> bsddb3.__version__
'3.3.0'
>>> bsddb3._db.version()
(3, 3, 11)
```

You've also got DB_VERSION_STRING, DB_VERSION_MAJOR and DB_VERSION_MINOR.

Note that if you're linking against a newer version of the library using the 1.85 API, *that* might be a difficult thing to figure out. Offhand (and I can't check right now), I don't know if that would give you a different bsddb3._db version constant or would otherwise be detectable.

61337411-43fc-4a9c-b8d5-4060aede66d0 commented 22 years ago

Logged In: YES user_id=21627

There is a bug report (somewhere) that whichdb incorrectly determines the DB module. In that case, whichdb would correctly find out that this is a Sleepycat database, and suggest to use dbhash. In turn, dbhash would fail to open the file, because the file version was incorrect. It would have been correct to use the dbm module, since the dbm library was also based on Sleepycat, but had a different version than the bsddb library installed on the same system.

This problem can be solved if you can find out what file version(s) your bsddb module supports.

The library version seems less useful to me, indeed.

warsaw commented 22 years ago

Logged In: YES user_id=12800

We could probably write a little utility to sniff file version numbers based on the magic number as given in this doco:

http://www.sleepycat.com/docs/ref/install/magic.txt
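A rough sketch of what such a sniffing utility might look like follows. Everything specific in it is an assumption to be checked against the magic file above: the hash and btree magic numbers, the magic sitting at offset 0 in 1.85-style files and at offset 12 in later files, and the format version being the 32-bit field right after the magic. The function name `sniff_bdb` is hypothetical.

```python
import struct

# Assumed magic numbers for Berkeley DB hash and btree files.
MAGIC_KINDS = {0x00061561: "hash", 0x00053162: "btree"}


def sniff_bdb(header):
    """Return (layout, kind, format_version), or None if nothing matches.

    `header` is the first 20 or more bytes of the file. Both byte orders
    are tried because the files are written in the creator's byte order.
    """
    # Assumed offsets: 0 for 1.85-style files, 12 for later files
    # (after an 8-byte LSN and a 4-byte page number).
    for offset, layout in ((0, "1.85"), (12, "2.x+")):
        chunk = header[offset:offset + 8]
        if len(chunk) < 8:
            continue
        for byte_order in ("<", ">"):
            magic, version = struct.unpack(byte_order + "II", chunk)
            if magic in MAGIC_KINDS:
                return layout, MAGIC_KINDS[magic], version
    return None
```

Typical use would be `sniff_bdb(open(path, "rb").read(20))`. Note that this reports the file format version only, not the version of the library that wrote the file.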

smontanaro commented 22 years ago

Logged In: YES user_id=44345

This is precisely what Sleepycat's db_dump/db_load type tools take care of. It's a one-time thing. When you upgrade from one version of Berkeley DB to another you need to run these tools to make sure the file formats are up-to-date. The only problem I see here with the current code is that the exception which is raised is rather mystical - something like a very large number followed by "invalid argument". The most significant change I would see making here is to have the bsddb module recognize that weird error and raise an exception with a saner message.
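The "saner message" idea could be sketched roughly as below. This is an illustration only, not the module's code: `opener` stands in for the module's real open function, `OSError` stands in for whatever exception type that function actually raises, and `DBFormatError` and `open_db` are hypothetical names.

```python
import errno


class DBFormatError(Exception):
    """Hypothetical exception carrying a human-readable explanation."""


def open_db(opener, path):
    """Call opener(path), translating a bare EINVAL into a clearer error."""
    try:
        return opener(path)
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise  # unrelated failure: leave it untouched
        raise DBFormatError(
            "%s: the Berkeley DB library rejected this file; it may have been "
            "written in a different file format version" % path
        ) from exc
```

The original exception stays attached as `__cause__`, so the low-level details remain available for debugging.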

I can't see the programmer or the user getting more information out of "expected hash file format version 7 but got hash file format version 5".

61337411-43fc-4a9c-b8d5-4060aede66d0 commented 22 years ago

Logged In: YES user_id=21627

No, the main point would be that whichdb would not incorrectly report the file format as 'dbhash', when it isn't (because dbhash supports a different version).

smontanaro commented 22 years ago

Logged In: YES user_id=44345

What would you have it report? Dbhash is nothing more than a thin wrapper around bsddb. Whichdb is a very fragile beast in my opinion, but it does already do some file content introspection, and if the file is some sort of Berkeley DB hash file, it will report it more-or-less correctly as "dbhash" (more correct in my opinion than returning None or "").
This includes files created using the dbm module, if that module was linked with the dbm emulation API of Berkeley DB.
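For readers on Python 3, where `whichdb` lives on as `dbm.whichdb`, the two "don't know" results mentioned above can be seen directly. A minimal sketch; the temporary file names are purely illustrative:

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "no-such-file")
    unknown = os.path.join(tmp, "garbage")
    with open(unknown, "wb") as f:
        f.write(b"not a database!!")  # 16 bytes that match no known format

    missing_result = dbm.whichdb(missing)  # None: the file cannot be opened
    unknown_result = dbm.whichdb(unknown)  # "": readable, format not recognised
```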

I still fail to see how any of this detection people propose would help. If you have a version 5 hash file it doesn't matter how positive you are about it. A later version of the Berkeley DB library which expects a version 7 hash file is still going to barf on the older file format. To make things work again you're going to have to resort to running Sleepycat's tools to convert the file to the proper format. It's not like you can detect file version differences and then plunge ahead along a different path without alerting the user to the problem.

61337411-43fc-4a9c-b8d5-4060aede66d0 commented 22 years ago

Logged In: YES user_id=21627

It would solve bug bpo-491888 and allow a better diagnostic for bpo-504282.

smontanaro commented 22 years ago

Logged In: YES user_id=44345

I can't comment on bpo-504282 (I don't know what the problem is because the poster didn't provide enough information about the files and their names). I attached a patch to bpo-491888 which should solve that problem.

still-unconvinced-ly y'rs,

Skip

d21744ff-f396-4c71-955e-7dbd2e886779 commented 22 years ago

Logged In: YES user_id=33168

bsddb has changed a lot since this bug report. Have the issues been resolved?

smontanaro commented 22 years ago

Logged In: YES user_id=44345

In the new bsddb module you can call bsddb.db.version() to get the version of the underlying library, so I'm going to close this. I don't know if it's worth it to add the same functionality to the old bsddb185 module or not.
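For reference, a guarded way to make that call from code that may run where the module is missing is sketched below. The helper name `berkeley_db_version` is hypothetical, and falling back to the third-party `bsddb3` package is an assumption about what the caller wants, not something this issue settled.

```python
def berkeley_db_version():
    """Return the underlying library version as a tuple, or None.

    Tries the standard-library bsddb package first, then the
    third-party bsddb3 package; returns None if neither is importable.
    """
    try:
        from bsddb import db
    except ImportError:
        try:
            from bsddb3 import db
        except ImportError:
            return None
    return db.version()
```

Callers can then compare the tuple directly, for example `berkeley_db_version() >= (3, 3, 0)`, after checking for `None`.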