Open fortable1999 opened 12 years ago
Original comment by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
A general tuple solution is trivial for sorting documents (since Whoosh 3 allows storing arbitrary per-document values to use as sorting keys), but for fast searches using the inverted index you need to be able to serialize the tuples into comparable bytestring terms.
Such a serialization is possible if you specify that the "tuple" consist only of numbers of a certain type, or fixed-length strings, or something like that, but I don't see a way to accomplish it in the general case.
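As a rough illustration of the restricted case described above, here is a minimal sketch that assumes every tuple element is a non-negative integer that fits in 32 bits. The function name `encode_uint_tuple` is hypothetical and is not part of Whoosh. Each element is packed as a fixed-width big-endian value, so bytewise comparison of the results matches tuple comparison.

```python
import struct


def encode_uint_tuple(values):
    """Encode a tuple of unsigned 32-bit ints as one bytestring.

    Assumption: every element satisfies 0 <= value < 2**32.
    Fixed-width big-endian packing makes bytewise order match numeric order.
    """
    return b"".join(struct.pack(">I", value) for value in values)


tuples = [(1, 0, 10), (1, 0, 2), (1, 0), (2,), (1, 300, 0)]
# Sorting by the encoded bytes gives the same order as sorting the tuples.
by_bytes = sorted(tuples, key=encode_uint_tuple)
```

Because every element has the same width, a shorter tuple that is a prefix of a longer one also sorts first, as it does with plain tuple comparison. Mixed element types or variable-length strings would break this property, which is the difficulty with the general case.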
Original comment by Thomas Waldmann (Bitbucket: thomaswaldmann, GitHub: thomaswaldmann).
Please note that this suggestion was initially about version support, but was later generalized to tuple support.
I think Whoosh should probably not support version numbers directly (they are too special, and different schemes handle them too differently), but instead offer a generic tuple datatype that compares and sorts the way tuples usually do. That can be the base for implementing version support on top of it.
Original comment by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Hi, I thought I replied to this a while ago but apparently I never clicked "Post" and then closed the browser or something. :(
What we'd need is an algorithm to convert a version tuple to a bytestring, where the encoded bytestrings compare as the version tuples would. Looking at the "new version" spec in PEP 386 this shouldn't be a problem.
The tuple representations people use can vary widely (see verlib's internal representation), so a version field type would probably only accept version strings and parse them internally before converting to bytes.
I think a simple parser like yours would be sufficient, at least at first. If I wanted to support the full PEP 386 format, I'd want to include verlib.py in support, but I'm not sure what the licensing of that code is.
I'm interested in working on this as a demo for making custom field types. I'm not sure if it's generally useful enough to make it a standard part of the library... It might work better as a recipe in the docs, depending on the LOC.
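To make the idea of an order-preserving encoding concrete, here is a minimal sketch, not the Whoosh implementation and not a full PEP 386 parser. It assumes a simplified format (dotted release numbers plus an optional `a`/`b`/`c`/`rc` pre-release tag), at most four release components, and values that fit in 32 bits. The names `version_to_bytes`, `MAX_PARTS` and `_STAGE_BYTE` are hypothetical.

```python
import re
import struct

# Simplified pattern (assumption): "N.N.N" optionally followed by a/b/c/rc + N.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|c|rc)(\d+))?$")
# Pre-release stages get smaller bytes so they sort before the final release.
_STAGE_BYTE = {"a": 0, "b": 1, "c": 2, "rc": 2, None: 3}
MAX_PARTS = 4  # assumption: release numbers are padded with zeros to 4 parts


def version_to_bytes(text):
    """Parse a version string and return a fixed-length, sortable bytestring."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("unsupported version string: %r" % text)
    release, stage, number = match.groups()
    parts = [int(part) for part in release.split(".")]
    if len(parts) > MAX_PARTS:
        raise ValueError("too many release components: %r" % text)
    parts += [0] * (MAX_PARTS - len(parts))
    # Big-endian, no padding: 4 release ints, 1 stage byte, 1 pre-release int.
    return struct.pack(">%dIBI" % MAX_PARTS, *parts, _STAGE_BYTE[stage], int(number or 0))


versions = ["1.0.10", "1.0.2", "1.0.2b2", "1.0.1", "1.0.2a1"]
ordered = sorted(versions, key=version_to_bytes)
```

Padding to a fixed number of components keeps every encoded term the same length, at the cost of treating `1.0` and `1.0.0` as equal. The `.postN` and `.devN` suffixes from PEP 386 are left out of this sketch.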
Original report by Thomas Waldmann (Bitbucket: thomaswaldmann, GitHub: thomaswaldmann).
A more general idea is to support a tuple type; below is one application of that.
This is a rather special idea, but it could be useful for everybody who deals with indexing and searching versions (or ranges of versions).
If you have versions like 1.0.1, 1.0.2a1, 1.0.2b2, 1.0.2, 1.0.10 (see PEP 386), you need special support to sort and compare them correctly, which means representing them as tuples and using tuple comparison.
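For illustration only, the tuple approach can be sketched as below. This is a simplified stand-in, not the MoinMoin or verlib code, and it assumes versions consist of dotted numbers with an optional `a`/`b`/`c`/`rc` tag. The name `version_key` and the stage ranks are hypothetical.

```python
import re

# Simplified pattern (assumption): "N.N.N" optionally followed by a/b/c/rc + N.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|c|rc)(\d+))?$")
# Pre-releases rank below the final release of the same number.
_STAGE_RANK = {"a": 0, "b": 1, "c": 2, "rc": 2, None: 3}


def version_key(text):
    """Turn a version string into a tuple that compares in version order."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("unsupported version string: %r" % text)
    release, stage, number = match.groups()
    numbers = tuple(int(part) for part in release.split("."))
    return (numbers, _STAGE_RANK[stage], int(number or 0))


versions = ["1.0.10", "1.0.2", "1.0.2b2", "1.0.1", "1.0.2a1"]
as_strings = sorted(versions)                 # plain string order
as_tuples = sorted(versions, key=version_key)  # version order
```

Plain string sorting places `1.0.10` before `1.0.2`, while the tuple key compares `10` and `2` as integers and puts the pre-releases ahead of the final `1.0.2`.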
I have written some code to correctly deal with versions (can be relicensed):
http://hg.moinmo.in/moin/2.0/file/tip/MoinMoin/util/version.py
http://hg.moinmo.in/moin/2.0/file/tip/MoinMoin/util/_tests/test_version.py
Of course it is likely impossible to support everybody's different ways to create version numbers, but this would at least cover some schemes as used in the python world.
Here is some other code:
https://bitbucket.org/tarek/distutilsversion/src/17df9a7d96ef/verlib.py
Note: we have a GSOC project for implementing a ticket system / issue tracker within moin2. :)