whoosh-community / whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

tuple type support #258

Open fortable1999 opened 12 years ago

fortable1999 commented 12 years ago

Original report by Thomas Waldmann (Bitbucket: thomaswaldmann, GitHub: thomaswaldmann).


A more general idea is to support a tuple type; below is one application of that.

This is a rather special idea, but it could be useful for everybody who deals with indexing and searching versions (or ranges of versions).

If you have versions like 1.0.1, 1.0.2a1, 1.0.2b2, 1.0.2, 1.0.10 (see PEP 386), you need special support to sort and compare them correctly: represent them as tuples and use tuple comparison.
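To illustrate why tuples help here, a minimal sketch follows. It is not the MoinMoin or verlib code linked in this report; `version_key` is a hypothetical helper that assumes a simplified scheme (dotted numbers plus an optional `a`, `b` or `rc` suffix with a number), which covers only part of what PEP 386 describes.

```python
import re

# Simplified, assumed grammar: "1.0.2", "1.0.2a1", "1.0.2b2", "1.0.2rc1".
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")

# Pre-releases must sort before the final release of the same number.
_STAGE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}


def version_key(text):
    """Turn a version string into a tuple that compares in version order."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("unsupported version string: %r" % text)
    release, stage, stage_num = match.groups()
    numbers = tuple(int(part) for part in release.split("."))
    return (numbers, _STAGE_RANK[stage], int(stage_num or 0))


versions = ["1.0.10", "1.0.2", "1.0.2b2", "1.0.1", "1.0.2a1"]
sorted_versions = sorted(versions, key=version_key)
print(sorted_versions)
```

A plain string sort would put 1.0.10 before 1.0.2 and 1.0.2 before 1.0.2a1, which is the problem the tuple representation avoids.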

I have written some code to correctly deal with versions (can be relicensed):

http://hg.moinmo.in/moin/2.0/file/tip/MoinMoin/util/version.py

http://hg.moinmo.in/moin/2.0/file/tip/MoinMoin/util/_tests/test_version.py

Of course it is likely impossible to support everybody's different ways to create version numbers, but this would at least cover some schemes as used in the python world.

Here is some other code:

https://bitbucket.org/tarek/distutilsversion/src/17df9a7d96ef/verlib.py

Note: we have a GSOC project for implementing a ticket system / issue tracker within moin2. :)

fortable1999 commented 12 years ago

Original comment by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).


A general tuple solution is trivial for sorting documents (since Whoosh 3 allows storing arbitrary per-document values to use as sorting keys), but for fast searches using the inverted index you need to be able to serialize the tuples into comparable bytestring terms.

Such a serialization is possible if you specify that the "tuple" consist only of numbers of a certain type, or fixed-length strings, or something like that, but I don't see a way to accomplish it in the general case.
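As a sketch of the restricted case described above (not Whoosh's API), a tuple with a fixed number of unsigned 32-bit integers can be packed big-endian so that comparing the resulting bytestrings gives the same order as comparing the tuples. `encode_tuple`, `decode_tuple` and the `width` parameter are hypothetical names chosen for illustration.

```python
import struct


def encode_tuple(values, width=3):
    """Pack a fixed-length tuple of unsigned 32-bit ints into sortable bytes.

    Big-endian, fixed-width fields mean bytewise comparison of the result
    matches element-by-element comparison of the original tuples.
    """
    if len(values) != width:
        raise ValueError("expected exactly %d values" % width)
    # struct raises struct.error for negative or too-large values.
    return struct.pack(">%dI" % width, *values)


def decode_tuple(data, width=3):
    """Inverse of encode_tuple."""
    return struct.unpack(">%dI" % width, data)


tuples = [(1, 0, 10), (1, 0, 2), (0, 9, 9), (1, 0, 1)]
by_bytes = sorted(tuples, key=encode_tuple)
by_tuple = sorted(tuples)
print(by_bytes == by_tuple)
```

The fixed width and fixed numeric type are what make this work; variable-length tuples or mixed element types would need a different encoding, which is the general case the comment above says is not obviously solvable.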

fortable1999 commented 12 years ago

Original comment by Thomas Waldmann (Bitbucket: thomaswaldmann, GitHub: thomaswaldmann).


Please note that this suggestion was initially about version support, but was later generalized to tuple support.

I think Whoosh should perhaps not support version numbers directly (too special, handled too differently), but offer a generic tuple datatype that compares and sorts like tuples usually do. That could be the base for implementing version support on top of it.

fortable1999 commented 12 years ago

Original comment by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).


Hi, I thought I replied to this a while ago but apparently I never clicked "Post" and then closed the browser or something. :(