mchaput / whoosh

Pure-Python full-text search library

sorting numtype float is broken #44

Open captainconj opened 1 year ago

captainconj commented 1 year ago

Sorting on a NUMERIC field with numtype=float is broken. Defining the schema works fine, but as soon as I add documents I get the following error:

Traceback (most recent call last):
  File ".../brokensort.py", line 13, in <module>
    w.add_document(title=u"Big Deal", price=20.0)
  File ".../.local/share/virtualenvs/.../lib/python3.10/site-packages/whoosh/writing.py", line 784, in add_document
    perdocwriter.add_column_value(fieldname, column, cv)
  File ".../.local/share/virtualenvs/.../lib/python3.10/site-packages/whoosh/codec/base.py", line 820, in add_column_value
    self._create_column(fieldname, column)
  File ".../.local/share/virtualenvs/.../lib/python3.10/site-packages/whoosh/codec/whoosh3.py", line 190, in _create_column
    writers[fieldname] = column.writer(f)
  File ".../.local/share/virtualenvs/.../lib/python3.10/site-packages/whoosh/columns.py", line 649, in writer
    return self.Writer(dbfile, self._typecode, self._default)
  File ".../.local/share/virtualenvs/.../lib/python3.10/site-packages/whoosh/columns.py", line 666, in __init__
    self._defaultbytes = self._pack(default)
struct.error: required argument is not an integer
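The last frame suggests that a non-integer value is being packed with an integer struct typecode. The snippet below is only a minimal sketch of that class of failure using the standard library. The "!i" and "!d" typecodes and the try_pack helper are my own stand-ins, not values read from the whoosh source.

```python
import struct


def try_pack(typecode, value):
    """Return the packed bytes, or the error message if packing fails.

    Hypothetical helper for illustration only; not part of whoosh.
    """
    try:
        return struct.pack(typecode, value)
    except struct.error as exc:
        return str(exc)


# Assumption: "!i" stands in for an integer column typecode and "!d" for a
# floating-point one. Which typecode whoosh actually picks is not shown here.
int_result = try_pack("!i", 20.0)    # float into an integer slot: fails
float_result = try_pack("!d", 20.0)  # float into a double slot: succeeds

print(int_result)
print(float_result)
```

If the column writer ends up with an integer typecode for a float field, or with a non-integer default value, it would fail in the same way.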

Here is a sample that produces the error:

import os
from whoosh import fields, index, qparser

index_dir = ".index"

schema = fields.Schema(title=fields.TEXT(stored=True),
                       price=fields.NUMERIC(sortable=True, numtype=float))
if not os.path.exists(index_dir):
    os.mkdir(index_dir)
ix = index.create_in(index_dir, schema)

with ix.writer() as w:
    w.add_document(title=u"Big Deal", price=20.0)
    w.add_document(title=u"Mr. Big", price=10.0)
    w.add_document(title=u"Big Top", price=15.0)

with ix.searcher() as s:
    qp = qparser.QueryParser("title", ix.schema)
    q = qp.parse("big")

    # Sort search results from lowest to highest price
    results = s.search(q, sortedby="price")
    for hit in results:
        print(hit["title"])

One can work around this by converting the floats to Decimal values and specifying decimal_places in the schema, but this has its own drawbacks.
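To illustrate what the workaround amounts to: as I understand it, decimal_places makes the field store each value as an integer scaled by 10 ** decimal_places. The sketch below mimics that in plain Python without touching whoosh. DECIMAL_PLACES, to_fixed_point and the rounding mode are my own assumptions, so treat it as an approximation of the idea rather than what the library does internally.

```python
from decimal import Decimal, ROUND_HALF_UP

DECIMAL_PLACES = 2  # hypothetical precision, chosen for prices


def to_fixed_point(value, places=DECIMAL_PLACES):
    """Convert a float to an integer scaled by 10 ** places.

    Hypothetical helper; the rounding mode is an assumption.
    """
    quantum = Decimal(10) ** -places  # Decimal("0.01") for places=2
    # Go through str() so 10.1 becomes Decimal("10.1") rather than the
    # long binary expansion of the float.
    dec = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return int(dec * 10 ** places)


prices = {"Big Deal": 20.0, "Mr. Big": 10.0, "Big Top": 15.0}
scaled = {title: to_fixed_point(price) for title, price in prices.items()}

# Sorting the scaled integers gives the same order as sorting the floats.
by_price = sorted(scaled, key=scaled.get)
print(by_price)

# Digits beyond the chosen precision are rounded away.
print(to_fixed_point(10.999))
```

Anything finer than the configured number of places is lost, and the precision has to be fixed up front in the schema, which may be part of why this is not a drop-in replacement for a float field.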