Closed maneeshpm closed 3 years ago
ping @mgautierfr
The ticket is unclear to me. What, concretely, is the bug, and what is its impact?
The bug is that the SearchIterator::getTitle method returns the unaccented title. That is, if the actual title is "DeLorean", we get "delorean" as the output instead of the title as it is.
I'm not sure of what to do here.
See https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/indexing/values.html. As explained there, values are useful for querying and sorting, not for storing user information on the document.
We are using the title value to sort the results by title when we are in suggestion mode, and we probably want this sort to be unaccented and case insensitive.
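The distinction above, a normalized value used only for ordering versus the title shown to the user, can be sketched as follows. This is a minimal illustration and not libzim code: `toSortKey`, `Suggestion`, and `sortedSuggestions` are hypothetical names, and `toSortKey` is an ASCII-only stand-in for `zim::removeAccents()` that lowercases but does not strip diacritics.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for zim::removeAccents(): ASCII lowercasing only.
// Stripping diacritics is deliberately not reproduced here.
std::string toSortKey(const std::string& title) {
    std::string key(title);
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return key;
}

// Hypothetical result record: sortKey is used only for ordering,
// title is what gets returned to the caller.
struct Suggestion {
    std::string title;    // as written in the entry, e.g. "DeLorean"
    std::string sortKey;  // normalized form, e.g. "delorean"
};

// Build suggestions ordered by the normalized key while keeping the
// original titles untouched.
std::vector<Suggestion> sortedSuggestions(const std::vector<std::string>& titles) {
    std::vector<Suggestion> out;
    for (const auto& t : titles) {
        out.push_back({t, toSortKey(t)});
    }
    std::stable_sort(out.begin(), out.end(),
                     [](const Suggestion& a, const Suggestion& b) {
                         return a.sortKey < b.sortKey;
                     });
    return out;
}
```

With this split, the sort stays case insensitive while a getter that reads `title` still yields "DeLorean".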
We could:
- Make SearchIterator::getTitle() return the title of the entry and not the one stored in the Xapian database.
- Remove SearchIterator::getTitle(), as it returns no real directly useful information.
- ... (getIndexedTitle)
@mgautierfr I think we should go ahead and make it return the title of the entry. This way, we won't break anything that is already working, and the function will behave as expected.
SearchIterator::getTitle() gives the unaccented title instead of the actual title. That is, even if the title is "DeLorean", we get "delorean" as the output.
When we index the title, the value stored in the title:0 slot of the values map is the unaccented title. This happens because zim::removeAccents() is called in the constructor of DefaultIndexData.
This was missed by our unit tests for several reasons, such as calling getTitle from the dereferenced entry and the lack of mixed upper and lower case in the tests where we actually call it from the search iterator. An easy fix is to drop this behavior from the constructor, because we are anyway explicitly calling zim::removeAccents() where it is really required, in XapianIndexer::indexTitle(). Suggestions?
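The proposed fix could look roughly like the sketch below: the constructor keeps the title as given, and normalization happens only at the point where the title is indexed. All names here (`IndexDataSketch`, `IndexedDocSketch`, `indexTitleSketch`, `normalizeForIndex`) are hypothetical and do not mirror the actual libzim classes; `normalizeForIndex` is an ASCII-only stand-in for `zim::removeAccents()`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical stand-in for zim::removeAccents(): ASCII lowercasing only.
std::string normalizeForIndex(const std::string& text) {
    std::string out(text);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Hypothetical simplification of an index-data holder.
// The constructor stores the title as given, with no normalization.
struct IndexDataSketch {
    explicit IndexDataSketch(std::string t) : title(std::move(t)) {}
    std::string title;
};

// Hypothetical model of what ends up in the index for one entry.
struct IndexedDocSketch {
    std::string storedTitle;      // what a getTitle()-style accessor would return
    std::string searchableTitle;  // normalized text used for matching
};

// Normalization is applied here, at indexing time, and nowhere else.
IndexedDocSketch indexTitleSketch(const IndexDataSketch& data) {
    return {data.title, normalizeForIndex(data.title)};
}
```

One consequence of this design is that the stored title is no longer normalized, so any sort that relied on it being unaccented and lowercase would need its own normalized key.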