pombreda / djapian

Automatically exported from code.google.com/p/djapian

Error on prefetch when objects are already deleted but the Xapian index is not updated yet #122

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
When using prefetch, if we hit a model object that has already been deleted, 
Djapian raises a KeyError. Such non-existent model objects are found when the 
Xapian index is out of date.

In our case, we run index --rebuild once a day. In between, users delete 
objects. To prevent the resulting KeyError in search results, please change 
the following in resultset.py:

resultset.py, line 150+

for hit in hits:
    try:
        hit.instance = instances[hit.pk]
    except KeyError:
        # Object deleted from the DB but still present in the stale index.
        self._resultset_cache.remove(hit)

Instead of assigning instances[hit.pk] to hit.instance (which raises the actual 
error), we simply remove this hit from the result set :)
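A self-contained sketch of the same idea outside Djapian. `Hit` and `attach_instances` are hypothetical stand-ins for the resultset internals, and `instances` maps primary keys to the objects loaded by prefetch. It builds a new list instead of calling `remove()` on a list while iterating over it, which would skip the element after each removed one if `hits` were that same list.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Hit:
    """Hypothetical stand-in for a Djapian search hit."""
    pk: int
    instance: Optional[Any] = None


def attach_instances(hits: List[Hit], instances: Dict[int, Any]) -> List[Hit]:
    """Attach prefetched objects to hits, dropping hits whose object is gone."""
    kept = []
    for hit in hits:
        try:
            hit.instance = instances[hit.pk]
        except KeyError:
            # Deleted from the DB but still present in the stale Xapian index.
            continue
        kept.append(hit)
    return kept


hits = [Hit(1), Hit(2), Hit(3)]
instances = {1: "poster-1", 3: "poster-3"}  # object 2 was deleted
print([h.pk for h in attach_instances(hits, instances)])  # [1, 3]
```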

Cheers,
Simon

Original issue reported on code.google.com by i...@pagewizz.com on 12 Nov 2010 at 6:26

GoogleCodeExporter commented 9 years ago
Yes,

I found it too. But this is not a complete solution when using pagination; it 
only prevents the code from raising errors.

Bye

Original comment by matu...@gmail.com on 24 Nov 2010 at 3:39
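A plain-Python illustration of why dropping hits afterwards is only a partial fix: the page is sliced from the stale index before deleted objects are removed, so pages come out short and the total still counts deleted objects. Lists of integers stand in for index hits and DB rows; `paginate_then_filter` is a hypothetical helper.

```python
from typing import List, Set, Tuple


def paginate_then_filter(index_pks: List[int], existing: Set[int],
                         page: int, per_page: int) -> Tuple[List[int], int]:
    """Slice a page from the (stale) index, then drop deleted objects.

    Returns the visible page and the total reported by the index.
    """
    start = page * per_page
    page_pks = index_pks[start:start + per_page]
    visible = [pk for pk in page_pks if pk in existing]
    return visible, len(index_pks)  # total still includes deleted objects


index_pks = list(range(1, 11))       # 10 documents in the Xapian index
existing = set(index_pks) - {2, 4}   # objects 2 and 4 were deleted from the DB
print(paginate_then_filter(index_pks, existing, page=0, per_page=5))
# ([1, 3, 5], 10)
```

The first page shows three items instead of five, and the reported total is 10 although only 8 objects exist.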

GoogleCodeExporter commented 9 years ago
I think the true solution (which supports count() and pagination) would be to 
use a custom MatchDecider. Below are two examples of how to get a custom 
MatchDecider for your Djapian-enabled search.

The first one is a naive DBExistsCompositeDecider, which just checks whether 
each match returned by the Xapian search backend still exists in the DB. It 
queries the DB once for every hit in the Xapian results, so performance 
degrades noticeably.

{{{
from django.core.exceptions import ObjectDoesNotExist

from djapian.decider import CompositeDecider


class DBExistsCompositeDecider(CompositeDecider):
    """Reject Xapian matches whose model object no longer exists in the DB."""

    def __init__(self, model, *args, **kwargs):
        super(DBExistsCompositeDecider, self).__init__(model, *args, **kwargs)
        self.to_python = model._meta.pk.to_python
        self.objects = model._default_manager

    def __call__(self, document):
        # Let the parent decider apply its own filtering first.
        res = super(DBExistsCompositeDecider, self).__call__(document)
        if not res:
            return res

        # Value slot 1 is read as the object's primary key.
        pk = self.to_python(document.get_value(1))
        # One DB query per hit: this is what makes the approach slow.
        try:
            self.objects.get(pk=pk)
        except ObjectDoesNotExist:
            return False

        return True
}}}
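To make the cost concrete, here is a plain-Python model with a hypothetical `FakeTable` that counts lookups. The per-hit existence check, as in DBExistsCompositeDecider, issues one query for every candidate match, whether or not the object still exists.

```python
from typing import Callable, Iterable


class FakeTable:
    """Hypothetical DB table that counts the queries made against it."""

    def __init__(self, pks: Iterable[int]):
        self._pks = set(pks)
        self.queries = 0

    def exists(self, pk: int) -> bool:
        self.queries += 1
        return pk in self._pks


def db_exists_decider(table: FakeTable) -> Callable[[int], bool]:
    """Per-hit check: one query for every candidate match."""
    return table.exists


table = FakeTable(pk for pk in range(1, 101) if pk % 10 != 0)  # 10 deleted
decide = db_exists_decider(table)
matches = [pk for pk in range(1, 101) if decide(pk)]
print(len(matches), table.queries)  # 90 100
```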

The second one uses Djapian's Change model (DB table) to check whether there 
are pending requests to update the search index after some objects have been 
deleted. ChangesCompositeDecider keeps the IDs of objects of the given 
content_type (Django model) that are marked as 'deleted': these objects are 
already gone from the DB but still exist in the Xapian index. The DB is hit 
only once per Django model used during the search (in 90% of cases that means 
only once, but if you are using CompositeIndexer your experience may vary).

{{{
from django.contrib.contenttypes.models import ContentType

from djapian.decider import CompositeDecider
from djapian.models import Change


class ChangesCompositeDecider(CompositeDecider):
    """Reject Xapian matches that have a pending 'delete' Change record."""

    def __init__(self, model, *args, **kwargs):
        super(ChangesCompositeDecider, self).__init__(model, *args, **kwargs)
        self.to_python = model._meta.pk.to_python
        # One query per model: collect the PKs of objects already deleted
        # from the DB but not yet removed from the Xapian index.
        # Change.ACTIONS[2][0] is used here as the 'delete' action code.
        deleted_ids = Change._default_manager.filter(
            content_type=ContentType.objects.get_for_model(model),
            action=Change.ACTIONS[2][0],
        ).values_list('object_id', flat=True)
        # A set gives fast membership tests in __call__.
        self._deleted = set(self.to_python(pk) for pk in deleted_ids)

    def __call__(self, document):
        res = super(ChangesCompositeDecider, self).__call__(document)
        if not res:
            return res

        pk = self.to_python(document.get_value(1))
        return pk not in self._deleted
}}}
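A plain-Python model of the same strategy, for comparison with the per-hit check. `FakeChangeLog` is a hypothetical stand-in for Djapian's Change table, and the string `"delete"` stands in for the action code taken from `Change.ACTIONS`. Pending deletions are loaded once per model, after which each match costs only a set lookup.

```python
from typing import Callable, Iterable, List, Tuple


class FakeChangeLog:
    """Hypothetical Change table holding (model_label, pk, action) rows."""

    def __init__(self, rows: Iterable[Tuple[str, int, str]]):
        self._rows = list(rows)
        self.queries = 0

    def deleted_pks(self, model_label: str) -> List[int]:
        self.queries += 1
        return [pk for label, pk, action in self._rows
                if label == model_label and action == "delete"]


def changes_decider(log: FakeChangeLog, model_label: str) -> Callable[[int], bool]:
    """Load pending deletions once, then test membership per match."""
    deleted = set(log.deleted_pks(model_label))  # the only query
    return lambda pk: pk not in deleted


log = FakeChangeLog([("app.poster", 2, "delete"), ("app.poster", 5, "edit"),
                     ("app.show", 3, "delete"), ("app.poster", 7, "delete")])
decide = changes_decider(log, "app.poster")
print([pk for pk in range(1, 9) if decide(pk)], log.queries)
# [1, 3, 4, 5, 6, 8] 1
```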

Well, the last thing to do is to set the appropriate decider on our 
ModelIndexer class, and that's it!

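{{{
from datetime import date

import djapian

# Poster (the model) and WeightenedIndexer (the indexer base class) are not
# shown here; they are assumed to be imported from elsewhere in the project.


class PosterIndexer(WeightenedIndexer):
    # Filter out matches that are pending deletion.
    decider = ChangesCompositeDecider
    fields = ('show__name', 'show__description')

    # Only posters that have not ended yet are indexed.
    trigger = lambda indexer, obj: obj.date_end >= date.today()


# Registering models for the full-text search index (powered by djapian).
djapian.add_index(Poster, PosterIndexer, attach_as='indexer')
}}}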
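Why a decider keeps count() and pagination consistent, in a simplified plain-Python model: the decider runs while candidates become matches, so rejected documents are gone before the results are counted and sliced. Xapian itself applies the decider during matching; this model ignores details such as estimated match counts, and `search_with_decider` is a hypothetical helper.

```python
from typing import Callable, List, Tuple


def search_with_decider(index_pks: List[int], decider: Callable[[int], bool],
                        page: int, per_page: int) -> Tuple[List[int], int]:
    """Reject candidates while matching, then count and paginate the rest."""
    # Rejected documents never become matches, so they are never counted.
    matches = [pk for pk in index_pks if decider(pk)]
    start = page * per_page
    return matches[start:start + per_page], len(matches)


index_pks = list(range(1, 11))  # 10 documents in the (stale) index
deleted = {2, 4}                # objects deleted from the DB since the rebuild
print(search_with_decider(index_pks, lambda pk: pk not in deleted, 0, 5))
# ([1, 3, 5, 6, 7], 8)
```

Every page is full, and the total matches the number of objects that actually exist.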

We should consider including these (or similar) MatchDeciders in the Djapian 
distribution, as examples and as handy tools for direct use.

Original comment by esizi...@gmail.com on 4 Mar 2011 at 2:26

GoogleCodeExporter commented 9 years ago
BTW, the examples above do not support the CompositeIndexer use case directly. 
Updating them is possible and quite trivial, though.
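One possible shape for that update, sketched in plain Python: a composite index mixes documents from several models, and the same primary key can belong to different models, so pending deletions are keyed by (model, pk) rather than by pk alone. How a real Djapian document exposes its model is not covered in this thread; the `(model_label, pk)` tuples and `composite_changes_decider` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical (model_label, pk) pair identifying an indexed document.
Doc = Tuple[str, int]


def composite_changes_decider(
        deleted_by_model: Dict[str, Iterable[int]]) -> Callable[[Doc], bool]:
    """Reject documents whose (model, pk) pair is pending deletion."""
    deleted: Set[Doc] = {(label, pk)
                         for label, pks in deleted_by_model.items()
                         for pk in pks}
    return lambda doc: doc not in deleted


decide = composite_changes_decider({"app.poster": [2], "app.show": [2, 9]})
docs = [("app.poster", 1), ("app.poster", 2), ("app.show", 2), ("app.show", 3)]
print([d for d in docs if decide(d)])
# [('app.poster', 1), ('app.show', 3)]
```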

Original comment by esizi...@gmail.com on 5 Mar 2011 at 8:25

GoogleCodeExporter commented 9 years ago
In r384 a test case demonstrating the issue has been committed. I would prefer 
that we implement a fix or workaround as part of the default, out-of-the-box 
behavior for a better user experience.

Original comment by esizi...@gmail.com on 21 Oct 2011 at 12:51

GoogleCodeExporter commented 9 years ago

Original comment by esizi...@gmail.com on 24 Oct 2011 at 11:27

GoogleCodeExporter commented 9 years ago
In r385 a trivial fix has been committed; it needs a review.

It would be nice to somehow signal that not all results found by Xapian are 
actually available.
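One way to surface this, sketched in plain Python: carry a count of dropped hits alongside the results so callers (or templates) can tell when some matches were unavailable. `FilteredResults` and `load_results` are hypothetical names, not Djapian API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FilteredResults:
    """Hypothetical result wrapper that remembers how many hits were dropped."""
    items: List[Any] = field(default_factory=list)
    dropped: int = 0

    @property
    def incomplete(self) -> bool:
        return self.dropped > 0


def load_results(pks: List[int], instances: Dict[int, Any]) -> FilteredResults:
    result = FilteredResults()
    for pk in pks:
        if pk in instances:
            result.items.append(instances[pk])
        else:
            result.dropped += 1  # found by Xapian, missing from the DB
    return result


res = load_results([1, 2, 3], {1: "a", 3: "c"})
print(res.items, res.dropped, res.incomplete)  # ['a', 'c'] 1 True
```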

Original comment by esizi...@gmail.com on 26 Oct 2011 at 10:47