mozilla / elasticutils

[deprecated] A friendly chainable ElasticSearch interface for python
http://elasticutils.rtfd.org
BSD 3-Clause "New" or "Revised" License
243 stars 76 forks source link

ESTestCase should handle asynchronous indexing via celery tasks #214

Closed jmizgajski closed 10 years ago

jmizgajski commented 10 years ago

ESTestCase should handle asynchronous indexing via celery tasks - consider the following scenario:

With a simple test case setup:

Mapping type

#irrelevant code was removed

class UserMapping(Indexable, MappingType):
    @classmethod
    def get_model(cls):
        return TrackedUser

    @classmethod
    def get_mapping_type_name(cls):
        return 'user_type'

    @classmethod
    def get_index(cls):
        return 'user_index'

    @staticmethod
    @receiver(post_save, sender=TrackedUser)
    def update_user(instance, **kwargs):
        from elasticutils.contrib.django import tasks
        tasks.index_objects.delay(UserMapping, [instance.id])

and the test case:

class CollaborativeFilteringIndexingTest(ESTestCase):
    def test_tracked_user_insert(self):
        from common.models import TrackedUser
        from elasticutils.contrib.django import S
        from recommendation.mappings import UserMapping

        tu = TrackedUser(session_id='123123123123123123123')
        tu.save()
        tu_es = S(UserMapping).filter(id=tu.pk)[0].get_model()

        self.assertEquals(tu, tu_es)

we get:

elasticsearch: DEBUG: < {"error":"IndexMissingException[[main_index_eutest] missing]","status":404}

Also the ESTestCase should create indicies that reflect the ones defined in the mapping types. In this case it should be user_index_eutest

willkg commented 10 years ago

The ESTestCase properties should handle the index issue.

https://github.com/mozilla/elasticutils/blob/master/elasticutils/tests/__init__.py#L12

The celery-is-asynchronous issue is handled by wrapping this with the setting that forces celery to be synchronous. I think it's CELERY_ALWAYS_EAGER.

Have you tried either of those?

willkg commented 10 years ago

Oh, hrm... the docs for ESTestCase kind of suck. I'll create an issue to fix that.

jmizgajski commented 10 years ago

We have CELERY_ALWAYS_EAGER set to True for all our tests so this is not the case. I will look into the source for ESTestCase and try to figure it out but the ESTestCase is one of the best things about ElasticUtils so improving the documentation and fixing this issue is a great idea. Can't really do TDD without it:)

jmizgajski commented 10 years ago

After inspecting the source code it seems to me that ESTestCase doesn't do what I think it should.

How it works now (correct me if I'm wrong).

  1. It creates an single index with predefined name
  2. It loads a predefined set of documents to that index
  3. It overrides settings
  4. It reroutes all queries to this fake index.

How i would see it:

  1. It overrides all index and mapping type definitons by appending _estestcase (+ maybe some GUID so tests can be run in parallel).
  2. It keeps track of all indices created during the test by wrapping appropriate methods.
  3. It deletes all indicies that were created during the teardown phase.
willkg commented 10 years ago

Mmm... That's a lot harder because then ElasticUtils would need to wrap the ES object used everywhere. I don't think that's feasible with the current architecture.

I'm definitely interested in making ESTestCase better. It's one of the pieces that has grown over time based mostly on my needs and what I've been coding for Input, SUMO and ElasticUtils. It could probably use a refactoring for a better API.

jmizgajski commented 10 years ago

You could subclass the ES object used for ESTestCase and make it conditional depending on the settings, which should not be tremendously difficult (but I might be wrong, I did not go through all your code yet)

willkg commented 10 years ago

There's an infinite number of ways to get an ES object, though. We can't wrap all possible ways.

jmizgajski commented 10 years ago

My idea right now is to create 2 instances of elasticsearch - one for testing. Set up the different urls for dev and tests and then write a small TestClass that will delete all indicies in the test environment during tearDown phase of the test. How do you find it?

willkg commented 10 years ago

So, anything can create an ES object. ElasticUtils shunts those into get_es() functions/methods, but those have a funky override pattern and because it's tied into class hierarchies, I'm not sure you can create a single ES, do the things you want to do with it, and expect all the code in the codebase to use that one ES.

That's the problem I'm not sure we can solve unless we mocked the ES object in elasticsearch-py and wrap some things there at the source. However, that has namespace problems, so even that isn't foolproof.

I'm just not sure we can do this with the current architecture in a way that doesn't surprise people because it goes awry in odd ways depending on what their code looks like.

I'd love to be wrong about that, but I think that's the case.

jmizgajski commented 10 years ago

So maybe my simple idea might be quite reasonable. It's very similar to the case of mocking MySQL with sqlite for tests in Django. Is there a method in ElasticUtils to delete all indicies and documents at once or do I need to know the name of each one beforehand?

A further plus would be that it would behave exacly as expected (same index names that might mess up raw queries) and it would be quite easy to reuse the stack for creating DB fixtures that would then be translated to mapping types.

willkg commented 10 years ago

No, there isn't. ElasticUtils doesn't keep track of indexes, mapping types or anything else.

jmizgajski commented 10 years ago

Should be quite easy to add since the API supports _all http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-index.html

willkg commented 10 years ago

The projects I work on share a CI cluster with other projects and if we went and deleted all the indexes on our CI system, they'd kill us. I have no idea if that's a common case or not, but I don't think I want to add that to ElasticUtils.