mozilla / elasticutils

[deprecated] A friendly chainable ElasticSearch interface for python
http://elasticutils.rtfd.org
BSD 3-Clause "New" or "Revised" License

switch ElasticUtils to become a django layer on top of elasticsearch-dsl-py #252

Closed willkg closed 9 years ago

willkg commented 9 years ago

The Elasticsearch folks have started a library called elasticsearch-dsl. The code is at https://github.com/elasticsearch/elasticsearch-dsl-py . @robhudson and I have already done some work on it.

Like ElasticUtils, elasticsearch-dsl-py sits on top of elasticsearch-py and provides a nicer search API. There are some differences between how that API works, but generally speaking, it solves the same problems we were solving with ElasticUtils.
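To illustrate what a "nicer search API" on top of raw query dictionaries can look like, here is a minimal sketch of a chainable builder. The class `ChainableSearch` and its methods are hypothetical and exist only for this example; this is not the actual API of ElasticUtils or elasticsearch-dsl-py. It assumes a `bool` query with `must` and `filter` clauses as the target shape.

```python
import copy


class ChainableSearch:
    """Hypothetical immutable query builder (not a real library API)."""

    def __init__(self):
        self._must = []      # scoring clauses
        self._filters = []   # non-scoring clauses

    def _clone(self):
        # Each chained call returns a copy, so earlier objects stay reusable.
        return copy.deepcopy(self)

    def query(self, **match):
        new = self._clone()
        new._must.extend({"match": {f: v}} for f, v in match.items())
        return new

    def filter(self, **terms):
        new = self._clone()
        new._filters.extend({"term": {f: v}} for f, v in terms.items())
        return new

    def to_dict(self):
        # Build the request body that would be handed to the low-level client.
        bool_query = {}
        if self._must:
            bool_query["must"] = list(self._must)
        if self._filters:
            bool_query["filter"] = list(self._filters)
        if not bool_query:
            return {"query": {"match_all": {}}}
        return {"query": {"bool": bool_query}}


base = ChainableSearch().query(title="django")
published = base.filter(status="published")  # base is left unchanged
body = published.to_dict()
```

The point of the design is that each call returns a new object, so a base search can be shared and refined in different directions without the callers stepping on each other.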

My current thinking is that we rewrite ElasticUtils with the following requirements:

  1. base it off of elasticsearch-dsl-py
  2. return to being a Django-specific library (people can use elasticsearch-dsl-py for non-Django usage)
  3. reduce the scope to providing bits that make Elasticsearch convenient in a Django context

Why? Well, there's no need to be a general purpose library anymore--elasticsearch-dsl-py will probably do that better. I think there is a need for a library that makes things convenient for Django. At a minimum, we'd use this at Mozilla. Maintenance of large libraries where the maintainers don't have a lot of time generally sucks. Reducing the focus/scope should make it easier to maintain at a higher quality. Plus, elasticsearch-dsl-py makes some different architectural decisions and thus already supports a large number of Elasticsearch features that will be difficult to do in ElasticUtils as it currently exists.

Things that need to happen before this can happen:

  1. elasticsearch-dsl-py needs a critical mass of functionality to be able to base the new ElasticUtils on it
  2. there needs to be an elasticsearch-dsl-py release--we don't want to base our library on alpha code

willkg commented 9 years ago

The other possibility is that we sunset ElasticUtils and build our Django library as a new project. It's also entirely possible there's another thing out there that does this already, though I'm not aware of one. I'm definitely not a huge fan of the "ElasticUtils" name--it's confusing in a variety of ways.

Which are you in favor of?

  1. rewrite ElasticUtils?
  2. sunset ElasticUtils and start a new project?

nlundquist commented 9 years ago

As a newcomer to the space of ES and Django integration I'd advocate sunsetting ElasticUtils to clarify the distinction between the projects.

A rename to something with the word django in it would also help visibility for future users looking for a django-specific integration.

adriaant commented 9 years ago

I think the Django library would work best as a separate project. I have not found another thing that does the same as ElasticUtils Django contrib. It works well and is flexible enough. It would be great to have the Django lib together with the elasticsearch DSL lib. That would be more than enough.

sabine commented 9 years ago

I'm someone who needs a Django library for ES for use in a new application. I've only started working on integrating ElasticUtils last week, but it looks like it's doing what I need: Haystack was too high-level, and building queries manually would be insane. So, I was really happy to find this. :)

From what I understand by reading this, the right way to go forward in my situation would be to

  1. keep using elasticutils.django.contrib to deal with the indexing, and
  2. use elasticsearch-dsl-py for querying ES instead of ElasticUtils?

The goal from my side is to minimize migration effort down the road. At some point in the future, I will want to use advanced ES aggregations.

Actually, it would be nice if the Django lib could do one more thing: provide a clean way to index an object after it has been saved to the database. My current "workaround" to avoid the race condition between the celery worker (which is processing the task created in post_save) and the database (which is saving the model instance) is to give the celery task a countdown - but I'm not sure that is a sufficiently stable, satisfactory solution.
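One common way around this kind of race is to enqueue the indexing task only once the transaction has committed, rather than guessing a delay. Django 1.9 and later provide `transaction.on_commit` for this. The sketch below simulates the idea in plain Python so it runs without Django or Celery; `FakeTransaction`, `index_task`, `database`, `search_index`, and `missed` are all hypothetical stand-ins, not part of any library.

```python
class FakeTransaction:
    """Hypothetical stand-in for a DB transaction with commit hooks."""

    def __init__(self, database):
        self.database = database
        self._pending_rows = {}   # writes not yet visible to other readers
        self._on_commit = []      # callbacks to run after a successful commit

    def save(self, pk, row):
        self._pending_rows[pk] = row

    def on_commit(self, callback):
        self._on_commit.append(callback)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # rollback: nothing is written, no callbacks run
        self.database.update(self._pending_rows)  # commit
        for callback in self._on_commit:
            callback()
        return False


database = {}       # what a separate worker process would be able to see
search_index = {}   # stand-in for the Elasticsearch index
missed = []         # primary keys the worker looked up too early


def index_task(pk):
    """Stand-in for the worker task: load the row, then index it."""
    row = database.get(pk)
    if row is None:
        missed.append(pk)  # the race: the row is not committed yet
        return
    search_index[pk] = row


with FakeTransaction(database) as txn:
    txn.save(1, {"title": "hello"})
    index_task(1)                         # fired too early: row not visible
    txn.on_commit(lambda: index_task(1))  # deferred until after the commit
```

In a real project the deferred callback would typically call the Celery task's `.delay(pk)` instead of running the indexing inline. Compared with a countdown, this does not depend on how long the commit happens to take.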

To me it also looks like having a small Django library + elasticsearch-dsl-py could do the job. What do you think?

ChristopherRabotin commented 9 years ago

Something of a spin-off of ElasticUtils built on elasticsearch-dsl-py is bungiesearch, which tries to use the best of ElasticUtils and Haystack. Since the main object subclasses Search of elasticsearch-dsl-py, you can build queries using either bungiesearch or elasticsearch-dsl-py.

So, @grumpi , this enables you to build queries with bungiesearch that can automatically map to Django objects (or Deferred objects), as well as use a query, filter, aggregation, etc. which was initially built with bungiesearch to modify an elasticsearch-dsl-py object.

Disclaimer: I am the initiator and maintainer of Bungiesearch. The project is currently very stable and used in production on Sparrho.com (which has over 1.2 million pieces of indexed content).

sabine commented 9 years ago

@ChristopherRabotin plugging in search to the Django models through a manager looks interesting as well. I'm really not sure how much "convenience on top of things" I really need, but one thing is for sure: I probably would have never found your library without the hint.

Just one thing: I can't seem to find information on licensing. I'm a crazy one-person bootstrapper and I'd really like to avoid violating anyone's copyrights. :)

ChristopherRabotin commented 9 years ago

Glad you're considering this library!

It's distributed under a BSD license, which is the same as this one: https://github.com/Sparrho/venn-cljs/blob/master/LICENSE . I'll create an issue now to add that licensing file.

jxstanford commented 9 years ago

I've recently been using ElasticUtils quite heavily. I've basically been using it to "replace" the SQL Django backend. I like the semantics of ElasticUtils, and the steps it takes to smooth out the interface between Django models and ES. I think if you were to initiate a new project, it would be nice to have it implement (possibly minimal) database backend support for ES.

If you're just swapping out the query and filter code to use a different library, and potentially deprecating some of the non-Django functionality, it doesn't seem like a name-changing event to me.

In either case I hope that the interface doesn't change too much so there is a clear way forward for those of us who have embraced this powerful project. Thanks!

willkg commented 9 years ago

I decided to deprecate this project. There are a few Django shim libraries now and it's better to work on those or use elasticsearch-dsl-py directly at this point.

Closing this out.

adriaant commented 9 years ago

Can you point me to some of those django shim libraries you mention?

ChristopherRabotin commented 9 years ago

Could one of those be https://github.com/Sparrho/bungiesearch?

Disclosure: I am the maintainer of this project.

ap0091 commented 9 years ago

@willkg , can you let us know what some of the other libraries are?

Thank You

willkg commented 9 years ago

@ap0091 Read the comments in this issue and do a google search. I don't have experience with any of them.