translate / pootle

Online translation tool
http://pootle.translatehouse.org
GNU General Public License v3.0

Add ElasticSearch indexing for Units #4291

Open phlax opened 8 years ago

phlax commented 8 years ago

We could drastically improve performance, and remove the need to denormalize fields, if we maintained indexes on objects, particularly Units.

Considerations:

phlax commented 8 years ago

Some comments on ES dependency

make backend configurable - ie choice of backends or db for search?

  • making db search an option would definitely simplify initial installation
  • with the new search classes it's trivial to drop in a config to switch between db/index
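A minimal sketch of what such a config switch could look like. Everything here is hypothetical and for illustration only: `DBSearch`, `IndexSearch`, `get_search_backend` and the `SEARCH_BACKEND` key are not Pootle's actual search classes or settings, and the in-memory "index" just stands in for a real one.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical stand-in for a translation unit.
    pk: int
    source: str
    target: str


class DBSearch:
    """Stand-in for a database search: scans every unit on each query."""

    def __init__(self, units):
        self.units = list(units)

    def search(self, text):
        needle = text.lower()
        return [
            u.pk for u in self.units
            if needle in u.source.lower() or needle in u.target.lower()
        ]


class IndexSearch:
    """Stand-in for an index search: maps each token to the pks containing it."""

    def __init__(self, units):
        self.index = {}
        for u in units:
            for token in (u.source + " " + u.target).lower().split():
                self.index.setdefault(token, set()).add(u.pk)

    def search(self, text):
        tokens = text.lower().split()
        if not tokens:
            return []
        # A unit matches only if it contains every query token.
        hits = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return sorted(hits)


# Hypothetical registry: the config value picks the class.
SEARCH_BACKENDS = {"db": DBSearch, "index": IndexSearch}


def get_search_backend(config, units):
    """Return the backend named in config, defaulting to db search."""
    return SEARCH_BACKENDS[config.get("SEARCH_BACKEND", "db")](units)
```

With both classes exposing the same `search()` method, calling code does not need to know which one the config selected.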

make ES a hard dependency?

  • perhaps better framed as "make django-haystack a hard dep", plus at least one indexing backend
  • this is my favoured option
  • can make use of index outside of search
  • haystack provides multiple backends eg solr/xapian - these can be switched fairly transparently although there are some limitations
  • simplifies documentation
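For reference, a sketch of how switching backends looks in haystack settings. `HAYSTACK_CONNECTIONS` and the engine paths come from django-haystack (and the separate xapian-haystack package); the URLs, index name, path and the `haystack_connections` helper are placeholder assumptions, not anything Pootle ships.

```python
# Candidate backends; only the ENGINE path and its options differ.
BACKENDS = {
    "elasticsearch": {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": "http://127.0.0.1:9200/",  # placeholder
        "INDEX_NAME": "pootle",  # placeholder
    },
    "solr": {
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        "URL": "http://127.0.0.1:8983/solr",  # placeholder
    },
    "xapian": {
        "ENGINE": "xapian_backend.XapianEngine",
        "PATH": "/var/lib/pootle/xapian_index",  # placeholder
    },
}


def haystack_connections(name):
    """Build the HAYSTACK_CONNECTIONS setting for the chosen backend."""
    return {"default": dict(BACKENDS[name])}


# In settings.py this would be the only line that changes per deployment.
HAYSTACK_CONNECTIONS = haystack_connections("elasticsearch")
```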

julen commented 8 years ago

From @translate/evernote's side there's no problem to make ES a hard dependency — we already make heavy use of it for the TM. So I see this more like a product-wise decision for those who need to support upgrades.

We also quickly discussed this on IRC with @phlax and some questions arose, such as:

  1. Can feature-parity be achieved, including exact search?
  2. Would search be broken while the index is being rebuilt?

The first answer seems like a yes — it'd be great if you could confirm @phlax.

Regarding the second one, the concern is that search (both in the UI and in the internal API) might be broken while (re)indexing takes place. There might also be some delay before information in the DB becomes available in the index. So it seems like the safest approach is ensuring search works independently of having an index. Should haystack be the best fit for this, I'd be :+1: to require it.
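To make "search works independently of having an index" concrete, here is a rough sketch of a DB fallback. All names (`IndexUnavailable`, `ToyIndex`, `db_search`, `search`) are hypothetical, and the dict of units stands in for both the database and the index contents.

```python
class IndexUnavailable(Exception):
    """Raised by the hypothetical index while it cannot serve queries."""


class ToyIndex:
    """Stand-in for an index backend that refuses queries during a rebuild."""

    def __init__(self, units):
        self.units = dict(units)
        self.rebuilding = False

    def search(self, text):
        if self.rebuilding:
            raise IndexUnavailable("index is being rebuilt")
        needle = text.lower()
        return sorted(pk for pk, src in self.units.items() if needle in src.lower())


def db_search(units, text):
    """Stand-in for the slower database search."""
    needle = text.lower()
    return sorted(pk for pk, src in units.items() if needle in src.lower())


def search(index, units, text):
    """Prefer the index, but fall back to the DB when it is unavailable."""
    try:
        return index.search(text)
    except IndexUnavailable:
        return db_search(units, text)
```

This only covers the index being unavailable; it does not address the index lagging behind the DB.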

phlax commented 8 years ago

Can feature-parity be achieved, including exact search?

so - as mentioned when chatting with @julen on IRC - i have had an issue with this in the past - but afaict it's fine if the field you are searching is indexed correctly:

http://stackoverflow.com/questions/18201147/django-haystack-how-to-force-exact-attribute-match-without-stemming
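A toy illustration of why the way a field is indexed matters for exact search. This is not haystack or ES code: `toy_stem` is a deliberately crude stand-in for a real analyzer, and the units are made up. The point is only that an analyzed (stemmed) field conflates variants, while a field stored unanalyzed can still answer exact queries.

```python
def toy_stem(word):
    # Crude suffix stripping; a real analyzer is far more careful.
    for suffix in ("ing", "es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, split on whitespace and stem each token."""
    return {toy_stem(w) for w in text.lower().split()}


# Made-up units: pk -> source text.
units = {1: "Saving files", 2: "Save file", 3: "saving files"}

# The same field indexed two ways.
analyzed_index = {pk: analyze(src) for pk, src in units.items()}
exact_index = dict(units)  # raw, unanalyzed value


def search_analyzed(query):
    terms = analyze(query)
    return sorted(pk for pk, tokens in analyzed_index.items() if terms <= tokens)


def search_exact(query):
    return sorted(pk for pk, raw in exact_index.items() if raw == query)
```

Here `search_analyzed("Saving files")` matches all three units, while `search_exact("Saving files")` matches only unit 1.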

phlax commented 8 years ago

ref https://github.com/translate/pootle_index