hoaproject / ActionBoard

Roadmap, actions, milestones… everything related to the schedule of Hoa is here
http://hoa-project.net/

How to optimize ElasticSearch #12

Open Hywan opened 8 years ago

Hywan commented 8 years ago

Our search service is based on ElasticSearch. This application likes to eat memory. We need to optimize it.

Original author of the ES instance: @GuillaumeDievart.

Tasks

Hywan commented 8 years ago

@K-Phoen is responsible for leading this project.

GuillaumeDievart commented 8 years ago

Hi,

Currently, we have 141 documents indexed in ES.

Disk usage: 172 MB (nGram tokenizer (3-20))
Memory usage: 142 MB (heap)

@K-Phoen if you want, I can give you access to Elastic HQ, where you can find all the stats about the current instance.
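
One likely contributor to the disk usage above is the nGram tokenizer itself. Here is a minimal sketch of what a (3-20) range emits for a single word. It assumes the tokenizer produces every substring of length 3 to 20 and ignores any `token_chars` setting or token filter the real analyzer may apply; `ngrams` is a hypothetical helper, not code from the Hoa search setup.

```python
def ngrams(token, min_gram=3, max_gram=20):
    """Return every substring of `token` whose length is in [min_gram, max_gram].

    Assumption: this mimics a plain nGram tokenizer with no extra filtering.
    """
    grams = []
    # Sizes larger than the token itself cannot produce any substring.
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    for word in ("hoa", "search", "elasticsearch"):
        print(word, len(ngrams(word)))
```

Under these assumptions a 13-letter word such as "elasticsearch" yields 66 indexed terms instead of one, so a large index for a small number of documents is not surprising.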

Hywan commented 8 years ago

This might be useful, I'm not sure: http://blog.adrien-gallou.fr/2015/09/21/pertinence-elasticsearch.html, by @agallou.

GuillaumeDievart commented 8 years ago

I have already read this great article! But it only covers relevance, nothing about memory usage.

And, as I told you, you should not have a memory problem with 150 articles.

K-Phoen commented 8 years ago

Same here, I don't think that the number of documents will be a problem but if needed we'll keep an eye on the global memory consumption (documents + server itself).

I'll try to replicate the Elasticsearch setup used by Hoa on my machine (with the same documents) to experiment with memory usage and to get numbers to give @Hywan and @Pierozi. This should help them determine their "server sizing" needs.
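
A minimal sketch of how such numbers could be collected from a local replica, assuming an unauthenticated instance on `http://localhost:9200` (an assumption, not the actual Hoa server) and reading the used JVM heap from the `_nodes/stats/jvm` endpoint. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def heap_usage_mb(stats):
    """Map each node name to its used JVM heap in MB.

    `stats` is the parsed JSON body of a `_nodes/stats/jvm` response.
    """
    usage = {}
    for node in stats.get("nodes", {}).values():
        used_bytes = node["jvm"]["mem"]["heap_used_in_bytes"]
        usage[node.get("name", "unknown")] = used_bytes / (1024 * 1024)
    return usage


def fetch_heap_usage_mb(base_url="http://localhost:9200"):
    """Query a running instance and return its heap usage per node.

    Assumption: `base_url` points at a local test instance without auth.
    """
    with urlopen(base_url + "/_nodes/stats/jvm") as response:
        return heap_usage_mb(json.load(response))
```

Calling `fetch_heap_usage_mb()` before and after indexing the same documents would give comparable heap figures for the sizing discussion.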

Pierozi commented 8 years ago

Thanks @K-Phoen, that should be helpful because I don't know much about Elasticsearch consumption.

Hywan commented 8 years ago

@GuillaumeDievart Could you post the URLs of the search code repository here, please?

GuillaumeDievart commented 8 years ago

Hi @K-Phoen,

you just have to clone this repo https://github.com/GuillaumeDievart/Hoa-Search and create the analyzer with the Data/Bin/Init.php command; then you can run the crawler.

Careful, this configuration was created in May 2014; there may have been a few changes since then (in the ES version).

I think you can check the analyzer and mapping (auto, without boost, adjust analyzer by language ...) if you want to improve the relevance.

Hywan commented 8 years ago

@GuillaumeDievart Hope you understand that we would like to get everything packed in the same place? We still want you on board!

GuillaumeDievart commented 8 years ago

Yes, I understand.

"We still want you on board!"

@Hywan I hope so!! ;)

K-Phoen commented 8 years ago

@GuillaumeDievart awesome! I'll play with it when I have some free time :)

Hywan commented 8 years ago

@K-Phoen Did you get time to look into this a little bit?