o19s / hello-nlp

A natural language search microservice

Do you still recommend the usage of Hello NLP? #31

Closed nickchomey closed 2 years ago

nickchomey commented 2 years ago

I just discovered Hello NLP while looking for a way to use spaCy to pre-process (lemmatize, etc.) documents before indexing them in Elasticsearch. It looks like a great solution to this problem!

But I see that it hasn't been updated in a couple of years - do you still recommend its usage? Does it work with ES 7.17? If not, is there another tool or workflow that I should check out?

Thanks!
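
For readers landing on this thread, the workflow described in the question (lemmatize with spaCy, then index into Elasticsearch) might look roughly like the sketch below. It is not how Hello NLP itself works. The index name `posts`, the field names `body` and `body_lemmas`, and both helper functions are hypothetical names chosen for illustration, and the spaCy part assumes the `en_core_web_sm` model is installed.

```python
from typing import Callable, Dict, Iterable, Iterator


def spacy_lemmatizer(model: str = "en_core_web_sm") -> Callable[[str], str]:
    """Return a function that lemmatizes text with spaCy.

    Assumption: spaCy and the named model are installed. The import is lazy
    so the rest of this sketch runs without them.
    """
    import spacy

    nlp = spacy.load(model)

    def lemmatize(text: str) -> str:
        # Keep lowercased lemmas; drop punctuation and whitespace tokens.
        return " ".join(
            tok.lemma_.lower()
            for tok in nlp(text)
            if not (tok.is_punct or tok.is_space)
        )

    return lemmatize


def build_actions(
    docs: Iterable[Dict[str, str]],
    lemmatize: Callable[[str], str],
    index: str = "posts",  # hypothetical index name
) -> Iterator[Dict[str, object]]:
    """Yield bulk-style index actions with an extra lemmatized field."""
    for doc in docs:
        yield {
            "_index": index,
            "_id": doc["id"],
            "_source": {
                "body": doc["body"],                     # original text
                "body_lemmas": lemmatize(doc["body"]),   # pre-processed text
            },
        }
```

The generated actions are in the shape that `elasticsearch.helpers.bulk(client, actions)` accepts, so something like `bulk(client, build_actions(docs, spacy_lemmatizer()))` should index them. Queries would need the same lemmatization applied before being matched against `body_lemmas`.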

flaxsearch commented 2 years ago

Hi Nick,

The development of Hello NLP was led by Max Irwin, who is no longer at OSC and is, I believe, pursuing other projects. I think it's therefore unlikely we'll be updating it soon. Of course, as it's open source, you can still use it, fork it, or even contribute to it yourself!

nickchomey commented 2 years ago

Thanks! I'm not sure I'd be able to contribute much to it, but I'll try to see if I can get it working with my project in some way.

Though, I now see that Max is working on a project that is focused on AI-driven search - presumably he (and others) find that to be a more fruitful approach, so I'll have to look into how I might be able to take advantage of it rather than the more traditional approach that I'm currently following.

Do you have any recommendations for tools that make AI-driven search easy to use with Elasticsearch (which I feel somewhat tied to, given that I'm using this with a WordPress site and the ElasticPress plugin makes it all easy to use)?

nickchomey commented 2 years ago

As it turns out, Elasticsearch 8.0 (now at 8.3) has a lot of great AI/vector-based NLP capabilities and tools that appear to make external preprocessing (e.g. with spaCy) redundant. I suspect that's a large reason why Max shifted his focus to these topics.

Thanks anyway!

nickchomey commented 2 years ago

To anyone who comes here, I strongly recommend using Haystack. It is what Hello NLP was aiming to become - a tool that simplifies/abstracts out the details of adding NLP to your existing Elasticsearch pipeline. More than that, it works with all sorts of other datastores, such as various vector databases.

It might be a bit tricky to get started, but after that it is quite easy and flexible to work with. It has a very bright future.

http://haystack.deepset.ai/