kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

RFC: Replace Elasticsearch by OpenSearch? #4896

Open stweil opened 2 years ago

stweil commented 2 years ago

Elasticsearch changed the license in 2021.

OpenSearch is a fork which continues to use the old license.

We might consider switching to OpenSearch. Migrating from the open-source releases of Elasticsearch should be easy, but I expect it to become more and more difficult as newer releases diverge further.

Links: https://github.com/opensearch-project/ https://de.wikipedia.org/wiki/OpenSearch_(Software) https://opensearch.org/blog/technical-posts/2021/10/moving-from-opensource-elasticsearch-to-opensearch/

henning-gerhardt commented 2 years ago

Personally, I approve of this request, but the current implementation depends heavily on the specific version of ElasticSearch/OpenSearch in use, so I don't know whether an easy switch to OpenSearch is possible.

As far as I know, there is a process under way to "migrate" from direct ElasticSearch usage to usage through hibernate-search with ElasticSearch integration. From what I know about hibernate-search, it can work with different versions of ElasticSearch, and maybe OpenSearch, just by changing configuration values. I would prefer this solution, because we would then be only loosely coupled to ElasticSearch or OpenSearch and could switch the search server implementation without changing code inside Kitodo.Production.

solth commented 2 years ago

It seems HibernateSearch is indeed compatible with OpenSearch (see https://hibernate.atlassian.net/browse/HSEARCH-4212 for details). Since all direct ElasticSearch libraries and packages will be removed from the Kitodo.Production repository with the switch to HibernateSearch, this should then indeed resolve the licensing issue.

stweil commented 1 year ago

According to the documentation, recent versions of Hibernate Search support both Elasticsearch and OpenSearch, so supporting both in Kitodo.Production might be an easy task.
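
To illustrate what "switching by configuration only" could look like, here is a minimal sketch. It assumes the Hibernate Search 6 backend properties `hibernate.search.backend.hosts` and `hibernate.search.backend.version` (the latter with an `elastic:` or `opensearch:` prefix); please check the names against the Hibernate Search reference for the version in use. The helper `backendProperties`, the host, and the version numbers are made up for this example and are not taken from Kitodo.Production.

```java
import java.util.Properties;

class BackendConfigSketch {

    // Hypothetical helper: builds the backend settings for one search server.
    // Only the values differ between Elasticsearch and OpenSearch.
    static Properties backendProperties(String distribution, String version, String hosts) {
        Properties properties = new Properties();
        // Property names as assumed from the Hibernate Search 6 documentation.
        properties.setProperty("hibernate.search.backend.hosts", hosts);
        properties.setProperty("hibernate.search.backend.version", distribution + ":" + version);
        return properties;
    }

    public static void main(String[] args) {
        // Example values only; the application code would be the same for both.
        Properties elastic = backendProperties("elastic", "7.17", "localhost:9200");
        Properties openSearch = backendProperties("opensearch", "2.11", "localhost:9200");
        System.out.println(elastic.getProperty("hibernate.search.backend.version"));
        System.out.println(openSearch.getProperty("hibernate.search.backend.version"));
    }
}
```

If this holds, the choice of search server would live in a properties file rather than in the service classes.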

thomaslow commented 1 year ago

I have some questions regarding the migration to hibernate-search. Since this issue is mentioned in the recent announcement of the development fund 2023, I'll ask them here.

If I remember correctly, migrating to hibernate-search was already experimented with as part of the development fund 2021, see #4208.

@solth Would it be possible for you to summarize what was done and learned in 2021?

Also, there is a public hibernate-search branch that was started in 2021.

Thank you and Cheers!

solth commented 1 year ago

Yes, of course. As you mentioned, the first attempt to replace ElasticSearch with HibernateSearch was done in the context of #4208 where the actual goal was to update ElasticSearch to version 7 (which was successful).

At that time we hoped the required changes for the migration to HibernateSearch would be manageable and could be performed in the context of the same issue with little extra effort. That turned out to be wrong, though. Instead, the necessary changes proved to be extensive (as you can see from the number of changes made in the branch you mentioned: https://github.com/effective-webwork/kitodo-production/tree/hibernate-search), so we never got around to actually finishing the transition to HibernateSearch.

In my experience, the main challenge in the transition to HibernateSearch was the incompatibility of ElasticSearch QueryBuilder objects with the HibernateSearch syntax. The latter uses so-called SearchPredicates instead of QueryBuilders, which in turn are created by SearchFactory instances. AFAIK these factories only support a lambda-style syntax to create SearchPredicates, and once created, those SearchPredicate instances cannot be extended by further clauses or filters. That is exactly what Kitodo.Production currently does, though: ES QueryBuilder objects are passed between and augmented in many interconnected classes like SearchService, FilterService and the service classes for the individual object types (most notably ProcessService). Refactoring all those QueryBuilder-related functions so that the SearchFactory variable within the lambda expression can be passed to other functions was a major hassle.

I recently rebased the HibernateSearch branch to resolve conflicts with the current master branch. It is a WIP but I think it can be used as a base for the integration of HibernateSearch in Kitodo.Production. It does already load list entries like processes via HibernateSearch and the indexing on the indexing page is done using the HibernateSearch MassIndexer. What I cannot say, though, is whether it is the best approach or if rewriting the whole filter and query architecture from the ground up to better accommodate the new syntax would be a better way to proceed.

Concerning the performance, the version in that branch is currently quite a bit slower than the current master branch, but that is perhaps due to suboptimal building of queries/search predicates. Rebuilding the whole index using the MassIndexer is much faster, though.
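
A toy model may make the problem easier to see. The sketch below does not use Hibernate Search; `ToyPredicate`, `ToyPredicateFactory` and `DeferredClauses` are hypothetical stand-ins for "immutable predicates that can only be created from a factory". It shows one possible way around the limitation, assuming the services can hand over functions instead of finished query objects: each service registers a function that expects the factory, and the functions are only applied once the factory is available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class DeferredPredicateSketch {

    static String demo() {
        DeferredClauses clauses = new DeferredClauses();
        // What a filter service might contribute (example values).
        clauses.add(f -> f.match("project", "Newspapers"));
        // What a process service might contribute later on.
        clauses.add(f -> f.match("state", "open"));
        // Only at the very end is a factory available, as inside the lambda.
        return clauses.build(new SimpleFactory()).render();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}

// Stand-in for an immutable predicate: it cannot be extended after creation.
interface ToyPredicate {
    String render();
}

// Stand-in for the factory that is the only way to create predicates.
interface ToyPredicateFactory {
    ToyPredicate match(String field, String value);

    ToyPredicate and(List<ToyPredicate> clauses);
}

class SimpleFactory implements ToyPredicateFactory {
    public ToyPredicate match(String field, String value) {
        return () -> field + ":" + value;
    }

    public ToyPredicate and(List<ToyPredicate> clauses) {
        List<String> parts = new ArrayList<>();
        for (ToyPredicate clause : clauses) {
            parts.add(clause.render());
        }
        return () -> "(" + String.join(" AND ", parts) + ")";
    }
}

// Collects contributions from several services without needing a factory yet.
class DeferredClauses {
    private final List<Function<ToyPredicateFactory, ToyPredicate>> contributions = new ArrayList<>();

    void add(Function<ToyPredicateFactory, ToyPredicate> contribution) {
        contributions.add(contribution);
    }

    ToyPredicate build(ToyPredicateFactory factory) {
        List<ToyPredicate> clauses = new ArrayList<>();
        for (Function<ToyPredicateFactory, ToyPredicate> contribution : contributions) {
            clauses.add(contribution.apply(factory));
        }
        return factory.and(clauses);
    }
}
```

The collecting object stays mutable, like the QueryBuilder objects today, while the predicates themselves are created in a single place.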

One thing that is worth noting is that using HibernateSearch we can get rid of DTO objects because HibernateSearch does load index data directly into bean objects, which should simplify the code in many places considerably.

henning-gerhardt commented 1 year ago

> One thing that is worth noting is that using HibernateSearch we can get rid of DTO objects because HibernateSearch does load index data directly into bean objects, which should simplify the code in many places considerably.

These DTOs were introduced to avoid exposing the Hibernate beans, together with database credentials, in the UI when an error occurs or when someone manipulates the UI to access them on the client side.

thomaslow commented 1 year ago

@solth Thank you for your summary. I wasn't aware of the query building problem. It seems to be possible to logically combine SearchPredicates. But I'm not sure whether that is sufficient to solve the problems you mentioned above.

solth commented 1 year ago

I think the main problem I encountered was that the current ElasticSearch classes like QueryBuilder are very deeply integrated and used at many different locations in the Kitodo.Production source code, so removing and replacing them completely with new classes from HibernateSearch - that are constructed in a totally different manner - was more difficult than I thought.

Perhaps there are other approaches to replace ElasticSearch with HibernateSearch instead of trying to keep the current class and method architecture of Production related to filters and searching and directly using HibernateSearch objects in all those locations. Maybe it is easier to not use HibernateSearch objects in all those data service classes like TaskService or ProcessService but instead encapsulate all required data in new custom objects and pass those to the final search / filter services that then create a HibernateSearch SearchPredicate without the need to maintain and pass such an object through all layers of the application.
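
A minimal sketch of that idea, with everything in it assumed for illustration: `Criterion`, `SearchCriteria` and `CriteriaTranslator` are hypothetical names, not existing Kitodo.Production classes, and the translator produces a plain string where the real one would create a HibernateSearch SearchPredicate. The point is only that the data services pass around simple data objects and a single class knows about the search library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CriteriaSketch {

    static String demo() {
        SearchCriteria criteria = new SearchCriteria();
        // What a ProcessService-like class might add (example values).
        criteria.add(new Criterion("title", Operator.CONTAINS, "Zeitung"));
        // What a TaskService-like class might add.
        criteria.add(new Criterion("taskStatus", Operator.EQUALS, "OPEN"));
        // Only the final search/filter service translates the criteria.
        return new CriteriaTranslator().translate(criteria);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}

enum Operator { EQUALS, CONTAINS }

// Plain data object with no dependency on any search library.
final class Criterion {
    final String field;
    final Operator operator;
    final String value;

    Criterion(String field, Operator operator, String value) {
        this.field = field;
        this.operator = operator;
        this.value = value;
    }
}

// The object that is passed through the service layers.
class SearchCriteria {
    private final List<Criterion> criteria = new ArrayList<>();

    void add(Criterion criterion) {
        criteria.add(criterion);
    }

    List<Criterion> all() {
        return Collections.unmodifiableList(criteria);
    }
}

// The only place that would need to know the search library.
class CriteriaTranslator {
    String translate(SearchCriteria criteria) {
        List<String> parts = new ArrayList<>();
        for (Criterion criterion : criteria.all()) {
            String symbol = criterion.operator == Operator.EQUALS ? "=" : "~";
            parts.add(criterion.field + symbol + criterion.value);
        }
        return String.join(" AND ", parts);
    }
}
```

With such a split, replacing the search backend again later would mostly mean rewriting the translator.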

stweil commented 2 months ago

@matthias-ronge, what is the status of task #5760? When do we expect that Hibernate Search (with OpenSearch) will have replaced Elasticsearch?

Would intermediate support of OpenSearch help if this task still takes some time? I started my own OpenSearch branch yesterday which now passes mvn install. The CI tests are still failing.

stweil commented 1 month ago

I have now finished a draft pull request, #6131, for OpenSearch, which seems to work.