openedx / public-engineering

General public issue repository for the Open edX engineering community

Move away from Elasticsearch #16

Open feanil opened 2 years ago

feanil commented 2 years ago

Because AWS no longer supports the latest versions of Elasticsearch, we are considering deprecating our usage of ES in favor of the AWS replacement, OpenSearch.

This deprecation is in the initial stages of discovery, and we want to solicit community feedback before moving too far along, so there is currently no acceptance date for this deprecation ticket.

Discussion Thread: [Discussion Thread on Discuss](https://discuss.openedx.org/t/deprecation-removal-depr-170-move-from-elasticsearch-to-opensearch/5844)

Comment from Diana:

We here at edx.org are going to look into what it would take to remove all ES or OS dependencies and then evaluate from there.

dianakhuang commented 2 years ago

Teams at edX/2U have done discovery on the work for this ticket, and we have decided to go forward with using OpenSearch for several key use cases. We will be removing usages of Elasticsearch or equivalent in favor of MySQL text search in all other use cases.

Switched to using OpenSearch:

Removing usage of Elasticsearch:

Because of this, I propose setting the acceptance date on this ticket for April 18, 2022 in order to give the community time to discuss this.

jristau1984 commented 2 years ago

To be clear about Blockstore: Our investigation found that Blockstore does not currently leverage Elasticsearch at all. Our recommendation is to have the BD-14 Content Lib v2 project team implement another solution during the project, which would remove the need for OpenSearch.

jristau1984 commented 2 years ago

PT-CscommentsserviceESusage-170322-1900.pdf PT-TNL-9545-ElasticsearchUsageandReplacement-170322-1900.pdf

These are the discoveries done by T&L and Infinity squads which led us to prefer an OpenSearch solution rather than a native MySQL solution.

feanil commented 2 years ago

@jristau1984 looking at the discoveries, it looks like courseware search of course content is not actually enabled in the new MFE. Is the reason for moving to OpenSearch that we're planning to port that feature to the MFE in the near future?

jristau1984 commented 2 years ago

Yes, the plan is to re-implement this feature in the MFE versions of LMS and CMS when possible.

ormsbee commented 2 years ago

@dianakhuang, @feanil: Can this be moved to "Communicated" status, since there is a post about it?

CodeWithEmad commented 2 years ago

@feanil since we know the exact index names,

PT-CscommentsserviceESusage-170322-1900.pdf PT-TNL-9545-ElasticsearchUsageandReplacement-170322-1900.pdf

could we make them configurable? For example, an environment variable that defaults to the current index name (or a prefix/suffix added to the index name). I'm asking because we talked here about using one Elastic cluster for different organizations.

feanil commented 2 years ago

@CodeWithEmad I'm not sure exactly what you're asking. I believe it should be possible to update the code to make the index names configurable in a safe way. Are you asking how to go about modifying the code to make this configurable? (Most Open edX services are Django apps with an associated settings file, so I would push for pulling the name from a Django setting rather than an environment variable, for consistency with the rest of the system.)
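As a minimal sketch of the idea above (not the actual Open edX implementation), the index name could be looked up from settings with a fallback default and an optional prefix. The setting names `SEARCH_INDEX_NAME` and `SEARCH_INDEX_PREFIX`, the default `example_index`, and the helper `get_index_name` are all hypothetical; a `SimpleNamespace` stands in for `django.conf.settings` so the example runs without Django installed.

```python
from types import SimpleNamespace

# Hypothetical default; the real index names are listed in the discovery documents.
DEFAULT_INDEX_NAME = "example_index"


def get_index_name(settings, default=DEFAULT_INDEX_NAME):
    """Return the search index name, honoring optional overrides from settings.

    `settings` is any object with attribute access, such as django.conf.settings.
    SEARCH_INDEX_NAME and SEARCH_INDEX_PREFIX are assumed (hypothetical) setting names.
    """
    name = getattr(settings, "SEARCH_INDEX_NAME", default)
    prefix = getattr(settings, "SEARCH_INDEX_PREFIX", "")
    return f"{prefix}{name}"


# Stand-ins for django.conf.settings in two deployments sharing one cluster.
default_settings = SimpleNamespace()
org_settings = SimpleNamespace(SEARCH_INDEX_PREFIX="org_a_")

print(get_index_name(default_settings))
print(get_index_name(org_settings))
```

With a per-organization prefix, several deployments could share one cluster without their indices colliding, while a deployment that sets nothing keeps the existing name.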

feanil commented 2 years ago

@dianakhuang I'm gonna assign this ticket to you as the point person for this work that 2U is taking on.

jmbowman commented 1 year ago

Arbi-BOM plans to take on some planning and coordination work for this very soon. For the benefit of them and any other 2U folk helping with this deprecation, here are some relevant internal resources:

I hope we can make most of the info in those docs public in the near future, but for now I just want to get the information linked so we can quickly unblock work on determining if there's anything useful we can do on this in time for the Olive release.

UsamaSadiq commented 1 year ago

Created a draft discussion document to work out the plan of action for leading the effort on this task. Once the plan of action has been finalised, subsequent issues to track progress will be created, and it will be shared publicly with other community members.

feanil commented 1 year ago

@UsamaSadiq why make the discussion about the plan of action internal? I think these decisions will impact a lot of people in the community and would benefit from being had in the open. Is there a specific concern that led you to making the discussion internal?

UsamaSadiq commented 1 year ago

Hi @feanil, there is no particular reason. I was just taking it incrementally. I shared the document with the 2U team first so we could do a final iteration/review before sharing it with the community. I'll go ahead and update the permissions of the document to make it accessible to everyone in the community. Also, I'm soon going to create subsequent issues, which will make everything visible to the community as well.

feanil commented 1 year ago

Thanks @UsamaSadiq. I think that for such a big decision, it's good to share not only the final decision with the community but also all the intermediate steps that led to it. Thanks for opening up the working docs.

feanil commented 1 year ago

@UsamaSadiq what do you think about writing ADRs for the decision for each repo so that we can share it out with the community? Since the decisions are different for the different projects, it would be good to capture the reasoning for each in the relevant repo.

UsamaSadiq commented 1 year ago

The following is the current plan of action suggested by the Arbi-BOM team to make progress on this issue:

CC: @jmbowman @feanil

UsamaSadiq commented 1 year ago

Created issues on the Maintenance boards and notified the owning teams in their Slack channels.

feanil commented 1 year ago

@UsamaSadiq my concern is that notifying the "owning" team at 2U does not inform the community of users or the CCs for the repos. I'd like the communication plan to include those groups. What's the best way to include them here? I don't think it means that we have to block on feedback from those groups, but I'd like them to be informed as we progress through the process. Most are not following projects in the edx org.

UsamaSadiq commented 1 year ago

@feanil I've shared the above-mentioned issues with the owning teams. Each team will create an ADR document after finalising their findings and share it with the community. Meanwhile, you can let me know if I should share the issues linked above in a particular Open edX channel to make them more visible to the community, or I could announce these issues to the community once we have initial ADR documents prepared by the owning teams. I believe the community will have access to the issues linked above, so they'll be able to add their input there. On the 2U side, I'll keep updating these issues with any updates from the owning teams.

I hope this works out as you are expecting. If you have any other ideas that could help us increase collaboration, I'm all ears.

UsamaSadiq commented 1 year ago

Adding on to my point, we could probably create ADR documents in the Open edX Confluence and ask the 2U teams to add updates there, so they will also be visible to the community and make collaboration easier.

feanil commented 1 year ago

I think creating the drafts in the Open edX Confluence or as PRs on the repos (even in draft form) would both be great.

I think this ticket is a great place to provide future updates, but for major changes or milestones, I would also mention them on https://discuss.openedx.org/t/deprecation-removal-depr-170-move-from-elasticsearch-to-opensearch/5844/10

jmbowman commented 1 year ago

Adding a note I wrote in a Slack conversation regarding a point that complicates the migration for course-discovery and edx-notes-api (I think these are the only repos that currently use django-elasticsearch-dsl):

Regarding OpenSearch libraries: there's https://github.com/opensearch-project/opensearch-py for basic Python support, but there are still only experimental, not-production-ready forks of https://github.com/django-es/django-elasticsearch-dsl and https://github.com/barseghyanartur/django-elasticsearch-dsl-drf (the former refused to add support for OpenSearch out of concern for API drift over time). The latest update is in https://github.com/barseghyanartur/django-elasticsearch-dsl-drf/issues/271#issuecomment-1368124708 (the mentioned forks have had no commits since that comment was made in December).

dianakhuang commented 10 months ago

AXIM is going to take over maintainership of the edx-notes-api repo, and will try to do this migration.

feanil commented 9 months ago

Open Questions

jmbowman commented 9 months ago

Unfortunately, it looks like my comment from August still stands. There have been a couple of forks of Django's Elasticsearch packages to add/substitute OpenSearch, but they haven't seen any real activity since they were created last year. I suspect if we use them, we'll have to take over maintenance of them.

dianakhuang commented 7 months ago

Note: There were performance issues in the past with MySQL full text search and performing any other queries. We would like to make sure this is no longer the case before we implement it in our services.

bradenmacdonald commented 6 months ago

Hi folks, have there been any updates on OpenSearch/Elasticsearch/etc.? Is there any current work happening?

My current understanding is:

Note: I heavily updated this comment from the original version after further research ^

jmbowman commented 6 months ago

My info is about a month out of date now, but some historical context and opinions (Feanil has already heard most/all of this):

I'm unfortunately not likely to be able to help much with this for a while, so it's going to be up to other people to pick a path forward. I just wanted to articulate that while OpenSearch looks at first like the easiest/safest path forward to solve the licensing problem, it's actually harder than it looks and may not really set up Open edX for success in future search improvements. I tried repeatedly over 3 years to build momentum on solving the Elasticsearch licensing issue, but it was hard to get anybody excited about the switch to OpenSearch (especially with 2U not feeling the pain because Amazon still hosts the old pre-license-change Elasticsearch version with security patches).

bradenmacdonald commented 6 months ago

Thanks a lot @jmbowman, that's very helpful.

ormsbee commented 6 months ago

@feanil, @dianakhuang: Has Meilisearch been discussed/evaluated at any point in the ES replacement talks? I don't see any conversations on it in the wiki or Discourse. It sounds really compelling, particularly the part where it uses vastly less memory (a 5-10X difference from what I've seen of various people's blog posts).

dianakhuang commented 6 months ago

I know @jmbowman has advocated for it, but we haven't done any discovery on it.

bradenmacdonald commented 6 months ago

Meilisearch sounds like an ideal option to me too. And I like that it supports multitenancy, which can really bring down costs for orgs that host lots of small Open edX instances, e.g. sandboxes.

jmbowman commented 6 months ago

It's mostly been brought up in Slack threads and verbal conversations (mostly in the 2U internal workspace, although there are passing mentions here and here). In early conversations a couple of years ago it was still new/unproven enough that I wasn't confident promoting it as a serious alternative (didn't want to be the "rewrite it in Rust" fanboy), and there hasn't been much real discovery work done on this since then. The migration off Elasticsearch kept coming up in conversations, but those conversations usually ended with "well, it isn't a priority for 2U because it has the AWS-supported old Elasticsearch option, and nobody else in the community seems willing yet to commit resources to it or even answer how high of a priority it is for them." I do think Meilisearch has proven itself enough now that it should be seriously considered as an option, especially given the proven demand for Algolia-like functionality that isn't really covered by either Elasticsearch or OpenSearch.

jristau1984 commented 6 months ago

The internal discussions I remember around getting off of Elasticsearch also mostly landed on "get off of the need for ES entirely, not just migrate to OpenSearch". Most of those came to fruition, I believe, with Discussions as a key item remaining in ES.

ormsbee commented 6 months ago

I made a forum post on the topic of whether we should consider Meilisearch as a potential alternative to OpenSearch.

feanil commented 3 months ago

Update: we'll be trying out Meilisearch for the new content library search, and if we like it, we will choose it as the new target for all the existing search functionality. This determination will be made before Sumac is cut.