ruflin / Elastica

Elastica is a PHP client for elasticsearch
http://elastica.io/
MIT License
2.26k stars 733 forks source link

(WIP) Upgrade elasticsearch php dependency to 8.8 #2168

Closed coreation closed 5 months ago

coreation commented 1 year ago

Tasklist

Analysis tasks

Refactor tasks

Docker & Tests

Generics

Specifics

RFC

coreation commented 1 year ago

@ruflin @thePanz since you are the 2 people I've interacted with so far, I'll just post my questions here. From my point of view, this is the only elasticsearch-php upgrade PR. I've seen some back and forth with regards to this upgrade on the discussion pages, but perhaps it's best to bundle everything here from now.

I wanted to get started, but immediately hit a first barrier. The operations with regards to indices have been rewritten. Instead of having a specific Create() class, everything regarding indices is now added to Indices class, where the create is refactored into a function, which then executes the operation.

As far as I see in the Elastica codebase, the aim is/was to create a Create() instance, which was an AbstractEndpoint class, and then calling the methods on that class to execute the creation. However, this is now all abstracted away inside the new Indices::create() function.

This makes me think that the requestEndpoint might need to be removed, and a whole bunch of code in Index.php reworked to use the high level Indices::XYZ() methods. This would require us to convert the Elastica client configuration to a elasticsearch-php ClientInterface object.

These kind of decisions I'd much rather talk through, then just go ahead and spent time on because I'm by far no expert in the approach this library takes, but happy to learn.

ruflin commented 1 year ago

@coreation Have not forgotten you on this one. I have not found yet the time to dig into the elasticsearch-php code. Ideally, I would like to keep the basic Elastica construct and change things under the hood. Removing requestEndpoint would be quite fundamental. I think that is where recently the discussion started, to potentially remove dependency on elasticsearch-php which I'm not a big fan of.

Are the changes on elasticsearch-php only for indices or all over the place? I'm asking because requestEndpoint exists in several places so I try to get my head around how big the change would be.

@coreation What if we don't convert the Elastic client to ClientInterface? Would it mean we require additional boiler plate around the client? I know you mentioned not to start coding at the same time, I feel like having a small sample PR on what the change would mean for one class (no tests passing) could help the conversation.

coreation commented 1 year ago

@ruflin I'll see what I can do in terms of an example. Going by the unittest output, most of the test didn't make it much further than initiating the index, so it might be that fixing that part of the codebase might actually by 90% of the work, but that's hopeful thinking.

I like the idea of making an example and discuss things further based on that, I'll see what I can do this week.

oleg-andreyev commented 1 year ago

@coreation @ruflin if you need any help with this - just tell.

coreation commented 1 year ago

@oleg-andreyev thanks for the heads up, I think we're going to be needing some extra hands here. @ruflin @oleg-andreyev if you look at the elastic-search 8 branch endpoints compared with elastic-search 7 branch endpoints, it shows that the Endpoints have undergone a serious transformation. There could be other parts of the library that have been addressed the same way, I haven't looked into that just yet.

The transformation goes from being Endpoints being a namespace with different classes representing different operations in the 7.x branch to Endpoints becoming dedicated classes with the ElasticSearch operations embedded in those classes as functions.

Example: \ElasticSearch\Endpoints\Indices\Create is now \ElasticSearch\Endpoints\Indices::create

With that, they also bring an ElasticSearch class to the table, which is an HTTP response wrapper around ElasticSearch Psr7 HTTP responses.

What you'll see in my very short refactor example in the index.php file, is that we'll likely have to refactor these kind of classes (i.e. Index) on 2 parts: business logic within the methods and transforming the ElasticSearch response into an Elastica Response. I think the latter makes sense in order to make migrating to the new Elastica version doable.

@ruflin is that enough of an example? Note that I'm not looking at good code practises, just a rough example of how things need to change.

oleg-andreyev commented 1 year ago

@coreation what I see is that all endpoints are autogenerated by some script that I cannot find in the repository (probably private), depending on how we wanna manage dependency with https://github.com/elastic/elasticsearch-php, relaxed or strict, if strict that means we follow their semantic (and php version) and we don't need to think about any workaround/wrappers and etc. if we want to allow to relax versions (supporting both 7.x and 8.x), then we need to create some wrapper for each endpoint and based on Client::VERSION constant decide what to do.

@ruflin what's your thoughts on this?

To keep it simple, I'd follow elasticsearch-php as is, but maybe I'm missing something.

ruflin commented 1 year ago

@coreation I like the direction you are taking here. It means in many cases, users can upgrade to Elastica 8.0 without having to worry about the underlying changes (in most cases).

On the support of 7 and 8: I would keep it simple for now and only support 8. If there are many requests (and contributions) we can always introduce this later. +1 on following elasticsearch-php.

Sorry for the late reply again on my end.

coreation commented 1 year ago

Hey @ruflin , just to summarize to make sure I understand you well:

So moving this forward, does that mean the approach based on the example I gave would be approved? If so, then we can started and hopefully get some people involved to code that up. \cc @oleg-andreyev ;) If someone is working on it, given the big amount of work ahead of us, it would be good to give each other a heads-up, so that we don't have people working on the same thing. @ruflin since you build most of this library, any recommendations for next steps / organising this?

ruflin commented 1 year ago

Glad you asked back @coreation, we need to dive a bit deeper here. Lets first get PHP / Elasticsearch version out of the way. I was referring to the following:

if we want to allow to relax versions (supporting both 7.x and 8.x), then we need to create some wrapper for each endpoint and based on Client::VERSION constant decide what to do.

My assumption is @oleg-andreyev talks here about the Elasticsearch version, not PHP? At least on my end, I would focus on supporting Elasticsearch 8.x which I think is also what elasticsearch-php does. With it, we indirectly likely align also on the PHP version.

Elastica library receive the same type of response,

The same type of response we have today, not the same type of response that the elasticsearch-php library has I assume? This is indicated by line https://github.com/ruflin/Elastica/pull/2168/files#diff-858b65fbfb449bff73db68734d544fb5a4849e4bc8b986fad5896c202813641bR517 Do you forsee we can use some generic method for this or you expect it a lot of manual work?

The reason I like the above approach because one of the powers of Elastica is that you can chain calls together to build queries etc. If we move over to the elasticsearch-php response format, I think we partially loose this capability. At the same time I'm concerned it might create lots of additional code paths. It brings back the question: Would it be easier to just rip out elasticsearch-php again ...

@coreation For the split of the work, I hope we can find ways to do it in many small chunks. Have lots of small PR's that can go in in parallel. As not too many people work on the project, I'm not too worried about multiple people working at the same. What is not clear to me is if we can easily split up the work in many small PR's as the dependency update of the lib is global :-(

coreation commented 1 year ago

Hey @ruflin

Glad you asked back @coreation, we need to dive a bit deeper here. Lets first get PHP / Elasticsearch version out of the way. I was referring to the following:

if we want to allow to relax versions (supporting both 7.x and 8.x), then we need to create some wrapper for each endpoint and based on Client::VERSION constant decide what to do.

My assumption is @oleg-andreyev talks here about the Elasticsearch version, not PHP? At least on my end, I would focus on supporting Elasticsearch 8.x which I think is also what elasticsearch-php does. With it, we indirectly likely align also on the PHP version.

Clear!

Elastica library receive the same type of response, The same type of response we have today, not the same type of response that the elasticsearch-php library has I assume? This is indicated by line https://github.com/ruflin/Elastica/pull/2168/files#diff-858b65fbfb449bff73db68734d544fb5a4849e4bc8b986fad5896c202813641bR517 Do you forsee we can use some generic method for this or you expect it a lot of manual work?

Indeed, so that users of the library don't need an upgrade path to adhere to some new type of Response object. I assume the transformation function is going to be a very generic one.

The reason I like the above approach because one of the powers of Elastica is that you can chain calls together to build queries etc. If we move over to the elasticsearch-php response format, I think we partially loose this capability. At the same time I'm concerned it might create lots of additional code paths. It brings back the question: Would it be easier to just rip out elasticsearch-php again ...

This is very fundamental and I don't have that much insight into the earlier work that was done. If there's no massive amount of work needed for every elasticsearch-php update, then I would keep it. I think this is an odd one out, since they reworked basic functionality from a more "class per ElasticSearch operation" to "class per bundle of ElasticSearch operations". Which is now affecting the Elastica library.

@coreation For the split of the work, I hope we can find ways to do it in many small chunks. Have lots of small PR's that can go in in parallel. As not too many people work on the project, I'm not too worried about multiple people working at the same. What is not clear to me is if we can easily split up the work in many small PR's as the dependency update of the lib is global :-(

I think splitting it up would require first some "basic" things to be built in place. One of them is that we make sure return types remain the same. The example I gave with the transformation of response objects is - I hope - one of the few we'll need.

Once we got that in place, it's just working in the new way of elasticsearch-php 's classes like Index::create() instead of new Create().

To split things up, I think someone has to go through some classes that are affected, i.e. Index, and just make a kind of billboard ticket listing all the affected classes? That way people can contribute in small chunks in an organised manner?

ruflin commented 1 year ago

I went a bit more through the code base and I think ripping out elasticsearch-php would actually be more work then doing the transition. My hope is that most of the transition will look very similar. Also I still think Elastica benefits to rely on elasticsearch-php to get all the underlying improvements that are made to it.

To split things up, I think someone has to go through some classes that are affected, i.e. Index, and just make a kind of billboard ticket listing all the affected classes? That way people can contribute in small chunks in an organised manner?

++. We can use the new tasks features in Github issues for this. The part I'm trying to get my head around: How can we do it in iterations? As soon as we upgrade the dependency everything breaks and all has to be done at once? Can we have somehow the two dependencies coexist for a while? Ideas on how we should deal with this?

coreation commented 1 year ago

@ruflin I agree, I would for sure keep the elasticsearch-php as a base and just do what this library is good at, providing a neat handy wrapper around it.

As far as tasks go, I think the iterations will likely be aspect based. Meaning, getting the Index class "green", meaning, all errors coming from not being able to create an endpoint (cfr. unittests) must no longer appear and are fixed because of the refactor work done using the new elasticsearch-php library for example.

In this case, I don't think it's doable of keeping the two dependencies in there, I think we're better off just with a clean refactor, putting a new major release version on it. I haven't done any analysis, but there's some gut feeling telling me that the majority of the work will be refactoring the deprecated classes + adding a response transformation function will be 80% of the work. But that's a very uneducated guess, more of a 🤞

ruflin commented 1 year ago

@coreation One suggestion to move this forward. Lets start having a Tasklist in the description of this PR and start using this PR for all the work (kind of a feature branch). All the contributions would go with a PR against this branch. As soon as we get to a green state, we can get it into 8.x and start more iterative from there.

coreation commented 1 year ago

@ruflin good idea, I've updated my PR comment, feel free to update where applicable. Might be a good time to get @oleg-andreyev back into the conversation, time to get our hands dirty ;)

coreation commented 1 year ago

@ruflin I've updated the original post and I think apart from the approach, I think we can have a first effective discussion to move the actual code forward. My suggestion in the RFC section is that we either remove the src/Client.php or make it compatible with elasticsearch-php since the new Endpoint classes like "Indices" need an elasticsearch-php Client object.

My aim for my this commit is to simply have the "testCreateNewIndex" test working. That involves:

What I've found is that:

I think our first focus should be on deciding what to do with the src/Client and how we can get that to work with the Endpoint classes of the updated elasticsearch-php classes. Once we have that, we can build up from there!

Please excuse the messy try-out code, this is really to just get something going as I discover the hoops to jump through :D.

ruflin commented 1 year ago

Replace the src/Client.php usage with the elasticsearch-php Client class. RFC: @ruflin - seems like the elasticsearch-php Client pretty much does everything the src/Client.php does. I'm not sure though how vital this was for other users to be using this class naked sort of speak. If the answer is: a lot. Then I suggest we rework the src/Client so that it uses the elasticsearch-php Client under the hood. Makes for an easier phasing out later on.

I don't have data on this but I would assume, the Client object is used heavily. Every time Elastica did not support a specific call, one of the recommendations is to use the raw request method: https://github.com/ruflin/Elastica/blob/8.x/src/Client.php#L538 If possible, I would keep the Client.

@coreation I'll be out of most of July but please don't hold back on progressing with this one. @deguif @thePanz @franmomu Would be great if you could get involved in this too.

coreation commented 1 year ago

I don't have data on this but I would assume, the Client object is used heavily. Every time Elastica did not support a specific call, one of the recommendations is to use the raw request method: https://github.com/ruflin/Elastica/blob/8.x/src/Client.php#L538 If possible, I would keep the Client.

I absolutely agree, I thought this was the way to go as well, but I wanted to avoid injecting bias into the question. So the approach of getting an Elasticsearch Client based on the Elastica Client is an ok approach then I believe.

@coreation I'll be out of most of July but please don't hold back on progressing with this one. @deguif @thePanz @franmomu Would be great if you could get involved in this too.

No problem, enjoy your time OoO!

franmomu commented 1 year ago

Some time ago I tried to start this and ended up having a ElasticSearch\Client client inside of Elastica\Client and transforming the response, I wanted to start making this work and then see how this could be improved, but something else came up and I didn't continue.

Just in case is of any help (some of the code is not tested/finished):

https://github.com/ruflin/Elastica/compare/8.x...franmomu:Elastica:elasticsearch_8?expand=1

VincentLanglet commented 11 months ago

Thanks for trying to move on this migration @coreation. How far are you in the first analysis task ?

RFC: Replace the src/Client.php usage

With Elastic 7 being EOL in few days elastic.co/support/eol, shouldn't new RFC be delayed in order to release the v8 compatibility as soon as possible ?

coreation commented 11 months ago

hey @VincentLanglet I haven't had the time to start yet, apart from what I've already listed in the todo task list. I'm currently on vacation for another 2 weeks. After that I'm ready to pick up where I left off, but work is very demanding at this time unfortunately.

oleg-andreyev commented 11 months ago

@coreation @ruflin

Here is my example for delete method:


    /**
     * Deletes the index.
     *
     * @throws ClientException
     * @throws ConnectionException
     * @throws ResponseException
     */
    public function delete(): Response
    {
        $response = (new Indices($this->getClient()->getElasticsearchClient()))->delete(['index' => $this->getName()]);
        return $this->covertToElasticaResponse($response);
    }

what I don't like is copying: $this->getClient()->getElasticsearchClient()

also had to implement helper method:

    private function covertToElasticaResponse(ElasticsearchResponse $response)
    {
        return new \Elastica\Response(
            (string) $response->getBody(),
            $response->getStatusCode(),
        );
    }

speaking about response, probably \Elastica\Response wrap \Elastic\Elasticsearch\Response\Elasticsearch and provide all B/C layers within in response.

ruflin commented 11 months ago

With Elastic 7 being EOL in few days

@VincentLanglet I don't think this is the case as 9.0 is not out yet: https://www.elastic.co/support/eol

what I don't like is copying: $this->getClient()->getElasticsearchClient()

@oleg-andreyev What is your preference here? Should we fetch the client only once and then store it locally. For example having $this->getElasticsearchClient() method that wraps the above? What is the main concern?

covertToElasticaResponse

If you compare the response object before and after, how close are we to backward compatibility? Is all the info we need in the ElasticsearchResponse response?

VincentLanglet commented 11 months ago

With Elastic 7 being EOL in few days

@VincentLanglet I don't think this is the case as 9.0 is not out yet: elastic.co/support/eol

My bad, it's indeed the 1rst august OR when 9.0.0 is released.

oleg-andreyev commented 11 months ago

With Elastic 7 being EOL in few days

@VincentLanglet I don't think this is the case as 9.0 is not out yet: elastic.co/support/eol

what I don't like is copying: $this->getClient()->getElasticsearchClient()

@oleg-andreyev What is your preference here? Should we fetch the client only once and then store it locally. For example having $this->getElasticsearchClient() method that wraps the above? What is the main concern?

covertToElasticaResponse

If you compare the response object before and after, how close are we to backward compatibility? Is all the info we need in the ElasticsearchResponse response?

Speaking about client, it would be great to have single client that is using HttpInterface (so it could be Symfony Client, Guzzle and etc) and yes, it would be better just to call $this->getClient()

Speaking about response, need to compare but they are not 1:1 - 100%

coreation commented 9 months ago

Hi everyone, as I look towards EOY, I'm much too occupied with my own company in order to work on this in a decent way. There's so much that needs to be done, and I don't see any way I can make time available to work on this. I'm very sorry and hopefully someone finds the preparation done so far useful enough to continue.

ruflin commented 9 months ago

@coreation Appreciate the heads up. It seems we all are a bit in the same boat. Lets see who can pick it up and drive to completion.

ruflin commented 5 months ago

With https://github.com/ruflin/Elastica/pull/2181 merged, should we close this PR and continue the work on top of 8.x.

coreation commented 5 months ago

Definitely!