pdphilip / laravel-elasticsearch

Laravel Elasticsearch: An Elasticsearch implementation of Laravel's Eloquent ORM
MIT License

Refract/cleanup - Added test suite / formatting etc. #32

Closed - use-the-fork closed this 1 week ago

use-the-fork commented 1 month ago

This commit introduces several new features and improvements to the package.

There is definitely work to be done with coverage, but in general this is a solid foundation: [coverage screenshot]

use-the-fork commented 1 month ago

@pdphilip , I just wanted to keep you in the loop on progress. I have the foundation all set, and all models and factories transferred and working.

I am working on what I see as the critical tests.

My only ask so far is if you can look at the PHP-CS-Fixer setup and let me know if you're good with it.

pdphilip commented 1 month ago

@use-the-fork this is looking fantastic, very excited to have the tests built in. Amazing contribution

pdphilip commented 1 month ago

Great work @use-the-fork - I'll review this over the next couple of days, looking great

pdphilip commented 1 month ago

Need a bit more time @use-the-fork - excited about this one

use-the-fork commented 1 month ago

> Need a bit more time @use-the-fork - excited about this one

No worries! I'm making some other changes in a separate branch and will raise PRs one by one.

use-the-fork commented 2 weeks ago

> Need a bit more time @use-the-fork - excited about this one

Any update here? I have fixed a few more bugs and am keeping them in a separate branch until this PR is merged.

pdphilip commented 2 weeks ago

Hey @use-the-fork - you can merge them into this same PR in the meantime (and merge with the latest main as well). Then I'll publish them all in one release.

There are a few things on the go on my end, as I'm building another package around this one. Hence the shortage of time right now.

use-the-fork commented 2 weeks ago

@pdphilip All set!

pdphilip commented 2 weeks ago

@gregory-lifhits-assaabloy, I see you snuck in a new feature. Please give me more information about what you're trying to do by:

  • What problem you're trying to solve
  • Linking documentation to ES docs
  • Example of how you envision this working with eloquent (code examples)

I saw your test but I'll need to document this ultimately. Thanks

use-the-fork commented 2 weeks ago

> @gregory-lifhits-assaabloy, I see you snuck in a new feature. Please give me more information about what you're trying to do by:
>
>   • What problem you're trying to solve
>   • Linking documentation to ES docs
>   • Example of how you envision this working with eloquent (code examples)
>
> I saw your test but I'll need to document this ultimately. Thanks

Sorry, that's my work account 😄

Regardless, you need this info:

> • What problem you're trying to solve
> • Linking documentation to ES docs

The pagination works great, but only up to 10,000 records. While it's possible to increase this limit via the Elasticsearch index settings, it's not recommended, especially when an index may have millions of records.

The old way to do this would be to use the deprecated _scroll API. The new way is to include sorts and use the search_after param. See here: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after

The code additions give users the ability to scroll over all records without having to worry about the 10k limit. I implemented a custom cursor method that will pass the sort fields and allow the user to paginate via the API or use it directly in their code, as seen in the tests.
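For reference, at the raw Elasticsearch level a search_after request looks roughly like this (an illustrative sketch; the index, field names, and values are made up):

```json
POST /items/_search
{
  "size": 20,
  "sort": [
    { "item_number": "asc" },
    { "id.keyword": "asc" }
  ],
  "search_after": [1042, "item-1042"]
}
```

The search_after values are the sort values of the last hit from the previous page, which is why the sort must end with a unique tiebreaker field.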

> • Example of how you envision this working with eloquent (code examples)

So as an example, if a user wanted to paginate via the API, they would do:

  $items = Item::where('business_unit', $brand->business_unit)
      ->orderBy('item_number')
      ->orderBy('created_at');

  // Pass to a resource response from here
  $items = $items->searchAfterPaginate(20)->withQueryString();

That will produce next_cursor and prev_cursor keys that can be used to move back and forth.
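The paginated response would then carry cursor metadata roughly in the shape of Laravel's cursor paginator (the values below are hypothetical; data is elided):

```json
{
  "data": [],
  "path": "https://example.test/api/items",
  "per_page": 20,
  "next_cursor": "eyJpdGVtX251bWJlciI6MTA0Mn0",
  "next_page_url": "https://example.test/api/items?cursor=eyJpdGVtX251bWJlciI6MTA0Mn0",
  "prev_cursor": null,
  "prev_page_url": null
}
```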

Let me know if you need more explanation here.

pdphilip commented 2 weeks ago

Ok, I'll take it from here to:

  1. Get prev_cursor and prev_page_url to work properly
  2. Get $cols to filter in the items
  3. Find a reliable way to check for the last page

use-the-fork commented 2 weeks ago

> Ok, I'll take it from here to
>
>   1. Get prev_cursor and prev_page_url to work properly
>   2. Get $cols to filter in the items
>   3. Find a reliable way to check for the last page

So I'd have to check, but I think a lot of this is already handled by the implementation, unless you are talking about adding documentation.

The larger concern is that sorting REALLY matters: the sort can't be a general field whose value may be the same across entries. For example, ordering by status will give inconsistent results. That should be highlighted in the docs. To make things safe, I started saving the ID in a keyword field that I can then use as a tiebreaker sort, guaranteeing we never get the same record more than once.
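To illustrate the tiebreaker point, here is a minimal sketch (plain PHP, not the package's code; sortWithTiebreaker is a made-up helper) showing that sorting on a non-unique field alone is ambiguous, while adding the unique ID as a secondary sort makes the order deterministic:

```php
<?php

// Minimal sketch (not the package's implementation): why search_after-style
// pagination needs a unique tiebreaker field in the sort.
function sortWithTiebreaker(array $records): array
{
    usort($records, function (array $a, array $b): int {
        // Primary sort: status (values may collide across records).
        // Secondary sort: the unique id, which breaks ties deterministically.
        return [$a['status'], $a['id']] <=> [$b['status'], $b['id']];
    });

    return $records;
}

$records = [
    ['id' => 'prod-2', 'status' => 'active'],
    ['id' => 'prod-1', 'status' => 'active'],
];

// Both records share the same status, so ordering by status alone could
// return them in either order; with the id tiebreaker the order is fixed:
// prod-1 always comes before prod-2.
$sorted = sortWithTiebreaker($records);
```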

pdphilip commented 2 weeks ago

Hey @use-the-fork

This is a great feature to add to the package

Your approach is great, but the implementation was incomplete.

These are the issues I found in testing it:

I made 100 products, each with a queue field containing a unique number (1-100)

Then I ordered desc on queue

Here's your commit prior to me reworking it: https://github.com/pdphilip/laravel-elasticsearch/pull/32/commits/94763ad004bd38e78ae03b8cff45d767cd689ad7

Refactor

I've gone with your direction, and gone further to take full control of cursor management, embedding a cursor state into the models.

Added features:

Embedded in the cursor is a TTL; I've made it 5 minutes. If 5 minutes lapse, it will recount and recalculate total and last_page

Finally, in the spirit of the package inheriting base methods, I renamed it to cursorPaginate

Have a look and give it a test.

Cheers

use-the-fork commented 2 weeks ago

@pdphilip I am working through this now. Looking good. One item I wanted to talk about is the inferSort you added. I tried this approach before, but it doesn't always work: if a user sets the created_at time manually, or the insert happens within the same second, the sorting can get messed up. I understand this is an edge case, but still.

use-the-fork commented 2 weeks ago

This is still WIP, almost done. I really needed to get bulk insert working so I could test pagination without having to wait a long time.

use-the-fork commented 2 weeks ago

@pdphilip Take a look; everything you did looks great. I just cleaned it up a bit and modified the tests to work.

I also refactored the insert method to work with the Bulk API. As a result, inserting 10,000 records is much faster!

I also added Model tests that check most of the base model functionality so that we are sure nothing broke.

use-the-fork commented 2 weeks ago

@pdphilip What are you using for formatting and styling? I want to ensure the git hooks match it so you don't have to do cleanup work.

pdphilip commented 2 weeks ago

Pint, I did add it to composer dev.

Though I saw some weird formatting, I think that was when I was switching branches and an outdated .gitignore caused chaos.

Anyway. Pint, good sir.

use-the-fork commented 2 weeks ago

> Pint, did add it to composer dev.
>
> Tho I saw some weird formatting, but I think that was when I was switching branches and an outdated gitignore caused chaos.
>
> Anyway. Pint good sir

Got it. I'm using PHP CS Fixer. I will switch that off and add Pint.

This is all looking really good, I think. I get how the package works now, so it's getting easier to add features. I am also taking tests from the MongoDB package and converting them to work with laravel-elasticsearch.

pdphilip commented 2 weeks ago

Super. Let's park any new features for now. I really want to ship this and use it as the base going forward. Great work

use-the-fork commented 2 weeks ago

> Super. Let's park any new features for now. I really want to ship this and use it as the base going forward. Great work

Agreed! I will make a new branch on my repo for anything else I am doing and merge them in as separate PRs after this goes in.

pdphilip commented 1 week ago

Update:

I have made a few changes to the bulk method:

  1. I set the chunk size to 1000 at a time (as done in the PHP ES docs example); this feels safe for memory considerations

  2. The default is that it returns a summary array of what happened, e.g.:

    {
      "success": true,
      "timed_out": false,
      "took": 5082,
      "total": 100000,
      "created": 100000,
      "modified": 0,
      "failed": 0
    }

    (That's a real result by the way; amazing that it can create 100k records in 5 seconds)

If there are any errors, it will return the details in an error field:

{
  "success": false,
  "timed_out": false,
  "took": 5055,
  "total": 100000,
  "created": 99619,
  "modified": 0,
  "failed": 381,
  "error": [....], // <- errors in here
  "errorMessage": "Bulk insert failed for some values"
}

It also differentiates between created/modified, e.g.:

$prods = Product::limit(100)->get();
return Product::insert($prods->toArray());

{
  "success": true,
  "timed_out": false,
  "took": 7,
  "total": 100,
  "created": 0,
  "modified": 100,
  "failed": 0
}

insert has a second parameter if you want the [successful] data returned

$prods = Product::limit(100)->get();
$updatedProds = Product::insert($prods->toArray(), true);
return $updatedProds; // shows the data, not the summary

The reason this is not the default is that at scale this will exhaust most memory limits.
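The chunking behavior from point 1 above can be sketched in plain PHP (illustrative only; summarizeBulk is a hypothetical helper, not the package's actual bulk code):

```php
<?php

// Illustrative sketch (summarizeBulk is hypothetical, not the package's
// code): split a bulk payload into chunks of 1000 and build a summary
// array shaped like the one shown above.
function summarizeBulk(array $docs, int $chunkSize = 1000): array
{
    $created = 0;

    foreach (array_chunk($docs, $chunkSize) as $chunk) {
        // The real implementation would send each chunk to the
        // Elasticsearch Bulk API; here we only tally the counts.
        $created += count($chunk);
    }

    return [
        'success' => true,
        'total'   => count($docs),
        'created' => $created,
        'failed'  => 0,
    ];
}

// 2500 docs are processed as chunks of 1000, 1000, and 500.
$summary = summarizeBulk(range(1, 2500));
```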

Another major upgrade is that get(), search() and insert() return an ElasticCollection, which has the metadata embedded in it. It works like a normal collection, but you also get access to the query meta:

$prods = Product::limit(100)->get();
$updatedProds = Product::insert($prods->toArray(), true);
return $updatedProds->getQueryMetaAsArray();

$prods = Product::where('orders', '>=', 100)->get();
$prods->getDsl();
$prods->getQueryMetaAsArray();

Let's peg it here for now, then finish up the last of the tests and get this shipped.

use-the-fork commented 1 week ago

@pdphilip Yea, this all makes sense to me, and I thought about doing something similar. I think your approach works well. What other tests do we need before this can get merged in?

pdphilip commented 1 week ago

I guess just the toDo()s - I've tested all the changes with my existing test suite covering 207 tests, though those don't include the cursor and bulk tests. Anyway, I'll wrap up the current toDo() ones; can you do a test for bulk?

Then we go.

use-the-fork commented 1 week ago

@pdphilip Is there a reason you removed:

        if ($refresh) {
            $params['refresh'] = $refresh;
        }

Without it, it never waits for a refresh, so tests end up failing since Elasticsearch is still indexing the records. I am going to add it back in, since that way we can add insertWithOutRefresh in the future.

pdphilip commented 1 week ago

Go for it

use-the-fork commented 1 week ago

> Go for it

With that back in, it's working perfectly! I am going to add a test to make sure it's returning an ElasticCollection, and after that I think you can ship.

use-the-fork commented 1 week ago

@pdphilip This is ready for you!

pdphilip commented 1 week ago

@use-the-fork your CI is excellent, tested it and works great. Well done.

I'm going to leave the todo tests for now as is and prepare this release (update docs etc) - that's always a bit of admin. Then we ship this beast.

This is an amazing contribution, thanks a bunch. And I hope you'll stick around for more.

pdphilip commented 1 week ago

Also, if you'd like to be in touch, shoot me an email (in composer)

use-the-fork commented 1 week ago

> @use-the-fork your CI is excellent, tested it and works great. Well done.
>
> I'm going to leave the todo tests for now as is and prepare this release (update docs etc) - that's always a bit of admin. Then we ship this beast.
>
> This is an amazing contribution, thanks a bunch. And I hope you'll stick around for more.

Yeah, some of the TODO tests require code additions that would make this PR even larger. I was just leaving them as placeholders until this gets merged in, so I can create smaller, easier-to-review PRs.

The CI is also not done; I needed it merged into main so I can modify it and test it in my branch. GitHub is weird: if a workflow isn't in the main branch, I can't test any changes :(

I'm not going anywhere. I use this package all the time; it's easily one of the most useful packages I have ever found, so I'm excited to contribute!