Closed use-the-fork closed 1 week ago
@pdphilip , I just wanted to keep you in the loop on progress. I have the foundation all set, and all models and factories transferred and working.
I am working on what I see as the critical tests.
My only ask so far is if you can look at the PHP-CS-Fixer setup and let me know if you're good with it.
@use-the-fork this is looking fantastic, very excited to have the tests built in. Amazing contribution
Great work @use-the-fork - I'll review this over the next couple of days, looking great
Need a bit more time @use-the-fork - excited about this one
Need a bit more time @use-the-fork - excited about this one
No worries! I'm making some other changes in a separate branch and will raise PRs one by one.
Need a bit more time @use-the-fork - excited about this one
Any update here? I have fixed a few more bugs and am keeping them in a separate branch until this PR is merged.
Hey @use-the-fork - you can merge them in this same PR so long (and merge with latest main as well). Then I'll publish them all in one release rather.
There's a few things on the go my end as I'm building another package around this one. Hence shortage on time right now.
@pdphilip All set!
@gregory-lifhits-assaabloy, I see you snuck in a new feature. Please give me more information about what you're trying to do by:
I saw your test but I'll need to document this ultimately. Thanks
@gregory-lifhits-assaabloy, I see you snuck in a new feature. Please give me more information about what you're trying to do by:
- What problem you're trying to solve
- Linking documentation to ES docs
- Example of how you envision this working with eloquent (code examples)
I saw your test but I'll need to document this ultimately. Thanks
Sorry, that's my work account 😄
Regardless you need this info:
What problem you're trying to solve Linking documentation to ES docs
The pagination works great but only up to 10,000 records. While yes it's an option to increase this count Via the elastic settings it's not recommended. Especially when an index may have millions of records.
The Old way to do this would be to use the deprecated _scroll API. The new way is to include sorts and use the search_after param See here: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after
The code additions give users the ability to scroll over all records without having to worry about the 10k limit. I implemented a custom cursor method that will pass the sort fields and allow the user to paginate via API or use it straight out in there code as seen in the tests.
Example of how you envision this working with eloquent (code examples)
So as an example, if a user wanted to paginate Via API they would do:
$items = Item::where('business_unit', $brand->business_unit)
->orderBy('item_number')
->orderBy('created_at');
//Pass to a resource response from here
$items = $items->searchAfterPaginate(20)->withQueryString();
that will cause there to be a next_cursor
and prev_cursor
key that can be used to move back and forth.
Let me know if you need more explanation here.
Ok, I'll take it from here to
prev_cursor
and prev_page_url
to work properly$cols
to filter in the itemsOk, I'll take it from here to
- Get
prev_cursor
andprev_page_url
to work properly- Get
$cols
to filter in the items- Find a reliable way to check for the last page
So I have to check but I think a lot of this is handled with the implementation unless you are talking about adding documentation.
The larger concern is sorting REALLY matters and it can't be a general field with a value that may be the same across entries. For example, ordering by status will give inconsistent results. That should be highlighted in the docs. To make things safe I started saving the ID in a keyword field that I can then use to sort to guarantee not getting the same record more than once.
Hey @use-the-fork
This is a great feature to add to the package
Your approach is great, but the implementation was incomplete.
These are the issues I found in testing it:
I made 100 products each with a queue field with a unique number (1-100)
Then I ordered desc on queue
prev
). It didn't go back to the page before and it didn't sort it properly. next
3 times - then you should only be able to go prev
3 times$cols
were not filtering Here's your commit prior to me reworking it: https://github.com/pdphilip/laravel-elasticsearch/pull/32/commits/94763ad004bd38e78ae03b8cff45d767cd689ad7
I've gone with your direction, and gone further to take full control of cursor management, embedding a cursor state into the models.
prev
cursor and links$column
filtering works as expected next
cursor and linksAdded features.
created_at
and updated_at
- If not, then throws the MissingOrderException (Keep in mind, when testing this creating 100 records all have just about the same created_at
so it's buggy)created_at/updated_at
as the tie-breakerEmbedded in the cursor is a TTL, I've made it 5 minutes. If 5mins lapses then it will recount and calculate total
and last_page
Finally, in the spirit of the package inheriting base methods, I renamed it to cursorPaginate
Have a look and give it a test.
Cheers
@pdphilip I am working through this now. Looking good. One item I wanted to talk about how is the inferSort
you added. I tried this before but it doesn't always work. If a user sets the created_at time manually or the insert happens within the same second the sorting can get messed up. I understand this is an edge case but still.
This is still wip almost done. I really needed to get bulk insert working to test pagination without having to wait a long time.
@pdphilip Take a look everything you did looks great. I just cleaned it up a bit and modified the tests to work.
I also refactored the insert method to work with the Bulk API. As a result, inserting 10,000 records is much faster!
I also added Model tests that check most of the base model functionality so that we are sure nothing broke.
@pdphilip What are you using for formatting and styling? I want to ensure the git hooks match it so you don't have to do cleanup work.
Pint, did add it to composer dev.
Tho I saw some weird formatting, but I think that was when I was switching branches and an outdated gitignore caused chaos.
Anyway. Pint good sir
Pint, did add it to composer dev.
Tho I saw some weird formatting, but I think that was when I was switching branches and an outdated gitignore caused chaos.
Anyway. Pint good sir
Got it. I'm using PHP CS Fixer. I will switch that off and add Pint.
This is all looking really good I think. I get how the package works now So its getting easier to add features. I am also using tests from the MongoDB package and converting them to work with laravel-elasticsearch.
Super. Let's park any new features for now. I really want to ship this and use it as the base going forward. Great work
Super. Let's park any new features for now. I really want to ship this and use it as the base going forward. Great work
Agreed! I will make a new branch on my repo for anything else I am doing and merge in as separate PR after this goes in.
Update:
I have made a few changes to the bulk method:
I set chunk size to 1000 at a time (as done in PHP ES docs example), this feels safe for memory considerations
The default is that it returns a summary array of what happened, ex:
{
"success": true,
"timed_out": false,
"took": 5082,
"total": 100000,
"created": 100000,
"modified": 0,
"failed": 0
}
(That's a real result by the way, amazing that it can create 100k records in 5sec)
If there are any errors, it will return the details in an error field:
{
"success": false,
"timed_out": false,
"took": 5055,
"total": 100000,
"created": 99619,
"modified": 0,
"failed": 381,
"error": [....], //<-errors in here
"errorMessage": "Bulk insert failed for some values"
}
It also differentiates between created/modified, ex:
$prods = Product::limit(100)->get();
return Product::insert($prods->toArray());
{
"success": true,
"timed_out": false,
"took": 7,
"total": 100,
"created": 0,
"modified": 100,
"failed": 0
}
insert has a second parameter if you want the [successful] data returned
$prods = Product::limit(100)->get();
$updatedProds = Product::insert($prods->toArray(),true);
return $updatedProds; //shows the data, not the summary
The reason that this is not the default is that at mass this will exhaust most memory settings
Another major upgrade is that get()
, search()
and insert()
returns an ElasticCollection
which has the metadata embedded in it. It works like a normal collection but you get access to the query meta
$prods = Product::limit(100)->get();
$updatedProds = Product::insert($prods->toArray(),true);
return $updatedProds->getQueryMetaAsArray();
$prods = Product::where('orders','>=',100)->get();
$prods->getDsl();
$prods->getQueryMetaAsArray();
Let's peg it here now, then finish up the last of the tests and get this shipped
@pdphilip Yea, this all makes sense to me, and I thought about doing something similar. I think your approach works well. What other tests do we need before this can get merged in?
I guess just the toDo()
s - I've tested all the changes with my existing Test Suite covering 207 tests, though those don't include the cursor and bulk tests. Anyway, I'll wrap up the current toDo()
ones, can you do a test for bulk
?
Then we go.
@pdphilip Is there a reason you removed:
if ($refresh) {
$params['refresh'] = $refresh;
}
With out it it never waits for a refresh so it ends up failing tests since it's still building the records. I am going to add it back in since that way we can add insertWithOutRefresh in the future.
Go for it
Go for it
with that back in it's working perfect! I am going to add a test to make sure it's returning a ElasticCollection and after that I think you can ship.
@pdphilip This is ready for you!
@use-the-fork your CI is excellent, tested it and works great. Well done.
I'm going to leave the todo
tests for now as is and prepare this release (update docs etc) - that's always a bit of admin. Then we ship this beast.
This is an amazing contribution, thanks a bunch. And I hope you'll stick around for more.
Also, if you'd like to be in touch, shoot me an email (in composer)
@use-the-fork your CI is excellent, tested it and works great. Well done.
I'm going to leave the
todo
tests for now as is and prepare this release (update docs etc) - that's always a bit of admin. Then we ship this beast.This is an amazing contribution, thanks a bunch. And I hope you'll stick around for more.
Yeah, some of the TODO tests require code additions that will make this PR to be even larger. I was just leaving them as placeholders until this gets merged in, and I can create smaller, easier-to-review PRs.
The CI is also not done I needed it merged into the main so I can modify it and test it in my branch. Github is weird and if its not in the main branch I can't test any changes :(
I'm not going anywhere I use this package all the time easily one of the most useful packages I have ever found so I'm excited to contribute!
This commit introduces several new features and improvements to the package:
There is definitely work to be done with coverage but in general this is a solid foundation: