laravel / ideas

Issues board used for Laravel internals discussions.
938 stars 31 forks source link

A `chunkByCursor()` method #2636

Open halaei opened 3 years ago

halaei commented 3 years ago

The new cursorPaginate() is an amazing feature. I was wondering if we can have a corresponding feature for chunking items. It seems such a feature is missing. Relative existing features are:

  1. chunkById(): it only paginates on a single column and is always ascending, so it is not as flexible as cursorPaginate().
  2. cursor(): it uses PDOStatement::fetch(). I am not sure how does it work on DBMS side especially for large tables. Does it have an initial delay to execute the query to store and and keep all the results on a temporary table for as long as PHP iterate on them? Does it make a very long-lasting transaction until all the data is fetched? If so, it may not be a great choice for large tables. Another drawback is that it doesn't support eager loading of relations.
tpetry commented 3 years ago

Why do you want to chunk on complex sorting criteria? Chunking by e.g. the primary key should be enough for all use cases?

halaei commented 3 years ago

chunkById() is enough for most of the use-cases, but not all. I don't recall all the cases I needed such a feature, but I am sure I wanted it from time to time. For example, sometimes I prefer to chunkById() but in reverse order, because the newer records are more valuable to be processed. That can be when I am indexing all database records in some search engine, because the data is lost in search engine, or when I want to send notifications to the users, but I am not sure what percent of the users should receive notification and I want the active users to receive notification sooner. Or maybe I want to process records on a table that doesn't have a single-column key. Or maybe the operation can be faster/easier if I sort the records by something other than a single-column key, e.g. by user id.

Having said that, I would like this feature in the framework only if it can be implemented by the minimum lines of code and maintenance effort.

halaei commented 3 years ago

I think letting chunkById() accept a clusure as its $id attribute might be a solution. The closure will be responsible for ordering and adding constraints so that all the records get fetched and processed once with the desired order.

Example: the following will do as the current chunkById() but with reverse order:

Notification::query()
    ->chunkById(100, function (Collection $notifications) {
        // process the chunk
    }, function (Builder $query, ?Notification $last) {
        // Sort and paginate the records
        $query->orderByDesc('id');
        if ($last) {
            $query->where('id', '<', $last->id);
        }
    });