Open prodigy7 opened 4 years ago
I agree with 1): unless you read the implementation, it's not clear that the implementation is naive and doesn't take advantage of any atomicity of the database.
Not sure about 2); that looks more like a "beginners" [1] problem. The programmatic interface provided by Laravel is based on PSR-6 (not that it matters a lot though…) and I would like to believe that developers are aware of such things.
Here's another one though; similar in spirit but not directly related to "high load situations":
Laravel has multiple methods dealing with "working with larger SQL datasets", but only one is really suited for this.
`\Illuminate\Database\Eloquent\Builder::cursor`
I think the philosophy of the name comes from a "database cursor", but it currently does not use one. The only thing this method does is emit models on demand, but the underlying SQL query is still a) run once and b) its complete result fetched into memory.
So, whilst this may work for 100k entries, it probably won't for 10 million, depending on the available RAM 😏, which it actually shouldn't depend on.
Granted, the documentation mentions this detail but I think it's still easy to overlook => https://laravel.com/docs/8.x/collections#lazy-collection-introduction
> This allows you to still only run a single query against the database but also only keep one Eloquent model loaded in memory at a time.
It's simply still far from what I would expect a "cursor" to achieve technically.
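To make this concrete, here is a minimal sketch of how `cursor()` is typically used (the `Flight` model and the `active` column are placeholders):

```php
use App\Models\Flight; // placeholder model name

// cursor() runs a single SQL query and yields one hydrated model per
// iteration; as noted above, the underlying driver still buffers the full
// result set, so memory usage still grows with the size of the result.
foreach (Flight::where('active', 1)->cursor() as $flight) {
    // process one model at a time
    echo $flight->id, PHP_EOL;
}
```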
`\Illuminate\Database\Concerns\BuildsQueries::chunk`
It's implemented with LIMIT/OFFSET, which is basically the performance death of anything iterating over lots of rows: the RDBMS cannot optimize for this scenario and has to scan past all rows before the offset for every single chunk.
The documentation doesn't state this explicitly; the best we have is https://laravel.com/docs/8.x/queries#chunking-results
> If you need to work with thousands of database records, consider using the chunk method
"Thousands" is a very far cry from e.g. 10Mio (my above hypothetical example) but I'll bet with offset/limit you'll see performance degradation will lower numbers too.
`\Illuminate\Database\Concerns\BuildsQueries::chunkById`
This is the one currently best suited for processing really large results, as it uses `WHERE primary_key > …`, which can make efficient use of indices (unless you forgot one 😅); the docs, however, don't highlight this enough IMHO => https://laravel.com/docs/8.x/queries#chunking-results
> If you are updating database records while chunking results, your chunk results could change in unexpected ways. So, when updating records while chunking, it is always best to use the chunkById method instead
So the docs are more alluding to the stability of the processed records, but using it for performance is equally important.
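And a minimal `chunkById()` sketch for contrast (placeholder model; assumes an indexed, auto-incrementing `id` column):

```php
use App\Models\User; // placeholder model name

// chunkById() paginates with "WHERE id > :lastId ... LIMIT 1000" instead of
// OFFSET, so every chunk is an index range scan and stays fast regardless of
// how far into the table we are.
User::chunkById(1000, function ($users) {
    foreach ($users as $user) {
        // process $user
    }
});
```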
Real cursors are not supported out of the box, unless I truly missed it.
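As an aside, and purely as a hedged sketch rather than an official Laravel feature: with the MySQL PDO driver one can approximate a streaming cursor by disabling buffered queries on the underlying connection before iterating. This has its own caveats:

```php
use Illuminate\Support\Facades\DB;

// Sketch, MySQL/PDO only: with query buffering disabled, rows are streamed
// from the server on demand instead of being copied into PHP memory first.
// Caveat: no other query may run on this connection until iteration finishes.
DB::connection()->getPdo()->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

foreach (DB::table('users')->orderBy('id')->cursor() as $row) {
    // $row is a plain stdClass object, not an Eloquent model
}
```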
[1] apologies for the lack of better description, I do not mean to offend anyone here
Hey mfn :) Regarding 2): I don't take this as an offence; it might actually be a beginner problem if you assume that everyone has dealt with all or most of the PSRs. Admittedly I had not dealt with that PSR; this kind of thing happens to me at work when I run into concrete scenarios. I think at this point it's more of a "nice to have" to clearly mention in the documentation that there can be problems. Code that intercepts the described scenario is probably not really possible; you have to handle it case by case, as I described.
Thanks in any case for your detailed answer. I think it's good, and maybe we'll uncover more pitfalls where you can either do something better in code, optimize something in the documentation, or where it simply comes down to knowledge you have to bring along!
Why doesn't Laravel consider implementing Swoole in the core framework? https://github.com/laravel/ideas/issues/1452
You should try https://github.com/swooletw/laravel-swoole and see the difference. Imagine if you could integrate it into Laravel's core.
@rhzs I think it is not directly related to my concern. I'm talking about optimizing existing code and documentation, not about integrating packages. Everything important is, I think, already discussed in the thread you posted.
I don't believe Laravel's source code needs to be "optimised" for high load. It's just a case of awareness of how to solve such problems using Laravel.
1) Laravel's `firstOrCreate` was never meant to be atomic. If you want atomicity, you could use a database lock, a cache lock, add a job middleware to prevent overlapping jobs for the same record, etc. All these methods are fully supported by Laravel. Rails has similar `firstOrCreate` methods with similar logic. The logic isn't wrong. Assuming that they work automatically for all situations is wrong.
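For illustration, a sketch of the cache-lock variant mentioned above; the lock name, timeout, model and email value are made up for the example, and a lock-capable cache store (such as Redis) is assumed:

```php
use App\Models\User; // placeholder model name
use Illuminate\Support\Facades\Cache;

$email = 'user@example.com'; // example input

// Serialise concurrent firstOrCreate calls for the same logical record by
// wrapping them in a cache lock. block(5) waits up to five seconds for the
// lock; the lock itself expires after ten seconds.
$user = Cache::lock('create-user:'.$email, 10)->block(5, function () use ($email) {
    return User::firstOrCreate(['email' => $email]);
});
```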
2) Again, back-to-back `Cache::has` and `Cache::get` calls aren't meant to be atomic. There are, again, ways within Laravel with which this can be solved (e.g. use cache locks).
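As a sketch of that lock-based approach (key names, TTLs and the query are illustrative only, and a lock-capable cache store such as Redis is assumed):

```php
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;

// Hold a lock while reading/refreshing the cached value so concurrent
// requests can't race between an existence check and the actual read.
$total = Cache::lock('lock:orders:total', 10)->block(5, function () {
    return Cache::remember('orders:total', 300, function () {
        return DB::table('orders')->count(); // stand-in for an expensive computation
    });
});
```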
You just need to figure out which method is best for you. It's more than likely that Laravel has at least one way for your specific use case.
In the past few days we have tested a system developed in Laravel under a high workload. In doing so, we stumbled across two situations that show there are things to consider when developing a system designed for high load.
Situation 1: firstOrCreate

In a particular situation, an API endpoint is called several times in a row at very short intervals. The respective controller contains a model call that is supposed to create a record with `firstOrCreate` if it does not yet exist. Here we found that the error of the record being inserted in duplicate occurred quite often. Our assessment was that this was down to very unfavourable timing with regard to the database transactions. If you look at https://freek.dev/1087-breaking-laravels-firstorcreate-using-race-conditions you can see that this is indeed the case. It would be good if the workaround described there could be implemented in Laravel in some way, for example by hardening `firstOrCreate` and other calls, or by providing new methods.
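For reference, a sketch of one common mitigation for this kind of race, assuming a unique index on the column in question (the model, column and input value are made up for the example):

```php
use App\Models\ApiToken; // placeholder model name
use Illuminate\Database\QueryException;

$token = 'example-token'; // example input

// If a concurrent request inserted the row between our SELECT and INSERT,
// catch the resulting constraint violation and fetch the row that now exists.
try {
    $record = ApiToken::firstOrCreate(['token' => $token]);
} catch (QueryException $e) {
    $record = ApiToken::where('token', $token)->firstOrFail();
}
```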
Situation 2: Working with the cache

If you try to write clean code as a developer and read the documentation for working with the cache, you naturally end up with this pattern: first check with `::has` whether the key you need is in the cache and, if yes, fetch it with `::get`. The only problem is that the timing can be unfavourable. Even with a well-equipped Redis database in the background, we often had the case that a key was still available at `::has` but was then no longer available at `::get`. If you then build an if statement around this, it of course leads to very unpleasant situations, because you assumed the data was available. At this point it would be good to point out such circumstances in the documentation. Our "workaround" now is to always fetch a value with `::get` and then simply check whether the value is `NULL`. This way we have a snapshot and can work with something, instead of having an unreliable situation.
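A minimal sketch of this get-and-check pattern (the key name is illustrative):

```php
use Illuminate\Support\Facades\Cache;

// A single get() call instead of has() followed by get(): whatever comes back
// is the snapshot we work with, so the value cannot disappear between calls.
$value = Cache::get('some-key');

if ($value !== null) {
    // use $value
} else {
    // treat as a cache miss
}
```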
I can imagine that there could be a few more situations. Perhaps this can be documented here?