Open tractorcow opened 6 years ago
Allowing for eventual consistency will give us more freedom to build flexible architectures.
Moving to a promise architecture might speed up performance of batch file operations, but is that the biggest problem you're trying to resolve.
Outside of batch operations in a single PHP request, you'll get less benefit from this, because in general a PHP execution is not a long-lived thing.
However, allowing eventual work to be shifted to a deferred job could be very valuable, which overlaps with https://github.com/silverstripe/silverstripe-framework/issues/6524
In the simplest implementation, deferred jobs were to be executed via register_shutdown_function.
That's a great idea; The "how" is yet to be examined, as I'm not sure how we could standardise serializable promises (if the job api required such). I know that some php libraries actually attempt this via closure-serialisation, so it's not outside of the bounds of possibilities. :)
@tractorcow To the extent that this relates to https://github.com/silverstripe/silverstripe-versioned/issues/53, would it be less invasive to the API to add the ability to batch up the asset moves into a single operation? Perhaps a flag when publishing individual File records to say "don't update the underlying DBFile" and then a separate call that says "publish all this array of DBFiles"
The batch publication system can then make use of those hooks.
Alternatively, some kind of mechanism that lets you call publishRecursive()
on a list of DataObjects rather than single one. This would probably be better for optimising recursive publishes as a whole
Isn't that just what a ChangeSet is?
Right now ChangeSet calls publishSingle()
on each item individually. It would need to do something smarter than that, but yes, ChangeSet would be the place where such a list was handled.
Before this card moves into "next", please validate:
File->publish()
? You can't get a job queue status synchronously. Promises would compliate Sam's proposed Job
interface.Job
interface a dependency on this card? It's not in 4.1.0 at the moment.Is this still a problem? A batch publish of 20 assets (1MB JPEGS) should take less than 20s
Possibly it isn't; I have a feeling that the remaining potential performance enhancements for this story may outweight the cost (complexity / overhead). I'll re-evaluate and confirm it's still valuable before proceeding with this.
Sweet, let me know if you think it's worth keeping in the 4.1 milestone. I'd rather make some progress on the underlying issue (lack of opt-in async job definitions in core) than coming up with a short-term fix (e.g. promises) for this particular instance.
I don't think it is... I want to take it out as it's reasonably low priority vs the other work in 4.1 currently.
Background
At the core of the asset system is a pair of stores, public and protected, where files may be arranged in either of these locations, and must occasionally be moved between (or deleted from) them as visibility of any file changes.
This behaviour is controlled by AssetControlExtension, which hooks into the onBeforeWrite() to perform all of these manipulations before the asset is written.
This extension will, for each DBFile reference on every dataobject, perform a comparison between 2-3 versions of this record:
The calculation ensures that we always prefer the most public visibility for any asset. For example, if a file (identified by Hash / Filename) exists in both the stage and live records, it'll evaluate to a public asset. Likewise, a file which exists on the prior record, but not in either of the to-be-written staged records will be treated as to-delete.
Performance concerns
The issue with this behaviour is that, for a large number of files (each of which with many possible writes called on it) is that a large number of queries may be resulted. Normally when publishing a file, this file may be saved first to draft, and then as a subsequent save to live. This requires the above behaviour to be triggered fully twice, doubling up all queries.
Proposal
One of the requirements of our initial asset store was to support "eventually consistent" assets. Thus it is not necessary for all operations to be performed immediately on these assets.
My proposal it to shift these calculations into a promise which can be resolved either via an explicit request (to some asset promise store), or otherwise at the end of the request, perhaps via a middleware, or register_shutdown_function() as a final guard.
By shifting this to a promise we can thus change the execution flow from:
Write File A to draft
Trigger AssetControlExtension
DB queries / filesystem operation
Write File A to live
Trigger AssetControlExtension
DB queries / filesystem operation
To
Write File A to draft
create promise to update asset visibility
Record prior value of File A ($priorRecord) with this promise (use $file->original) via $promise->then()
Create File A to live
Ensure promise exists (it does)
Record prior value of File A ($priorRecord) with this promise (use $file->original) via $promise->then()
Rest of application execution
Application shutdown triggers promise execution (unless prior resolved)
Trigger AssetControlExtension
DB queries / filesystem operation
For single writes of a file, this should cut down the number of queries by at least a 1/2, and even more on subsequent writes.
I suggest to look at something such as https://github.com/guzzle/promises as a means of implementing these promises, with a separate injected Factory (AssetPromiseFactory) to handle the get-or-create behaviour for these.
Implementation details
Each promise will maintain a single AssetManipulationList() instance, which may receive manipulations over the course of execution of the application. The ->resolve() behaviour will trigger the eventually processManipulation() of this list at the end of the request lifecycle.
Related issues