tractorcow commented 6 years ago

Background

At the core of the asset system is a pair of stores, public and protected, where files may be arranged in either of these locations, and must occasionally be moved between (or deleted from) them as visibility of any file changes.

This behaviour is controlled by AssetControlExtension, which hooks into the onBeforeWrite() to perform all of these manipulations before the asset is written.

This extension will, for each DBFile reference on every dataobject, perform a comparison between 2-3 versions of this record:

The prior record value (any existing attached file), which should be deleted
The value which will be written to draft (if versioned), which should be protected
The value which will be written to live, which should be made public. a canView check is also done, in which case this may be made protected instead (for published but protected assets).

The calculation ensures that we always prefer the most public visibility for any asset. For example, if a file (identified by Hash / Filename) exists in both the stage and live records, it'll evaluate to a public asset. Likewise, a file which exists on the prior record, but not in either of the to-be-written staged records will be treated as to-delete.

Performance concerns

The issue with this behaviour is that, for a large number of files (each of which with many possible writes called on it) is that a large number of queries may be resulted. Normally when publishing a file, this file may be saved first to draft, and then as a subsequent save to live. This requires the above behaviour to be triggered fully twice, doubling up all queries.

Proposal

One of the requirements of our initial asset store was to support "eventually consistent" assets. Thus it is not necessary for all operations to be performed immediately on these assets.

My proposal it to shift these calculations into a promise which can be resolved either via an explicit request (to some asset promise store), or otherwise at the end of the request, perhaps via a middleware, or register_shutdown_function() as a final guard.

By shifting this to a promise we can thus change the execution flow from:

Write File A to draft
Trigger AssetControlExtension
DB queries / filesystem operation
Write File A to live
Trigger AssetControlExtension
DB queries / filesystem operation

To
Write File A to draft
create promise to update asset visibility
Record prior value of File A ($priorRecord) with this promise (use $file->original) via $promise->then()
Create File A to live
Ensure promise exists (it does)
Record prior value of File A ($priorRecord) with this promise (use $file->original) via $promise->then()
Rest of application execution
Application shutdown triggers promise execution (unless prior resolved)
Trigger AssetControlExtension
DB queries / filesystem operation

For single writes of a file, this should cut down the number of queries by at least a 1/2, and even more on subsequent writes.

I suggest to look at something such as https://github.com/guzzle/promises as a means of implementing these promises, with a separate injected Factory (AssetPromiseFactory) to handle the get-or-create behaviour for these.

Implementation details

Each promise will maintain a single AssetManipulationList() instance, which may receive manipulations over the course of execution of the application. The ->resolve() behaviour will trigger the eventually processManipulation() of this list at the end of the request lifecycle.

Related issues

https://github.com/silverstripe/silverstripe-versioned/issues/53

sminnee commented 6 years ago

Allowing for eventual consistency will give us more freedom to build flexible architectures.

Moving to a promise architecture might speed up performance of batch file operations, but is that the biggest problem you're trying to resolve.

Outside of batch operations in a single PHP request, you'll get less benefit from this, because in general a PHP execution is not a long-lived thing.

However, allowing eventual work to be shifted to a deferred job could be very valuable, which overlaps with https://github.com/silverstripe/silverstripe-framework/issues/6524

In the simplest implementation, deferred jobs were to be executed via register_shutdown_function.

tractorcow commented 6 years ago

That's a great idea; The "how" is yet to be examined, as I'm not sure how we could standardise serializable promises (if the job api required such). I know that some php libraries actually attempt this via closure-serialisation, so it's not outside of the bounds of possibilities. :)

sminnee commented 6 years ago

@tractorcow To the extent that this relates to https://github.com/silverstripe/silverstripe-versioned/issues/53, would it be less invasive to the API to add the ability to batch up the asset moves into a single operation? Perhaps a flag when publishing individual File records to say "don't update the underlying DBFile" and then a separate call that says "publish all this array of DBFiles"

The batch publication system can then make use of those hooks.

sminnee commented 6 years ago

Alternatively, some kind of mechanism that lets you call publishRecursive() on a list of DataObjects rather than single one. This would probably be better for optimising recursive publishes as a whole

tractorcow commented 6 years ago

Isn't that just what a ChangeSet is?

sminnee commented 6 years ago

Right now ChangeSet calls publishSingle() on each item individually. It would need to do something smarter than that, but yes, ChangeSet would be the place where such a list was handled.

chillu commented 6 years ago

Before this card moves into "next", please validate:

Is this still a problem? A batch publish of 20 assets (1MB JPEGS) should take less than 20s
Are we restricting devs by removing synchronous nature of File->publish()? You can't get a job queue status synchronously. Promises would compliate Sam's proposed Job interface.
Can we get away with doing a one-off PHP shutdown handler without making the Job interface a dependency on this card? It's not in 4.1.0 at the moment.

tractorcow commented 6 years ago

Is this still a problem? A batch publish of 20 assets (1MB JPEGS) should take less than 20s

Possibly it isn't; I have a feeling that the remaining potential performance enhancements for this story may outweight the cost (complexity / overhead). I'll re-evaluate and confirm it's still valuable before proceeding with this.

chillu commented 6 years ago

Sweet, let me know if you think it's worth keeping in the 4.1 milestone. I'd rather make some progress on the underlying issue (lack of opt-in async job definitions in core) than coming up with a short-term fix (e.g. promises) for this particular instance.

tractorcow commented 6 years ago

I don't think it is... I want to take it out as it's reasonably low priority vs the other work in 4.1 currently.

silverstripe / silverstripe-assets

RFC Shift asset visibility re-assignment into an eventually-resolved promise #88

Background

Performance concerns

Proposal

Implementation details

Related issues