silverstripe / silverstripe-framework

Silverstripe Framework, the MVC framework that powers Silverstripe CMS
https://www.silverstripe.org
BSD 3-Clause "New" or "Revised" License

RFC Neo-Versioning / Change Sets API #4932

Closed tractorcow closed 8 years ago

tractorcow commented 8 years ago

RFC Neo-Versioning / Change Sets API

https://groups.google.com/forum/#!topic/silverstripe-dev/p2sHTRqNmsc

1. Problem

DataObject versioning suffers from a set of problems, including but not limited to the below:

In order to address these issues a new neo-versioning mechanism based on change-sets is proposed. Rather than focusing on operations made to individual objects, the major concept of this feature is that the content editor is always working with groups of changes pending publishing. Tracking changes to objects which rely on changes to other objects, such as those via dataobject relationships will ensure that all necessary objects are grouped together in the same group. To support this feature, DataObjects will now be versioned by default (with some exceptions), as will many_many relationships between objects.

On top of this a new user interface will be developed to assist content editors with control over changes pending publication. The requirements of this interface would be to:

The focus of this RFC is solely on the low level API. UX design and development will be covered separately, and thus not addressed in great detail.

3. Concepts

The following new concepts are introduced in this RFC:

Ownership represents the relationship between two objects, where the rendering of the owner depends on the component "owned" object. For instance, a page which contains several content blocks will depend on those objects being published for rendering.

A non-ownership relationship between two objects will be treated as a simple reference, where objects on neither end are treated as depending on the existence or update of the other. Where one object simply summarises a set of linked content, but is not reliant on what that content is, then this should also be left as a non-ownership relationship. For example, a link to an adjacent page of the current top level menu section, or between a Blog and BlogPost (because the holder page isn't reliant on specific BlogPost content).

When a change is pending publication, it is desirable that all the locations the change will affect can be displayed before publishing, so that the publisher can confirm the change is appropriate in every location.

4.1.1. Ownership terminology

Alternative sets of terms that could be substituted for this component (to be confirmed)

For instance, given this portion of the object graph:

image

Whenever a blog post is updated, it will normally require that any dependent content (e.g. content blocks and their respective images) is published at the same time, in order for the item to remain consistent.

Thus in this example, blog posts own content blocks, and content blocks own images.

Blog tree does NOT own blog posts, as the blog tree is simply an aggregate holder for sub-pages, and does not rely on specific blog posts being published in order for it to be viewed correctly.

Typically, ownership should correlate to the number of objects editable in the same CMS view. E.g. a page edit form with a gridfield to a child object type represents an ownership relationship between that page and that object. When editing a page, you expect the items you see to appear when published. This feature ensures consistency between that view and the final published page.

4.1.3. Declaring Ownership

Ownership, unlike other relationship types, is only declared on the owner object via the configuration system. I.e. owns is set, not owned_by. Owned-by is implied by any many_many, has_many, belongs_to, has_one, or belongs_many_many where the other side declares this object as owned.

Ownership is also always implied by shortcodes in html areas on pages, and will likely be implemented in SiteTreeLinkTracking on the ImageTracking relationship.

Ownership currently isn’t available on non-relationship (i.e. custom method) fields, but will likely be available in the future. It also is not enabled on relationships where only one end is declared (e.g. has_one to File but no has_many applied back to the object).

For example, this is how a dataobject would declare ownership.

<?php

class BlogPost extends Page {
    private static $has_many = array(
        'Blocks' => 'ContentBlock'
    );
    private static $owns = array(
        'Blocks'
    );
}

class ContentBlock extends DataObject {
    private static $has_one = array(
        'Post' => 'BlogPost',
        'Banner' => 'Image'
    );
    private static $owns = array(
        'Banner'
    );
}

/** Extension applied to Image */
class ContentBlockImage extends DataExtension {
    private static $has_many = array(
        'Blocks' => 'ContentBlock'
    );
}

In order to support this feature, the following API will be added to DataObject.

<?php

class DataObject extends Object {

    /**
     * Returns all objects owned by the current. Supports recursive search.
     * @return SS_List
     */
    public function FindOwned($recursive = true);

    /**
     * Returns all objects which own this object. Supports recursive search.
     * @return SS_List
     */
    public function FindOwners($recursive = true);

}

In this case, calling $blogPost->FindOwned() will return all images and content blocks for that page. $blogPost->FindOwned(false) will return only content blocks.

Note that this mechanism will only find records that exist in the same stage (Draft / Live) as the parent record. The version of these methods in ChangeSetItem class will need to work across stages (and for deleted records) automatically.
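The recursive and non-recursive behaviour described above can be sketched with a plain ownership graph. This is only an illustration, not the proposed implementation: the functions `findOwned()` and `findOwners()` below are hypothetical stand-alone helpers, and the string keys such as `BlogPost#1` are made up for the example.

```php
<?php

/**
 * Hypothetical stand-in for the proposed FindOwned() behaviour.
 * $owns maps an object key to the keys of the objects it directly owns.
 */
function findOwned(array $owns, string $key, bool $recursive = true): array
{
    $found = [];
    $queue = $owns[$key] ?? [];
    while ($queue) {
        $next = array_shift($queue);
        if (in_array($next, $found, true)) {
            continue; // already visited; prevents looping forever on cycles
        }
        $found[] = $next;
        if ($recursive) {
            // Follow ownership one level further down
            $queue = array_merge($queue, $owns[$next] ?? []);
        }
    }
    return $found;
}

/** Hypothetical stand-in for FindOwners(): same walk over the inverted graph. */
function findOwners(array $owns, string $key, bool $recursive = true): array
{
    $ownedBy = [];
    foreach ($owns as $owner => $ownedKeys) {
        foreach ($ownedKeys as $owned) {
            $ownedBy[$owned][] = $owner;
        }
    }
    return findOwned($ownedBy, $key, $recursive);
}

// Made-up graph matching the BlogPost / ContentBlock / Image example
$owns = [
    'BlogPost#1'     => ['ContentBlock#1', 'ContentBlock#2'],
    'ContentBlock#1' => ['Image#7'],
    'ContentBlock#2' => ['Image#8'],
];

$all    = findOwned($owns, 'BlogPost#1');        // content blocks and their images
$direct = findOwned($owns, 'BlogPost#1', false); // content blocks only
$owners = findOwners($owns, 'Image#7');          // the block, then the blog post
```

Because owned-by is implicit, the sketch derives owners by inverting the `owns` declarations rather than storing a second list.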

4.1.4. Ownership of many_many mapping table records

Although not an explicit dataobject, rows inside a many_many table will automatically be "owned" by one side of the relationship. This does NOT mean that the object on the other side is owned, only the mapping itself.

This is determined either by:

Publishing a record that owns the relationship will publish the mapping table as well.

4.1.5. Ownership relational integrity

In an ownership relation, the integrity of DataObjects must be maintained. For instance, when adding a new page, and a set of new images that are displayed on that page, it would create a broken view to publish only that page but not the images.

In order to maintain integrity the following rules must be applied:

Possible front-end term: "Project"

Object changes will not be stored automatically in any dataobjects, but can be determined by comparing the Object.Version and Object_Live.Version values for each dataobject. This information could potentially be cached.
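As a rough sketch of that comparison, the function below derives a change type from the two version numbers. It assumes (this is not stated in the RFC) that a missing row in a stage is represented as `null`, and the function name `changeType()` is hypothetical. The returned strings mirror the CHANGE_* values proposed for ChangeSetItem in this RFC; the many_many case is left out because it cannot be derived from version numbers alone. Nullable type hints need PHP 7.1 or later.

```php
<?php

/**
 * Derive a change type from the draft and live version numbers of one record.
 * Assumption: null means the record does not exist in that stage.
 */
function changeType(?int $draftVersion, ?int $liveVersion): string
{
    if ($draftVersion === null && $liveVersion === null) {
        return 'none';     // nothing in either stage
    }
    if ($liveVersion === null) {
        return 'created';  // exists on draft only
    }
    if ($draftVersion === null) {
        return 'deleted';  // removed from draft, still live
    }
    // Present in both stages: changed only if the versions differ
    return $draftVersion === $liveVersion ? 'none' : 'modified';
}
```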

A ChangeSet represents a user-customised subset of all global changes which can be published together. This contains a list of ChangeSetItem objects, each of which represents changes to a single object.

ChangeSet objects can have a number of Member or Groups assigned as owners, and by default will be assigned to the member who created it. This will allow permissions to be decided based on group or member ownership (e.g. via extensions).

Any object could exist in any number of changesets, and could even be added to a changeset in advance of being changed (although it will likely be hidden from the changeset list until there are any modifications).

4.2.1. Change Set API

A concept API for changesets is as below.

Implementation note: The behaviour of polymorphic has_ones will need to be tweaked to ensure changed class references aren't broken. Currently they do break.

image

Versioned.php

All versioned objects will now have an optional 1-to-1 relationship between DataObject (1) and ChangeSetItem (0..1) reference, which can be used to determine any changesets this item belongs to.

<?php

class Versioned extends DataExtension {
    private static $belongs_to = array(
        'Change' => 'ChangeSetItem.Object'
    );
}

ChangeSet.php

With regards to state:

Instead, if a changeset needs to be un-published, a new changeset will be created to reverse the actions of the prior one.

<?php

namespace SilverStripe\Versioning;

class ChangeSet extends DataObject {

    /** An active changeset */
    const STATE_OPEN = 'open';

    /** In the process of reverting */
    const STATE_REVERTING = 'reverting';

    /** A changeset which is reverted and closed */
    const STATE_REVERTED = 'reverted';

    /** A changeset which is published and closed */
    const STATE_ADDED = 'published';

    /** In the process of being published */
    const STATE_PUBLISHING = 'publishing';

    private static $db = array(
        'Name' => 'Varchar',
        'State' => "Enum('open,published,reverted,publishing,reverting')"
    );

    private static $has_many = array(
        'Items' => 'ChangeSetItem',
    );

    private static $has_one = array(
        'Owner' => 'Member'
    );

    /**  Get the name of this changeset */
    public function getName();

    /** Gets the list of ChangeSetItems for this changeset */
    public function getChanges();

    /** Removes this changeset, and moves all items back to the global changeset. Cannot be done on global changeset. */
    public function delete();

    /** Publish this changeset, then closes it. */
    public function publish();

    /** Revert all changes made to this changeset, then closes it. **/
    public function revert();

    /** Add a new change to this changeset. Will automatically include all owned changes as those are dependencies of this item. */
    public function addItem(DataObject $item);

    /** Remove an item from this changeset. Will automatically remove all changes which own (and thus depend on) the removed item. */
    public function removeItem(DataObject $item);

    /** Include any owned changes in this changeset */
    public function sync();

    /** Verify that any objects in this changeset include all owned changes */
    public function validate();

    /** Standard permission mechanisms */
    public function canView($member = null);
    public function canEdit($member = null);
    public function canCreate($member = null);
    public function canDelete($member = null);
    public function canPublish($member = null);
    public function canRevert($member = null);
}

ChangeSetItem.php

In order to include a change in a changeset, a ChangeSetItem is created with the linked ID and base ClassName of the object being changed.

After a changeset is published (but not before), each item in the changeset will have the VersionBefore and VersionAfter field set.

After a changeset is reverted (but not before), then these fields will contain the values:

Thus, even if a changeset is reverted, the "changeset revert" can be undone potentially (as long as none of the changed items have been subsequently changed since then).

Note: A potential enhancement to this is to also include changes to the many_many mapping table for the changed record, although these could also be reverted from the many_many _versions table.

<?php

namespace SilverStripe\Versioning;

/**
 * A single line in a changeset
 */
class ChangeSetItem extends DataObject {

    /** Represents an object deleted */
    const CHANGE_DELETED = 'deleted';

    /** Represents an object which was modified */
    const CHANGE_MODIFIED = 'modified';

    /** Represents an object added */
    const CHANGE_CREATED = 'created';

    /**
     * Represents an object which hasn't been changed directly, but owns a
     * modified many_many relationship.
     */
    const CHANGE_MANYMANY = 'manymany';

    /**
     * Represents that an object has not yet been changed, but
     * should be included in this changeset as soon as any changes exist
     */
    const CHANGE_NONE = 'none';

    private static $db = array(
        'VersionBefore' => 'Int',
        'VersionAfter' => 'Int',
        'State' => "Enum('open,published,reverted')"
    );
    private static $has_one = array(
        'ChangeSet' => 'ChangeSet',
        'Object' => 'DataObject',
    );

    /** Get the type of change: none, created, deleted, modified, manymany */
    public function getChangeType();

    /** Publish this item, then close it. */
    public function publish();

    /** Reverts this item, then close it. **/
    public function revert();
}

4.2.2. Actions

The following actions can be invoked on objects or changesets, and follow these general processes. Note that not all of these actions will be represented as individual user interface controls, and will depend on UX implementation.

4.2.2.1. ChangeSet::sync

Whenever a changeset is loaded into the UI (or just prior to publish or revert) a synchronisation of all objects should be done. This will detect any changes to objects owned by those records already in this changeset, and subsequently include those changes too in this changeset.

See 4.1.5. Ownership relational integrity

4.2.2.2. ChangeSet::validate

Ensure that the changeset includes all dependent objects. This should always return true if following a successful sync() invocation, but does not make any changes itself.

This mechanism can be invoked by user code to validate that a changeset includes all dependent changes, and thus is ready for publish. For instance, if a changed image (not in this changeset) is required by a changed page (in this changeset) then validate should return false.

See 4.1.5. Ownership relational integrity
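The relationship between sync() and validate() can be sketched over plain arrays. Everything here is illustrative: the function names (`collectOwned()`, `missingDependencies()`, `validateChangeSet()`, `syncChangeSet()`) and the string keys are hypothetical, a changeset is reduced to a list of object keys, and `$changed` is assumed to be the list of objects with pending changes.

```php
<?php

/** Collect everything owned (recursively) by $key. $owns maps owner => owned keys. */
function collectOwned(array $owns, string $key, array &$seen = []): array
{
    foreach ($owns[$key] ?? [] as $child) {
        if (!isset($seen[$child])) {
            $seen[$child] = true;
            collectOwned($owns, $child, $seen);
        }
    }
    return array_keys($seen);
}

/** Changed objects that the changeset depends on but does not yet contain. */
function missingDependencies(array $items, array $owns, array $changed): array
{
    $missing = [];
    foreach ($items as $item) {
        foreach (collectOwned($owns, $item) as $owned) {
            if (in_array($owned, $changed, true) && !in_array($owned, $items, true)) {
                $missing[$owned] = true;
            }
        }
    }
    return array_keys($missing);
}

/** validate(): read-only check, true when nothing is missing. */
function validateChangeSet(array $items, array $owns, array $changed): bool
{
    return missingDependencies($items, $owns, $changed) === [];
}

/** sync(): returns the item list with any missing owned changes appended. */
function syncChangeSet(array $items, array $owns, array $changed): array
{
    return array_merge($items, missingDependencies($items, $owns, $changed));
}

// A changed page owns an unchanged block, which owns a changed image
$owns    = ['Page#1' => ['Block#1'], 'Block#1' => ['Image#7']];
$changed = ['Page#1', 'Image#7'];
$items   = ['Page#1'];

$validBefore = validateChangeSet($items, $owns, $changed);  // image is missing
$synced      = syncChangeSet($items, $owns, $changed);      // image gets included
$validAfter  = validateChangeSet($synced, $owns, $changed);
```

Sharing one "missing dependencies" helper keeps the two operations consistent: validate() reports what sync() would add, without modifying anything.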

4.2.2.3. ChangeSet::addItem

Front end term: "Include in project"

This action will add a change (normally from the global changeset) to the target changeset.

Note: if the changeset is valid prior to this method call, it should still be valid without requiring sync() to be called.

4.2.2.4. ChangeSet::removeItem

Front end term: "Remove from project"

Note: if the changeset is valid prior to this method call, it should still be valid without requiring sync() to be called.
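The dependency handling described for addItem and removeItem in the ChangeSet API (adding pulls in owned changes, removing also drops the owners of the removed item) might look like the sketch below. The helper names (`reachable()`, `invertGraph()`, `addToChangeSet()`, `removeFromChangeSet()`) and the object keys are hypothetical, and a changeset is again modelled as a plain list of keys.

```php
<?php

/** Walk $graph (key => linked keys) from $key and return every reachable key. */
function reachable(array $graph, string $key): array
{
    $seen  = [];
    $stack = $graph[$key] ?? [];
    while ($stack) {
        $next = array_pop($stack);
        if (isset($seen[$next])) {
            continue; // already visited
        }
        $seen[$next] = true;
        foreach ($graph[$next] ?? [] as $linked) {
            $stack[] = $linked;
        }
    }
    return array_keys($seen);
}

/** Turn an owner => owned map into an owned => owners map. */
function invertGraph(array $owns): array
{
    $ownedBy = [];
    foreach ($owns as $owner => $ownedKeys) {
        foreach ($ownedKeys as $owned) {
            $ownedBy[$owned][] = $owner;
        }
    }
    return $ownedBy;
}

/** Add $key plus every changed object it owns, directly or indirectly. */
function addToChangeSet(array $items, string $key, array $owns, array $changed): array
{
    $ownedChanges = array_intersect(reachable($owns, $key), $changed);
    return array_values(array_unique(array_merge($items, [$key], $ownedChanges)));
}

/** Remove $key plus every item that owns it, since those depend on it. */
function removeFromChangeSet(array $items, string $key, array $owns): array
{
    $drop = array_merge([$key], reachable(invertGraph($owns), $key));
    return array_values(array_diff($items, $drop));
}

$owns    = ['Page#1' => ['Block#1'], 'Block#1' => ['Image#7']];
$changed = ['Page#1', 'Block#1', 'Image#7'];

$items        = addToChangeSet([], 'Page#1', $owns, $changed);
$withoutImage = removeFromChangeSet($items, 'Image#7', $owns); // owners go too
$withoutPage  = removeFromChangeSet($items, 'Page#1', $owns);  // owned items stay
```

Removing an owner leaves its owned items in place, because owned objects do not depend on their owner; this is what keeps a valid changeset valid after either call.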

4.2.2.5. ChangeSet::publish

Front end term: "Publish project"

This action publishes the changeset, and all items within it.

4.2.2.6. ChangeSetItem::publish

Not available as an independent front-end action.

This is the method intended to publish only changes to a single object and its many_many tables. Unlike DataObject::publish, this does NOT do recursive publishing of owned objects, as this process relies on ChangeSet to iterate over those objects independently.

4.2.2.7. ChangeSet::revert

Front-end term: "Revert project changes"

This action reverts all changes in the given changeset. I.e. resets the current state of the underlying object to match that of the live version.

Note that this is intended to be called on a changeset before it's published.

4.2.2.8. ChangeSetItem::revert

Not available as an independent front-end action.

Reverts all changes for the current object, and its many_many tables, to the version in the live table.

4.2.2.9. DataObject::write

Front-end term: "Save"

API for this method is unchanged as a part of this RFC.

4.2.2.10. DataObject::revert

Front-end term: "Revert changes"

Revert will set the draft state of an object, and its owned objects, to the current live version.

I.e. "quick revert" for single object, and bypasses the ChangeSet::revert process. This action is disabled for items that do not exist on live (but they can still be deleted).

Since this consolidates the live and stage version immediately, there is no point in this being allowed as a changeset item.

4.2.2.11. DataObject::publish

Front-end term: "Publish"

Performs an immediate publish of this object and all owned objects, automatically. Similar to ChangeSetItem::publish, but acts recursively. I.e. "quick publish" for single object, and bypasses the ChangeSet::publish process, and thus cannot be included in a changeset itself.

Note that this process also bypasses the "change preview" and "this change effects" mechanisms provided by changesets, and thus is inherently riskier.

For instance, Left has_many Right objects that it declares as 'owns'. If I delete several Right objects, and then quick-publish a Left object, any Right objects that point to the Left object AND which were in the deleted group will simply have the LeftID set to 0 instead of being deleted.

Note: The "set has_one to 0 instead of deleting" work-around is used only for quick DataObject::publish, and not for ChangeSet::publish(), as changesets will allow users to preview all deleted objects prior to publish, so there is no risk of unintended consequences there.
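A minimal sketch of that work-around, using arrays in place of the Right and Right_Live tables. The function name `quickPublishOwnedRights()` and the row shape are assumptions made for illustration only; the real publish logic would operate on DataObjects.

```php
<?php

/**
 * $draft and $live map Right ID => ['LeftID' => int].
 * Returns the new live rows after a quick publish of the Left record $leftId.
 */
function quickPublishOwnedRights(int $leftId, array $draft, array $live): array
{
    foreach ($live as $id => $row) {
        if ($row['LeftID'] === $leftId && !isset($draft[$id])) {
            // Deleted on draft: detach from the Left record instead of deleting
            $live[$id]['LeftID'] = 0;
        }
    }
    foreach ($draft as $id => $row) {
        if ($row['LeftID'] === $leftId) {
            // Owned rows that still exist on draft are published as usual
            $live[$id] = $row;
        }
    }
    return $live;
}

$draft = [1 => ['LeftID' => 5]];
$live  = [
    1 => ['LeftID' => 5],
    2 => ['LeftID' => 5], // deleted on draft
    3 => ['LeftID' => 9], // belongs to a different Left record
];

$result = quickPublishOwnedRights(5, $draft, $live);
```

Row 2 survives on live with LeftID 0, so nothing is removed without the user having previewed the deletion; row 3 is untouched.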

4.2.2.12. DataObject::doRollbackTo($version)

Front end term: "Revert to this version" (when viewing a version in the history tab)

Reverts a record to a specific Version number. This is stageable and won't affect live until a changeset is published with these changes.

Front end term: "Archive"

Performs an immediate deletion of this record from both live, and stage. Bypasses changesets and cannot be included in a changeset.

Front end term: "Delete"

Deletes a record from the draft stage. This allows you to stage deletions for later removal from live.

Front end term: "Unpublish"

Performs an immediate deletion of this record from the live stage. Bypasses changesets and cannot be included in a changeset. (use delete instead if you want to stage a page removal).

Note: the meaning of the term "unpublish" has been left consistent with the behaviour of "unpublish" action in 3.x. This feature could potentially be replaced with a new action, which is to "undo the last publish of this record", which would instead query the DB for live version of this record prior to the current, and revert it to that. This would require additional scoping.

4.3. Versioned Relationships

This feature provides the ability to version many_many relationships between two versioned objects (or between a versioned and an un-versioned object). The goal of this is to allow content editors to prepare a list of records and preview them, prior to making this list live.

An important note is that this feature does NOT mean that relationships are tracked between records of a specific version; rather, this is only a versionable record of relationships that exist between two objects in the SAME stage. I.e. only the relationship itself is versioned.

4.3.1. Many_many relationship versioning

The schema for versioned many_many relationships would be as below:



 

| | Left Table | Mapping Table | Right Table |
|---|---|---|---|
| Draft | Left: ID, Version, <fields> | Left_Right: ID, LeftID, RightID, Version, <extrafields> | Right: ID, Version, <fields> |
| Live | Left_Live: ID, Version, <fields> | Left_Right_Live: ID, LeftID, RightID, Version, <extrafields> | Right_Live: ID, Version, <fields> |
| _versions | Left_versions: ID, RecordID, Version, <fields> | Left_Right_versions: ID, RecordID, LeftID, LeftVersion, RightID, RightVersion, Version, <extrafields> | Right_versions: ID, RecordID, Version, <fields> |

While has_one and has_many relationships automatically will be versioned against the objects holding the foreign key value, a special versioning mechanism must be applied to many_many.

Since changes and publishing actions are applied to objects, the "change" for any relationship alteration will be assigned to an object based on the following rules:

Accepted Limitations:

In the above Schema Left_Right.Version is the version record of the mapping table itself, and does not relate to either the Left or Right Version numbers.

The LeftVersion and RightVersion values in the mapping table ONLY exist in the Left_Right_versions table, and exist solely for the purpose of restoring a relationship mapping when restoring to the specific Version of either a Left or Right record.

Note: Version numbers in the versions mapping table only represent the version of that object when the relationship was last saved; the relationship isn't updated every time either object is given a new version!

For instance, if I am reverting a Left record with ID = 2 to Version = 4 (to the draft stage), I would inspect all values of Left_Right_versions with LeftID = 2 and LeftVersion <= 4. For each possible mapping record I would do the following checks:

  1. If LeftVersion === 4 (the version I want to restore) then restore that row to Left_Right
  2. If LeftVersion < 4 then an additional check is made. If EITHER of the following two conditions is true, then the record is restored, otherwise it is deleted:
    1. That value for the mapping row is currently live (e.g. a map with LeftVersion = 3, LeftID = 2, and a given RightID, that's currently in Left_Right_Live).
    2. Or, a newer version for that mapping row exists with LeftVersion > 4 (e.g. a map with LeftVersion = 3, LeftID = 2, but there is also a row with LeftVersion = 5, LeftID = 2 somewhere with the same RightID).
  3. All other rows in Left_Right with LeftID = 2 that were not restored at either step #1 or #2 above are removed.
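The checks above can be sketched as a single function that returns the RightIDs which should remain mapped after the restore. This is a simplified illustration under stated assumptions: `restoredRightIds()` is a hypothetical name, `$versionRows` stands in for Left_Right_versions, `$liveRightIds` stands in for the RightIDs currently mapped to this Left record in Left_Right_Live, and extra fields are ignored.

```php
<?php

/**
 * Work out which mappings survive when Left record $leftId is reverted
 * to $version. Each version row has LeftID, LeftVersion and RightID keys.
 */
function restoredRightIds(array $versionRows, array $liveRightIds, int $leftId, int $version): array
{
    $restored = [];
    foreach ($versionRows as $row) {
        // Only inspect rows for this Left record at or below the target version
        if ($row['LeftID'] !== $leftId || $row['LeftVersion'] > $version) {
            continue;
        }
        $rightId = $row['RightID'];

        // Step 1: saved at exactly the target version, always restore
        if ($row['LeftVersion'] === $version) {
            $restored[$rightId] = true;
            continue;
        }

        // Step 2.1: the mapping is currently live
        $isLive = in_array($rightId, $liveRightIds, true);

        // Step 2.2: a newer version row exists for the same mapping
        $hasNewer = false;
        foreach ($versionRows as $other) {
            if ($other['LeftID'] === $leftId
                && $other['RightID'] === $rightId
                && $other['LeftVersion'] > $version
            ) {
                $hasNewer = true;
                break;
            }
        }

        if ($isLive || $hasNewer) {
            $restored[$rightId] = true;
        }
    }

    // Step 3: anything not listed here would be removed from Left_Right
    $ids = array_keys($restored);
    sort($ids);
    return $ids;
}

$versionRows = [
    ['LeftID' => 2, 'LeftVersion' => 4, 'RightID' => 10], // exact version
    ['LeftID' => 2, 'LeftVersion' => 3, 'RightID' => 11], // older, but live
    ['LeftID' => 2, 'LeftVersion' => 3, 'RightID' => 12], // older, newer row exists
    ['LeftID' => 2, 'LeftVersion' => 5, 'RightID' => 12],
    ['LeftID' => 2, 'LeftVersion' => 2, 'RightID' => 13], // older, neither condition
    ['LeftID' => 3, 'LeftVersion' => 4, 'RightID' => 15], // different Left record
];

$kept = restoredRightIds($versionRows, [11], 2, 4);
```

Returning the full set of surviving RightIDs covers step 3 implicitly: the caller would delete any Left_Right row for this LeftID whose RightID is not in the result.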

4.3.2. Has_many relationship versioning

In addition, the table schema for has_one fields will need to be updated to record historical version references.



 

| | Left Table | Right Table |
|---|---|---|
| Stage | Left: ID, Version, <fields> | Right: ID, LeftID, Version, <fields> |
| Live | Left_Live: ID, Version, <fields> | Right_Live: ID, LeftID, Version, <fields> |
| _versions | Left_versions: ID, RecordID, Version, <fields> | Right_versions: ID, RecordID, LeftID, LeftVersion, Version, <fields> |

 

The process for reverting a LeftTable to a specific version will be similar to that for a many_many relationship.

4.4. Unversionable objects

Support for un-versioned objects should still be maintained, as certain information (such as factual or state data) should not be subject to the publishing workflow. Examples include:

Users upgrading from 3.x should not have to make any changes to their datamodel, nor their existing data in order for the schema to work. However, reverting to versions saved prior to the update will likely not restore data relying on new columns, such as archived many_many relationships.

The following changes are recommended to help users improve existing code for 4.x:

kinglozzer commented 8 years ago

Whew, that’s a long read! Great work @tractorcow.

I’ll try to find time to dig into 4.2 and 4.3 in detail soon, initial thoughts:

4.1.1. Ownership terminology

Owner / owns / owned is the most intuitive of those examples in my opinion. When it comes to declaring ownership at a code level, I also wondered about private static $depends_on = array('ContentBlocks');. I was wondering how CMS users would understand this terminology, but that doesn’t actually matter at this stage as it’d be part of the UX work.

Ownership currently isn’t available on non-relationship (i.e. custom method) fields

Also not critical at this stage, but have you had any thoughts on how this might work? Use of custom methods to aggregate data is fairly common, so I think this might be quite important.

Can multiple changesets for a single object be published independently of one-another? E.g. Is it possible to create one changeset to update a category’s products, and a separate changeset to update the same category’s “special offers” relation, but then publish one of those without publishing the other one, in any order?

tractorcow commented 8 years ago

Also not critical at this stage, but have you had any thoughts on how this might work?

We'd need some kind of observable API that allows instances of classes to declare if an object is "owned" by one of their fields. @hafriedlander has written up an example API for this, but it would be a bit of extra work to implement. :P

Can multiple changesets for a single object be published independently of one-another?

A change can't be published twice, so it'd be published with whichever changeset was published first.

In your example, the objects are different, so it's ok; the mapping for special offers would belong to the many_many end (categorypage), but the actual products are separate dataobjects, so you could publish them independently.

If you said that category owns products, then it would be impossible though. :) You couldn't publish the products without publishing the category.

nyeholt commented 8 years ago

Am on the run at the moment so no time for a fully detailed response yet, but a few thoughts

tractorcow commented 8 years ago

A changeset should be able to be put through one or more workflows; whether that's a set of steps requesting additional content to be contributed to the changeset, or an approval type of workflow to publish the set

Yep, that's the whole idea... and there will be plenty of extension points put throughout the code to support custom workflow management modules (although there probably won't be much workflow in place by default).

A changeset should be serializeable to JSON (eg via restful server or similar) for external consumption of what that changeset covered, including any deletion of content

If you mean restful server support, then it will be... it is just a dataobject after all. :)

nyeholt commented 8 years ago

If you mean restful server support, then it will be... it is just a dataobject after all. :)

Yep, the general design is very familiar :), but I can see there's several improvements that I like (eg things like tracking before/after versions).

Is this something that's a thing in SS4 now? I haven't been following closely enough to see whether it's something you can do now or if it's just shorthand for the purpose of documentation

private static $has_one = array(
    'Object' => 'DataObject',
);

Has there been consideration to allowing a user to 'lock' an object once in a changeset? I agree that the 'default' capability should be open, but I can see in some scenarios that once a piece of content is being edited down one channel, it shouldn't be possible for it to be edited (and potentially published) via another. Though, this may be something implemented using an extension to change the canEdit() logic.

tractorcow commented 8 years ago

Since 3.2 actually. :) It's called polymorphic relations, and the docs for it were temporarily lost in an unfortunate merge accident... they'll be restored soon. ;P

https://github.com/silverstripe/silverstripe-framework/issues/4905

tractorcow commented 8 years ago

Has there been consideration to allowing a user to 'lock' an object once in a changeset?

It would have to be done on the model level, not the changeset. I would expect that a custom module or user code would need to extend that object to (as you suggest) implement canEdit() in a way that respects business logic.

In core, we aren't making any assumption about what kind of business logic is necessary, but we'll ensure it's able to be overridden via extensions to support such workflows.

chillu commented 8 years ago

Massive effort, thanks for digging through all the edge cases! I think this will be a great improvement to a good portion of CMS authors (less surprises in the publishing model). That being said, I'm a bit concerned about the technical complexity, and how much we can shield the average SilverStripe developer from it.

The following simple example ends up storing data in 15 tables in 4.x (11 tables in 3.x).

class BlogArticle extends Page {
    private static $many_many = array('Quotes' => 'Quote');
    private static $owns = array('Quotes');
}
class Quote extends DataObject {
    private static $belongs_many_many = array('BlogArticles' => 'BlogArticle');
}

Creates the following tables:

SiteTree
SiteTree_versions
SiteTree_Live
Page
Page_versions
Page_Live
BlogArticle
BlogArticle_versions
BlogArticle_Live
Quote
Quote_versions (new)
Quote_Live (new)
BlogArticle_Quotes
BlogArticle_Quotes_versions (new)
BlogArticle_Quotes_Live (new)

Maybe the authors and developers care enough about the Quote content to create drafts before publishing them, maybe they're fine with save=publish. In the ideal world we're painting here, everything is consistently versioned, and consistently fits in the ChangeSet model. But I've seen too many SQLQuery invocations with stage-dependent code ({$table}_Live) to believe this comes for free - there will be development overhead on the average project due to Versioned. These SQLQuery statements are often created as performance optimisations for raw query performance (rather than relying on DataObject instances), and then later on patched up with staging support when authors complain about inconsistent previews.

So my point is that neo versioning in its current form should be supported by ORM improvements which make these manual queries less likely than today, since it becomes impractical for devs to come up with the required join operations themselves - particularly across a many_many relationship on different stages.

@hafriedlander @tractorcow @clarkepaul Have you discussed how only versioning some content models could affect the UX?

tractorcow commented 8 years ago

I'm thinking that I actually want to implement has_many_through instead of versioned many_many tables. That way you can add (or not add) versioned to a model, and make that your mapping table.

sminnee commented 8 years ago

Have we completed this one now?

hafriedlander commented 8 years ago

This has been done, but there's lots of information I'd hate to lose to the "big pile of closed issues". Any suggestions how we pull this into docs?

hafriedlander commented 8 years ago

As per #4938 I will raise the process for closing an RFC with core team (and am changing milestone in the meantime, although this one is complete, unlike #4938)

tractorcow commented 8 years ago

Sorry I need to mark this issue as incomplete; We are yet to finish implementing versioned many_many, although this is fine to schedule as alpha 2.

Our internal jira reference for that story is https://silverstripe.atlassian.net/browse/OSS-1481

sminnee commented 8 years ago

This has been done, but there's lots of information I'd hate to lose to the "big pile of closed issues". Any suggestions how we pull this into docs?

FYI: we have this https://github.com/silverstripe/silverstripe-framework/issues?utf8=%E2%9C%93&q=is%3Aclosed+is%3Aissue+label%3Arfc%2Faccepted+

If there is documentation activity to do, I guess we could create a "document X" issue?

tractorcow commented 8 years ago

I've re-tasked https://silverstripe.atlassian.net/browse/OSS-1481 with the intention to implement via has_many_through.

chillu commented 8 years ago

The JIRA ticket Damian mentioned above is now tracked as https://github.com/silverstripe/silverstripe-framework/issues/5615

sminnee commented 8 years ago

Since the only outstanding work is covered by #5615, I'm closing this issue.