silverstripe / silverstripe-linkfield

Silverstripe module for managing links
BSD 3-Clause "New" or "Revised" License
4 stars 13 forks source link

RFC-30 / Data Model #30

Closed dnsl48 closed 10 months ago

dnsl48 commented 3 years ago

RFC-30 / Data Model

About the RFC

The RFC describes the data model of the module, the decisions made and captures the reasoning behind those.

The data model defines

Resources

Forum discussion (the module idea) - New links module?

Github discussion (this RFC) - issue-30

User Stories

Seeking some feedback from the community and internal teams we identified two main user stories that people would like to cover

  1. Link is a part of the parent (belongs to its parent) and not editable separately.
  2. Link is a shared object that multiple DataObjects may refer to (via has_one).

Technical concerns

For a potential implementation the following technical concerns have been stated:

Existing solutions

sheadawson/silverstripe-linkable

The most popular community module at the time of writing, with just over 220k downloads.

No longer maintained. Recommends switching to gorriecoe/silverstripe-linkfield.

Implementation: A link is a DataObject (class Link, table LinkableLink). DataObjects use has_one to implement links. Custom link types implemented via SilverStripe\ORM\DataExtension for the Link DataObject, which adds new custom database fields and extends the updateCMSFields method. UI implementation: Backend: LinkField class extending SilverStripe\Forms\TextField Frontend: Entwine extension bundled with the module

gorriecoe/silverstripe-linkfield

The second most popular community module at the time of writing, with just over 37k downloads.

Is maintained at the moment of writing. Built on top of gorriecoe/silverstripe-link. Has a dozen of satellite modules extending the functionality in a decoupled way.

Implementation: A link is a DataObject (class Link, table Link, module gorriecoe/silverstripe-link). DataObjects use has_one or many_many. Custom link types implemented via SilverStripe\ORM\DataExtension for the Link DataObject, which add new custom database fields and extend the updateCMSFields method. UI implementation: Backend: LinkField extends a SilverStripe\Forms\FormField (module gorriecoe/silverstripe-linkfield) Frontend: only CSS. Relies on gridfields and OOTB framework components.

Considered implementation options

Option A

Link is a DataObject.

Implemented by sheadawson and gorriecoe. Easiest to extend via DataExtension. Can be shaped to has_one, has_many, many_many. More complex relationship versioning and handling (although we have that solved through cascading publish etc.). More performance impact on query time. Can use lots of built-in editing UIs directly (e.g. GridField). Works with other core features like diff views, reports, etc. But also adds more database tables and schema complexity. Most consistent, SilverStripe devs know how to deal with this.

Pros:

Cons

Notes:

Option B

Link is a DBField with serialised data (e.g. JSON).

Implemented by bannerblock. Harder to extend. Serialised JSON makes it hard to query individual fields (without using something like GitHub - phptek/silverstripe-jsontext: JSON storage and querying 1 and JSON database support). Harder to enforce consistent schema. Creates new dependency if done properly (e.g. through phptek’s module). Less native form handling (e.g. validation errors?)

Cons:

Option C

Link is a CompositeField with individual database columns.

Harder to extend. Easy to query. Strong schema. Can be wrapped in arbitrary DataObject structures or relationships. Can “inline” columns into a DataObject to avoid relationship in cases where only one link is required. Can use lots of built-in editing UIs indirectly (e.g. GridField through DataObject). Less native form handling (e.g. validation errors?)

Cons:

Potential data structure (columns): see PoC below

Extending: see PoC below

Frontend:

Notes:

Comparing the options

Option C is the most flexible way to store the data. Option B could exist on top of Option C, if necessary, as a separate link type implementation. Option A could exist on top of Option C, if necessary, as a separate data object containing the DBLink filed (Option C)

For users there's no much difference between Option A and Option B/C. The main difference is for devs implementing new link types.

The conclusion

Option C is preferable

Proof of Concept for the Option C (PoC)

Implemented in dev/rfc30/poc branch.

Base DbField implementation

ORM\DBField\Link is the base class extending DBComposite and providing the basic API for the children (particular link implementations). The class is abstract, which guarantees all owners know exact link types they contain. The abstract Link overrides DBComposite::compositeDatabaseFields() method returning a single own DB field - DBLink_Type plus the fields provided by child implementations via getCompositeDbFields. For every link of the DataObject the field DBLink_Type stores the particular link implementation (class name).

The extension point for children is abstract public static function getCompositeDbFields(): array;. Every child (link type implementation) defines its own list of fields that would be added to the DataObject utilising the LinkField.

DataObject Extension

Extensions\DataObjectExtension provides extra capability on top of DataObject::populateDefaults that ensures all DBLink_Type fields are correctly initialized and contain correct values when creating empty DataObjects.

Link Type implementations

The PoC implements 3 examples:

Front-end rendering options

CMS rendering

The PoC is missing the CMS implementation for the fields, but it should be possible to provide the usual implementations for both Entwine and React front end methods.

Any particular FrontEnd implementation details are out of scope for this RFC (irrelevant for Data Model).

Validation

Data Structure

The provided PoC was tested with the following Page extends SiteTree DataObject model:

...

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'HyperLink' => LinkField\ORM\DBField\HyperLink::class,
        'SiteLink' => LinkField\ORM\DBField\SiteTreeLink::class,
    ];
}

That Page model produced the following database structure:

CREATE TABLE `Page` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `HyperLinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\HyperLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\HyperLink',
  `HyperLinkTitle` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `HyperLinkURL` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `SiteLinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink',
  `SiteLinkSiteTreeId` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `HyperLinkDBLink_Type` (`HyperLinkDBLink_Type`),
  KEY `SiteLinkDBLink_Type` (`SiteLinkDBLink_Type`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Both fields HyperLink and SiteLink only add their own fields to the final DataObject structure.

The only overhead in the generated schema is an extra *DBLink_Type field for every Link. This means we only pay for what we use.

AnyLink

The suggested data model design (the Option C) is flexible enough to achieve AnyLink as a link type implementation on top of DBField\Link. See an example prototype implementation in the PoC - AnyLink.

The following Page implementation:

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'Link' => LinkField\ORM\DBField\AnyLink::class,
    ];
}

Produces the following data model:

Create Table: CREATE TABLE `Page` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `LinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\AnyLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\AnyLink',
  `LinkAnyLink_f2f0ebb1_LinkType` enum('SilverStripe\\LinkField\\ORM\\DBField\\Link','SilverStripe\\LinkField\\ORM\\DBField\\AnyLink','SilverStripe\\LinkField\\ORM\\DBField\\HyperLink','SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\Link',
  `LinkHyperLink_dba830f3_Title` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkHyperLink_dba830f3_URL` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkSiteTreeLink_09b5f94f_SiteTreeId` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `LinkDBLink_Type` (`LinkDBLink_Type`),
  KEY `LinkAnyLink_f2f0ebb1_LinkType` (`LinkAnyLink_f2f0ebb1_LinkType`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Thus, AnyLink is just another link type implementation that provides a composition of fields of all the other link type implementation and then can behave as a particular link type chosen by end users.

chillu commented 3 years ago

Had a catchup with Serge on this yesterday, and I agree with Option C. One tradeoff is that we can't leverage goerricoe/silverstripe-link as a data model baseline, and there's a less straightforward migration path from the ecosystem around this module. But they're all pretty straightforward implementations, so I'm not too worried.

Overall, the focus of this should be on creating a more user friendly UI (GridFields add too much friction). And to choose a data model which can support that.

chillu commented 3 years ago

Thanks for the updated Option C PoC Serge! Looking at the code, the DBCompositeField based structure seems to assume that the developer picks a certain link type upfront when defining their model. While that would keep down the amount of database columns, I reckon it's very much an edge case. In most cases, developers should be encouraged to give users access to all available link types in any context. For example, if a dev creates or installs a "linkable banner" block, the user by default should be able to insert links to files, pages, emails and external URLs. If a dev then creates a "DMS integration" with the ability to link to documents in that DMS, they shouldn't need to opt in to that everywhere a Link field is referenced in a DataObject, right?

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'Link' => LinkField\ORM\DBField\Link::class,
    ];
}

This would create the following database columns (roughly):

Link_Type (Enum of Hyperlink, SiteTree, File, Email)
Link_Title
Link_URL
Link_FileID
Link_SiteTreeID
Link_Email

Without a "container" class like this, I don't have faith in devs explicitly opting in to all potential link types. They'll just assume that a banner would only need links to pages, and once CMS authors identify a need to link to a PDF file it'll become an expensive code change.

dnsl48 commented 3 years ago

Thanks for the feedback, Ingo!

That's a valid concern. I have also considered the use case of choosing any link type, although I didn't add it to the initial PoC. The suggested data model design (the Option C) is flexible enough to achieve that via implementing another link type on top of DBField\Link.

I have just added a prototype of such implementation to the PoC. See the AnyLink dbfield.

use SilverStripe\LinkField;

class Page extends SiteTree
{
    private static $db = [
        'Link' => LinkField\ORM\DBField\AnyLink::class,
    ];
}

Produces the following data model:

  `LinkDBLink_Type` enum('SilverStripe\\LinkField\\ORM\\DBField\\AnyLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\AnyLink',
  `LinkAnyLink_f2f0ebb1_LinkType` enum('SilverStripe\\LinkField\\ORM\\DBField\\Link','SilverStripe\\LinkField\\ORM\\DBField\\AnyLink','SilverStripe\\LinkField\\ORM\\DBField\\HyperLink','SilverStripe\\LinkField\\ORM\\DBField\\SiteTreeLink') COLLATE utf8mb4_unicode_ci DEFAULT 'SilverStripe\\LinkField\\ORM\\DBField\\Link',
  `LinkHyperLink_dba830f3_Title` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkHyperLink_dba830f3_URL` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `LinkSiteTreeLink_09b5f94f_SiteTreeId` int(11) NOT NULL DEFAULT '0',
chillu commented 3 years ago

Yep that looks better. Can you explain why we need a specific LinkType (LinkAnyLink_f2f0ebb1_LinkType) in addition to the generic LinkDBLink_Type field? In my view, those aliases (AnyLink, HyperLink, SiteTreeLink) are enough to create the right PHP behaviour from the data without hardcoding the FQCN in the database.

So in your example above, the Page already knows that the Link composite field is of type LinkField\ORM\DBField\AnyLink and can create the right PHP instances with that data. The AnyLink instance can then infer the class implementations based on the aliases via PHP or YAML config: HyperLink uses an instance of SilverStripe\\LinkField\\ORM\\DBField\\HyperLink. If another module wants to subclass HyperLink to add more behaviour and use the existing alias, that would be kept in PHP or YAML config. I'm just very allergic to overly verbose database columns, we already have an insanely large database schema in most production sites.

dnsl48 commented 3 years ago

Can you explain why we need a specific LinkType (LinkAnyLink_f2f0ebb1_LinkType) in addition to the generic LinkDBLink_Type field?

Maybe we don't need it. That's just a prototype implementation to prove the concept. The final implementation may be better optimised. The current implementation with a separate field for a link type of AnyLink, however, ensures the data integrity on the DB level. It may prevent some logical errors in the application (e.g. you won't be able to assign a HyperLink instance to the Page.Link field and will have to always provide it with an instance of AnyLink, because it's of type enum('SilverStripe\\LinkField\\ORM\\DBField\\AnyLink')).

I believe it is possible to have either implementation later and we don't have to "incorporate" that decision in the Data Model RFC.

chillu commented 3 years ago

I believe it is possible to have either implementation later and we don't have to "incorporate" that decision in the Data Model RFC.

In my view, the purpose of a "data model" centric RFC is to work through the data model issues, right? And the verbose "just in case" type fields in the database are a big part of that. You are correct that on an ORM level, accessing Page->Link() in your example will always need to return you an AnyLink instance. But when you call Page->Link()->forTemplate() or Page->Link()->URL(), it would inspect the Page.LinkType column and return you a template for the right type (e.g. HyperLink).

This brings up a somewhat related templating issue: How do you expect that we handle potential additional HTML attributes required by this new object? The most natural template implementation is <a href=$Link.URL>custom content</a>, but that makes it hard to add make target or rel configurable by CMS authors. Maybe that's rare enough that we can ignore it?

dnsl48 commented 3 years ago

And the verbose "just in case" type fields in the database are a big part of that.

I feel that decision is not critical for this RFC, because both AnyType implementations may co-exist with the same data-model described in this RFC. Eventually, the users may decide which one to use or implement their own implementations addressing even more edgy cases in their projects. I agree that it is important to decide the default implementation that we ship with the module, but I'm wary that may increase the amounts of discussions and prevent this RFC from becoming accepted for a while. On the other hand, we could accept this RFC and keep these questions for later (discuss separately in other GitHub issues). Such approach would allow us to keep working on other stuff in parallel, while we deciding the ideal implementation of the default AnyLink implementation.

How do you expect that we handle potential additional HTML attributes required by this new object? The most natural template implementation is <a href=$Link.URL>custom content</a>, but that makes it hard to add make target or rel configurable by CMS authors.

The AnyLink prototype in the PoC addresses that via casting AnyLink to HyperLink first and then rendering HyperLink in the place where AnyLink is put within the template. That means developers would have control over HyperLink template rendering via its own templates and class implementation.

The section Front-end rendering options of the RFC provides 2 ways to render links:

That may depend on the particular link type implementation (e.g. devs can extend HyperLink with their own implementation and provide different templates or XML method implementation)

that makes it hard to add make target or rel configurable by CMS authors

I would suggest the CMS authors should be able to configure these via FormField in CMS when they choose the type of AnyLink to be HyperLink.

wilr commented 3 years ago

How do you expect that we handle potential additional HTML attributes required by this new object? The most natural template implementation is custom content, but that makes it hard to add make target or rel configurable by CMS authors.

I imagine the base link model could follow the FormField API, getAttribute, setAttribute and in the templates provide a $AttributesHTML in the default template. That way people could inject / extend whatever attributes they wanted (tabindex, aria etc)

maxime-rainville commented 3 years ago

Had a glance at the RFC PR. I don't have major beef with it.

Developer experience

My main concern is making sure it's easy and straight forward for dev to define links on their data objects. I'm thinking we could create a Link alias for AnyLink like we do with other DBFiled.

So people could just define their links with:

private static $db = [
    'MyLink' => 'Link'
];

Or we could rename Link to something like AbstractLink' and renameAnyLinkto justLink`.

Basically, the average dev shouldn't have to invest a lot of time to understand the gut of the module to use it.

Type concerns

My gut feeling would be to have to keep a FQCN just to avoid any potential collision later on.

Other concerns

All the other concerns raised about templating and additional link attributes are secondary and can be handled later on. We're pretty far from shipping a stable product. If we make a bad decision we can come back and fix it later.

dnsl48 commented 3 years ago

We identified the following technical concerns regarding the PoC AnyLink implementation:

It aggregates database fields from every link type in the application. If we have 10 link types registered in the application, every one of those would have 3 fields (in average), the AnyLink would create 30 database fields (even though only 3 of those would be used for every particular link type). Multiplying that by several links for a data object makes that very inefficient. E.g. Page has 5 links, 30 db fields by each would produce 150 database fields for that page.

We are going to work on an optimised AnyLink implementation in the following issue: #31