silverstripe / silverstripe-framework

Silverstripe Framework, the MVC framework that powers Silverstripe CMS
https://www.silverstripe.org
BSD 3-Clause "New" or "Revised" License
721 stars 822 forks source link

RFC Non-class backed (yml only) dataobjects #7902

Open tractorcow opened 6 years ago

tractorcow commented 6 years ago

Background

This RFC proposes a solution to allow a new dataobject specification to be built using only config.

This proposal will allow us to split the class base for specific dataobject specifications away from requiring explicit classes.

This means that, for instance, we can create test class dataobjects which do not require us to create a test file. These can exist purely in YML.

Details

Class structure

I propose an extension of the DataObjectSchema implementation. Ever since I wrote this feature I've felt that the implementation was out of place and served no real purpose, other than "make dataobject.php shorter".

The basic proposal is that each logical-class will be backed by two real-classes:

As with Page and PageController, the relation between the two will be based on naming convention. The default "schema" class for any model is <FQN>Schema. YML classes don't need to conform to this convention.

Schema registration

The registration of all models will be based on the union of:

That meaning, creating a subclass of DataObjectSchema will automatically register that as a model. These classes are self-describing, meaning that config such as db, has_one, and so on will be drawn not from the BlogPage.db, but from BlogPageSchema.db. In fact, BlogPage class may not even exist.

YML Registration for objects would be:

SilverStripe\ORM\DataObjectSchema:
  schemas:
    BlogPage: # Virtual class
      extends: Page # Virtual parent class. Maybe not necessary?
      db: # will be merged with PageSchema.db too.
        Description: Varchar
      has_one:
        Author: 'SilverStripe\Security\Member'
      schema: PageSchema # Parent schema class name. Acts as both schema object, and inherits db fields from that schema
      instance: Page # Use the real `Page` class for instances of this object

schema and instance would just default to DataObjectSchema and DataObject if omitted, so they aren't mandatory.

We could instead make the schema / instance default to the parent extends model.

Parent db / has_one etc would inherit from EITHER (but probably not both) of extends or schema. That may require some thinking. :)

Typical convention would be that, for virtual schemas, there is no namespace.

Schema retrieval

Similar to how DataObject::get()/::get_one() can be used to query instances of a dataobject, you can use DataObjectSchema::get_one($model) to get a schema for a named model.

Mosts tasks that previously used DataObject::singleton() to do things on the schema (such as build the DB) now use DataObjectSchema::get_one($class).

DataObjectSchema::get() will return all schemas for the current db.

API

class DataObject {
    // Inject schema for objects. Required for virtual classes
    public function __construct(DataObjectSchema $schema = null) {
        $this->schema = $schema ?: DataObjectSchema::get_one(static::class);
    }
    public function getSchemaName();  // name of model. for non-virtual classes this ==== static::class. Same as DataObjectSchema::getName().
    public function getSchema(); // Changed from static to instance method, returns the schema for this instance
    public function write(); // other instance-only methods, etc
}
class DataObjectSchema {
    // configs (example)
    private static $db = [];
    private static $has_one = [];
    private static $has_many = [];

    // props
    public function getName(); // name of model. for non-virtual classes this ==== getInstanceClass(). Same as DataObject::getSchemaName().

    // Instance interaction
    public function getClass(); // class name of instance used
    public function createInstance(); // create new instance class for this schema
    public function query(); // create DataQuery for the underlying model

    // Schema interaction (called on DataObjectSchema:: directly).
    public static function get(); // Get all schemas
    public static function get_one($name); // Get a named schema (name of model not schema class)

    // db scaffolding, moved away from DataObject::singleton()
    public function requireDefaultRecords();
    public function requireTable();
}

Options / variants

Since we are moving the config for model specification to DataObjectSchema, it may be less disruptive instead to keep the schema on DataObject, and make that the schema class (DataObjectSchema goes away).

We would instead have a DataObjectInstance (or just DataItem for short).

tractorcow commented 6 years ago

cc @silverstripe/core-team for review :D

tractorcow commented 6 years ago

I guess the major question is here, which classes should have which roles for schema + instance:

And the other question is, is shifting config off DataObject subclass ok? Most people are used to adding it to their model + using it straight away. It may be a pain to users to need 2 classes.

unclecheese commented 6 years ago

This is great. Thanks, @tractorcow. I don't have a lot of time to write up a full response now, but in general, I love the direction. A long time ago, I attempted to solve this problem in a similar, but much less complex way by having a universal DataObject extension that acquired class-specific schema and CMS fields from YAML. In my experience building moderately complex websites, anywhere between 50-80% of DataObjects required nothing beyond YAML declarations, and that was really valuable, particularly for getting a site going quickly, and also making it accessible to designers and other team members who lacked PHP proficiency.

Architecturally, I think I have a hard time opposing any well-engineered plan to deconstruct the DataObject monolith into a more aspect-oriented approach, with smaller bodies of single concerns. Decoupling the schema from the business logic seems like a great start.

A declarative way to do getCMSFields and exposure to ModelAdmin seem like really good ways to reduce boilerplate code, as well, but that's another RFC.

Anyway, very keen to hear from others on this!

robbieaverill commented 6 years ago

This means that, for instance, we can create test class dataobjects which do not require us to create a test file. These can exist purely in YML.

I'm really excited about what this will mean for creating fixtures! SapphireTest performance with fixtures is another topic, but it would help so much if we could fixture say a Member with only YAML.

extends: Page # Virtual parent class. Maybe not necessary?

Perhaps if we called it inherit (or even inherits for multiple) it might not draw the assumption attached to extends in PHP? You'd be kind of doing this with schema: PageSchema already though.

In my experience building moderately complex websites, anywhere between 50-80% of DataObjects required nothing beyond YAML declarations

Yep, this is so true


Re: the instantiation of virtual classes via DataObjectSchema::get_one($name) etc, I kind of feel like we should be promoting some sort of dependency injection as a higher priority in SS5 - maybe implementing something like this into Injector would be a good start? For example, something like Injector has getItems($name) : DataObjectSchema[] and getItem($name) : DataObjectSchema

tractorcow commented 6 years ago

I had a bit of a think about this last night; I think creating two classes for a dataobject is unnecessary.

So I propose:

sminnee commented 6 years ago

I like this path.

At a high level, I'd see it like so:

Not sure:

I'm not entirely sure that this needs to be change/major - it could potentially be a minor change if we preserve the existing API.

sminnee commented 6 years ago

I'd probably call the interface DataType and make it completely absent of static methods.

You might have ClassBasedDataType which derives a data type from DataObject static properties, preserving current behaviour.

foreach(ClassInfo::subclassesFor(DataObject::class) as $dataClass) {
  $types[$dataClass] = new ClassBasedDataType($dataClass);
}

DataObjectSchema would probably be deprecated and/or rewritten to pass all its calls through to methods on the type object in question. Something like this (which assumes each type is a prefixed name in Injector; there are other approaches such as a DataTypeSet service)

    public function DataObject::tableName($class)
    {
        return Injector::inst()->get('DataType.' . $class)->tableName();
    }

This would keep us from breaking the DataObjectSchema API.

You could add ConfiguredDataType as a separate implementation, that used property setters rather than statics:

$configurableDataType = new ConfiguredDataType();
$configurableDataType->setTableName('orders');
$configurableDataType->setFields([
  'OrderID' => 'Int',
  'Price' => 'Decimal',
]);
$configurableDataType->setHasOne([
  'Member' = Member::class,
]);
$types['Orders'] = $configurableDataType;

...which can then easily be built inside Injector yaml.

sminnee commented 6 years ago

Just going to point out that if you can do this as a minor change, it gets into most of the community's hands within months rather than years.

flamerohr commented 6 years ago

I'd be interested to see how this could fit in a minor change release, it seems like a big paradigm shift at face value.

The concept is really powerful and it's really nice to see. I will greatly miss being able to easily discover and traverse classes in IDEs...

Looks like we'll utilise FormFactories more with this approach coming up.

sminnee commented 6 years ago

I like the idea of making big changes by making a new API rather than radically changing an existing one.

Then you can deprecate the old API...

And in the next major release delete it.

Easier said than done in some cases but a good principle to try for.

unclecheese commented 6 years ago

gets into most of the community's hands within months rather than years.

SS5 is only nine months away, if we're going to keep ourselves honest. By the time this gets into a minor release, it would likely be less than six months away, so I would weigh the cost/benefits of not breaking any APIs.

I will greatly miss being able to easily discover and traverse classes in IDEs...

Yeah, this is a huge drawback. I'd expect the natural outcome would be 100+ line, all-inclusive YAML files, which actually becomes a disservice to discovery.

flamerohr commented 6 years ago

I like the idea of making big changes by making a new API rather than radically changing an existing one.

Yup, that sounds fine, as long as old API is not dramatically changed to transition the new API in, then that could be alright.
It's not uncommon to overlook the old API functionality when the new API is transitioned in though.

100+ line, all-inclusive YAML files

I hope it doesn't lead to practices of monolith YAML file(s), that would be worse :P

sminnee commented 6 years ago

On the "100 lines of YAML, no more code hinting" risk: my assumption is that most DataObjects would still use the original PHP approach, and this tool would only be used where there were no special features of the object.

A particular use-case I could see for this is to allow the creation of new page types with no extra PHP code. You could, for example, build on this with a feature in a theme that injected new page types, meaning that a theme could provide a new page types independently. It's beyond the scope of this ticket but it's a way it could grow.

robbieaverill commented 6 years ago

It's probably worth noting that allowing this use case would also mean we're being a bit more honest with our API. We currently have DataObjectInterface which is useless outside of DataObjects, so this would be a good example of us dogfooding our interfaces a bit more.

At the same time it's also a great excuse to do some refactoring in DataObject =)

mooror commented 6 years ago

I thought I would put in my own two sense on the matter even though I am not a core contributor. Overall I think tractorcow's original idea is great and will help developers who are writing tests immensely.

I do however think that you may want to create a dedicated configuration file (model.yml or schema.yml perhaps?) file. Otherwise, I think this change might cause some consistency issues for developers coming to a new project or module.

Right now developers can reasonably assume that a project or module will store it's DataObjects files in a Model folder (under src/). It would be nice if you could store the yaml file for "Virtual DataObjects" within the same folder.

There is also the question of whether storing "Virtual DataObjects" in the main configuration file is appropriate for anything other than testing purposes.

P.S. I don't know if issue threads are normally open to community feedback so I hope I am not speaking out of turn here. Please take my comment as a grain of salt.

kinglozzer commented 6 years ago

P.S. I don't know if issue threads are normally open to community feedback so I hope I am not speaking out of turn here.

Not at all, we always welcome community feedback - on issues, RFCs and pull requests 😄

dhensby commented 6 years ago

This might be trickier than it seems. I've looked into adding "mock" dataobjects from PHPUnit tests into the manifest so that we could do proper testing of mocked dataobjects and having them show up in inheritance heirachies and such, but the nesting feature of the manifests doesn't appear to work as expected... So that needs fixing to get this working, I think..