silverstripe / silverstripe-framework

Silverstripe Framework, the MVC framework that powers Silverstripe CMS
https://www.silverstripe.org
BSD 3-Clause "New" or "Revised" License
721 stars 821 forks source link

[ORM] Default values set in $db can not be relied upon for sub-class tables #4700

Open jonom opened 8 years ago

jonom commented 8 years ago

The "Default values for new database columns" functionality documented here can not be relied upon if the field is being introduced to a sub-class table instead of the base table.

For example, when introducing a new database property to Image, a default value specified in the field constructor within $db will not be applied to existing records which only exist in the File table and not the Image table.

In the case of Image, this issue arises because a given Image object must have a record in the base File table, but it may or may not have a record in the Image table (assuming an Image table exists). At the time of introducing a new field to the Image table and building the database, only Image objects that already have a record in the Image table will receive the default value specified in the constructor in the $db array.

The next time an Image is saved (write() called) the specified default value may be applied, but until that time a null value will be returned for the newly introduced field when reading the object. Also, if write() is called from a from a form interface the default probably won't be applied at this point because the form will have been pre-populated with a null value and it will try to save this in to the object.

In contrast, if you introduce a new field to File, all Image objects will receive the default value specified in $db at build time because all Image objects have a record in the File table.

Note that we're talking about the $db array here, not the $defaults array. Also File and Image is probably a bad example to use because it's all been re-worked in master... I've described the issue as it applies to 3.x


WAS:

I'm just upgrading my boilerplate project and noticed I was having trouble with Images. A bunch of them seemed to be behaving only as Files all of a sudden. I checked the database and while there were lots of records in the File table with a ClassName of Image, there were only a handful of records in the Image table.

There should be a record in the Image table for each ClassName=Image record in the File table right?

To test I deleted the Image table by removing the extension that's adding properties to Image, and rebuilt the database. Then I created a simple extension to add a $db property to Image, and rebuilt again. An Image table is built but it is empty. Image::get() comes up empty.

Anyone know what's going on here? I'm going to be away for a few days but here are some test files if anyone wants to investigate: http://cl.ly/402J242f3D10/ImageTableTest.zip

tractorcow commented 8 years ago

Try Debug::dump(Image::get()->sql()); to see what type of query it's doing.

sminnee commented 8 years ago

The queries are usually constructed such that the Image entry is optional, and if missing, all its records are returned a null. The query should be along the lines of SELECT * FROM File LEFT JOIN Image WHRE File.ClassName = 'Image';

jonom commented 8 years ago

Sorry, I was wrong about Image::get() failing, but I think I've uncovered 2 separate ORM bugs. I tested with File/Image and using in example below but I assume this applies to all sub-classes:

  1. Image::get()->byID() will return false if the record only exists in the File table and not the Image table
  2. When introducing new $db properties to Image, default values will not be applied to existing records if the record only exists in the File table and not the Image table. Example: 'TestImageProperty' => 'Boolean(true)'

Item 1 could perhaps be fixed by updating the byID() method and likewise for item 2 we might be able to modify the ORM somehow to ensure default values are returned where records don't exist, but it seems to me the easiest and most transparent solution would be to ensure that when the Image table is built, a record is added for every Image record that exists in the File table (and likewise for all sub-class tables).

@sminnee @tractorcow What do you guys think?

jonom commented 8 years ago

Here is a little module you can drop in for testing this on 3.2. http://cl.ly/1H330z2Z3740/download/table-test.zip

Contains a BuildTask with output like this: screen shot 2015-10-26 at 2 12 35 pm

sminnee commented 8 years ago

Hey guys, this is an important bug to fix in 3.2.1. The underlying issue is that the query in 3.2 of the form:

SELECT * FROM File LEFT JOIN Image WHRE File.ClassName = 'Image' AND Image.ID = 3

When it should be:

SELECT * FROM File LEFT JOIN Image WHRE File.ClassName = 'Image' AND File.ID = 3

We need to filter ID by the base class.

jonom commented 8 years ago

@sminnee any thoughts on the second bug?

tractorcow commented 8 years ago

Fix inc for this. :)

tractorcow commented 8 years ago

Fix at https://github.com/silverstripe/silverstripe-framework/pull/4725

jonom commented 8 years ago

Thanks @tractorcow! I've tried your fix locally and appears to fix bug no. 1 but not bug no. 2.

screen shot 2015-10-30 at 9 39 34 am

If you drop in that little module above on a fresh SS install, build the DB, then go to /dev/tasks/TableTestTask you should see output like above... probably with only one image though I guess.

What's happening is that File->TestFileProperty is having a default value correctly applied to existing records at the time the database is built but Image->TestImageProperty is not because no records exist in the Image table.

sminnee commented 8 years ago

OK I've pulled this out of the 3.2.1 milestone as it's not a regression and therefore not a blocker. Renamed issue to focus only on the outstanding bug.

sminnee commented 8 years ago

There are a few cases, that will probably require separate treatment:

  1. Add a field to a subclass that didn't previously have any fields. The table should have records added to it for any existing records.
  2. Add a field to a subclass that already had fields (so the table already exists). The new column should be created with the correct default value. This probably already happens
  3. Add a field that as a private static $defaults entry. New columns will need to have the default value set for existing records.
jonom commented 8 years ago

3) This would be cool and makes sense but probably something for master - currently $defaults only affects newly instantiated objects so increasing its scope might be a breaking change?

4) Upgrade task or automatic record repair on build: Continuing from (1) - Where sub-class tables already exist prior to a fix for this issue they likely won't contain a complete set of records currently so we'll want to provide a way of filling in the missing rows. If it's not a performance problem it would be great if we could do this automatically so missing records are created in sub-class tables and orphaned records deleted every time the database is rebuilt.

tractorcow commented 8 years ago

Could we automatically build missing-subtable detection into the lazyloading feature? If a multi-table dataobject is lazy-loaded, and the lazy loaded table is missing (if this record is saved of course) then it lazy-loads from the db schema default?

Pseudocode:

function lazyLoadField($table, $field) {
    $result = $this->queryForTable($table);
    if($result['ID']) {
        return $result[$field];
    }
    else 
    {
        return $this->dbObject($field)->dbDefault();
    }
}
jonom commented 8 years ago

I can't say for sure (don't know enough about the ORM) but it sounds like a good approach to me for addressing (1) and (4) :)

tractorcow commented 8 years ago

On 2. I think sam is right about it already setting the correct default when creating a field. :)

jonom commented 8 years ago

:+1: Yes I think Sam's right on (2) too, but with the caveat that the default will only be applied to records that already exist in the sub-class table, and some may be missing. Addressing (4) will take care of that so the approach you proposed should cover it!

jonom commented 8 years ago

Reopening as https://github.com/silverstripe/silverstripe-framework/commit/2813f94124c2ba14f1e4a51001e3898b0e0c32aa only fixed the first bug right?

jonom commented 8 years ago

p.s. Thanks for the fix Damian!

dhensby commented 8 years ago

Status update?

jonom commented 8 years ago

I think it might be best to fix this on master... think the basic problem is that we haven't properly considered how defaults should be applied to existing records when a new field is introduced to a model. It's an oversight we really should fix I think as it can easily lead to inconsistent/unexpected data.

Different options for setting default values:

The only time I set a default value within a field definition in $db is when I want to apply that value to existing records - otherwise it behaves the same as setting a value for the field in $defaults. It's a useful feature but it feels like a bit of a hack and as outlined further up the comment thread it's buggy at the moment for sub-class tables that have only a partial set of records.

What defaults should be applied to existing records when a new field is introduced to a model?

I figure we have three options:

  1. Explicitly allow the setting of a default value on existing records when you introduce a new field on a model.
  2. Apply the value in $defaults to existing records when a field is introduced. I'm not sure this is a good idea as you may want historical records to have a null value?
  3. Remove the ability to set a default value on existing records. It would be up to a developer to retroactively update existing records in that case.

I'm not sure 1. is a good idea or not because it seems fairly fragile and only comes in handy once - the first time you build the database after introducing a new field. 3 seems fairly extreme but maybe offers maximum data integrity and predictability. 2 might be the least confusing / most pragmatic solution?

Either way I think reducing or removing the overlap in functionality between setting defaults in $db and $defaults would be a good idea. For option 2 or 3 we could perhaps remove the ability to set a default value in $db?

If we do 1 or 2 we'll still have to fix the bug where records are only receiving the default value if they already exist in the sub-class table. One of @tractorcow's suggestions to get around this was to dynamically assign a default value from the schema when a sub-class record doesn't exist, but my concern is that if at some point you change what you want the default value to be for new values you're going to effectively modify data on all of the existing records that have a base-class record but lack a sub-class one. So I think the more robust solution would be to ensure records always exist in all applicable sub-class tables, not just the base table.

tractorcow commented 8 years ago

How about making the schema generation for the database read the Model.defaults config to populate the DB level defaults, instead of forcing it to be declared on the $db variable?

jonom commented 8 years ago

How about making the schema generation for the database read the Model.defaults config to populate the DB level defaults, instead of forcing it to be declared on the $db variable?

That's option 2 right that I described in the previous comment? I provided a few pros and cons above.

tractorcow commented 8 years ago

Ah right, sorry. :D I thought you ment via a SQL update, rather than setting the schema default.

if at some point you change what you want the default value to be for new values you're going to effectively modify data on all of the existing records that have a base-class record but lack a sub-class one.

If they are missing a sub-class table, then you wouldn't have a previous default to worry about would you, if the data isn't even there?: P

jonom commented 8 years ago

The problem is that for all intents and purposes if you assign a default value on the fly the record would have data for that field. So if someone is viewing an object that has a record on the base class table but not one on the sub-class table it would appear to them as if that field has a value. If they come back to the same record after changing the default value, they would see a different value there. So as far as the CMS user is concerned data corruption has occurred - they don't know that a sub-class record doesn't exist yet and they shouldn't need to know that. Until they trigger a write() on that object the sub-class record won't exist so the value won't actually be committed. In contrast if a different record happened to have a sub-class record when the field was introduced the value would not change when the default changed. I don't think this inconsistency is acceptable.

tractorcow commented 8 years ago

Could you make it so that lazy-loading loads the DB default on join, instead of null, if a table is missing a row?

jonom commented 8 years ago

I don't understand the difference sorry - but if you're doing any kind of lazy-loading won't you still have that problem where the default value isn't committed to the record so is subject to change if you change the default value?

tractorcow commented 8 years ago

Given that ChildTable.Description has a $defaults value 'DBDefault' you could make lazy-loading to this query.

SELECT
    BaseTable.ID,
    BaseTable.Title,
    IIF(ISNULL(ChildTable.ID), 'DBDefault', ChildTable.Description)
FROM
    BaseTable
LEFT JOIN
    ChildTable ON BaseTable.ID = ChildTable.ID

So ugly though....

tractorcow commented 8 years ago

IIF(ISNULL(ChildTable.ID), 'DBDefault', ChildTable.Description)

This is why I wish there was a SQL expression for "db default value for this column"

IIF(ISNULL(ChildTable.ID), DEFVALUE(ChildTable.Description), ChildTable.Description)
jonom commented 8 years ago

I think I follow what you're saying @tractorcow but aren't you just describing how lazy-loading could work and ignoring the concern I have about lazy-loading? Do you think my concern is valid? What I'm saying is if you use lazy-loading to fill the gap when a record is missing a sub-class record, the value you retrieve for these objects (specifically the ones lacking sub-table records) is subject to change. If you ever change the default value stored in $defaults a different value will be lazy-loaded for that record than was before. Do you follow - does that make sense?

tractorcow commented 8 years ago

Yes sorry to not address your concern directly... my feeling is that if there is a missing table, there is no "promise" for values of those missing fields until the record is re-saved and the missing row(s) regenerated. It would be a recoverable error situation, in that case.

You could "fix" it by having a task that automatically regenerates all missing rows, although it feels riskier still to intrinsically promise that it's the responsibility of the framework to fill in these missing values, since there's no way to guarantee it's regenerated properly before those values could be needed in the future.

You could possibly address this during schema changes (e.g. new tables being created), but it would be potentially risky to attempt data migration during build, and it would still not fix records which were missing certain rows for OTHER reasons (of which I can guess there are a few). Do you really want to iterate and save every record if the schema is changed? Out of memory risk there. :)

Maybe this is another problem that could be solved via documentation? E.g. "If you add new tables for existing records, you need to re-save or migrate that data manually during deployment".

jonom commented 8 years ago

If you read https://docs.silverstripe.org/en/4.0/developer_guides/model/data_types_and_casting/#default-values-for-new-database-columns the functionality described kind of does make a promise that you can add a default value to a new field for all existing records - not just the portion that happen to have a record in the sub-class table. That might just be bad documentation though and my fault since I added it! But that was my interpretation of what is supposed to happen. Currently the functionality described in the documentation is extremely unreliable when dealing with sub-classes (it's broken), it only works 100% reliably on base tables where it is guaranteed that a record exists for every object. So I think we either have to repair the functionality and ensure it operates in a consistent way whether you're working on a base-class table or a sub-class one, or completely remove the ability to assign default values to existing records when a field is introduced for the first time. I don't mind which way we do things but I think it's important to be consistent and ensure a predictable result.

To use Image as an example, the problem for me is that if you have 80 Image objects in your database, you will have exactly 80 records in the File table of ClassName=Image, but if an Image table exists, you will have anywhere from 0 to 80 records in it. I think this is a failing of the ORM and there should be exactly 80 records in the Image table in this scenario.

Once you have an Image table (because you've added some additional $db properties to the Image class) any new Image objects you create after that point will have a record in both the File table and the Image table. But Image objects created before that point will only get a record in the Image table once they are re-saved. To me this means those older Image records are incomplete, and is the root cause of the currently broken functionality which allows you to assign a default value to records upon build. I think at the point the Image table is introduced (during the build process), it should be back-filled with a record corresponding to every record in the File table with ClassName=Image, and then the 'build defaults' should be applied. This shouldn't be done with a full write() but lower level SQL. To me this is the only way to get a 100% consistent result when introducing new fields on direct and indirect descendants of DataObject.

dhensby commented 8 years ago

So I'm clear on this, the problem is something like this:

If we have two classes:


class MyDO extends DataObject {

    private static $db = array(
        'Title' => 'Varchar',
    );

}

// note: no DB fields, so no DB table
class MyExtDO extends MyDO {
}

We create some MyExtDO objects and they get written to the base table.

We then add a DB field to MyExtDO with some $defaults eg:

...
class MyExtDo extends MyDO {

    private static $db = array(
         'Content' => 'Text',
    );

    private static $defaults = array(
        'Content' => 'Default Content',
    );

}

We build the DB now and a new table is added but records in the base table aren't added to the new descendant table.

When we load a MyExtDO out of the DB it'll show a value of "Default Content" from the DB even though there's no record? - Is this current behaviour?

If we don't write, this value will never be stored but will be presented to users. If we then change the default value, the items that never got written but did show as having that content will now show having the new default? - Is this also current behaviour?

If the above is correct, my view is that, if there's no record in the table, the value should not take the default, it should be empty. This is how it would work if you added a new field to an existing table with existing records and so shouldn't change just because we happen to now have a new DB table.

jonom commented 8 years ago

We build the DB now and a new table is added but records in the base table aren't added to the new descendant table.

Correct.

When we load a MyExtDO out of the DB it'll show a value of "Default Content" from the DB even though there's no record? - Is this current behaviour?

No this is not current behaviour. Just one suggestion on a way forward. Currently it is possible to set a default value on existing records at the time a new field is introduced by declaring a default value in the $db array instead of $defaults e.g. $db = array('TestBool' => 'Boolean(true)'). However this is currently limited to fields which allow passing through a default value in the constructor, and may not work properly on sub-class tables as not all records are guaranteed to be present.

If we don't write, this value will never be stored but will be presented to users. If we then change the default value, the items that never got written but did show as having that content will now show having the new default? - Is this also current behaviour?

Nope this is not the current behaviour, but this is my concern about what would happen if we did opt to lazy-load default values instead of fix the root problem (in my eyes) which is the sub-class table having an incomplete set of records.

If you want to really wrap your head around the problem check out this module https://github.com/jonom/temp-ss-default-test in to a SilverStripe website that has some image records on the File table but doesn't have an Image table yet. Build the db and then run the included dev task. You should see output a bit like this, comparing the difference in values between a new database field introduced to File (base-class) and Image (sub-class), with default values specified where allowed. screen shot 2016-04-06 at 11 01 09 am

The green values match because no default was specified in $db as the constructors for those db fields don't permit it.

Everything works as expected on File for existing records, but fails dismally on Image for existing records, because there is an Image table at this point but it doesn't contain any records. For any new records you create from this point, the $db defaults will work fine on both File and Image. Interestingly, if you open a CMS edit form and re-save an existing record at this point any null values will be updated with the defaults - as long as there is no form field present for the new property. But if a field is present it will appear blank, and upon save it will be converted to whatever null is supposed to be for that field... e.g. 0 for a decimal field.

So currently we have this 'feature' which doesn't work consistently across base classes and descendants... that's really the only problem. Maybe it's not a big deal and I'm being pedantic, but it's tripped me up before. I don't think developers should have to treat each case differently and have knowledge about this quirk in the ORM to get a predictable result. So if we can't get this feature working consistently and predictably I feel like it should be removed.

'Default values for new objects' works great but the concept of 'default values for new fields on existing objects' just hasn't been properly considered in my opinion. We can set defaults at build time for some field types but not others, and it doesn't work properly for descendant tables. Throw in inconsistencies between how form fields process null values and DBFields do and it's all a bit messy at the moment.

dhensby commented 8 years ago

OK - well, then I stick by this:

if there's no record in the table, the value should not take the default, it should be empty. This is how it would work if you added a new field to an existing table with existing records and so shouldn't change just because we happen to now have a new DB table.

And we need to get it working like that

jonom commented 8 years ago

This is how it would work if you added a new field to an existing table with existing records and so shouldn't change just because we happen to now have a new DB table.

That's not how it works though... if you've specified a default in the $db array when you add a new field to an existing table, all the existing records in any existing tables receive the default value. That's what happens in the File base-class table example I supplied above, in contrast to the Image sub-class table example.

I don't think I've been very successful at articulating the concepts here... I've edited the leading comment now to try to summarise it better. :stuck_out_tongue:

dhensby commented 8 years ago

ahhh, so this is about using defaults passed to the $db definititon and not using $defaults.

OK - well assigning defaults in this way is only limited to a few DBField types and I think something we probably should avoid having an inconsistent API for.

I'd be in favour of deprecating the default argument to Boolean and any other DBFields that have them and move that to be managed solely by $defaults.

Sorry, the thread is quite long and my time quite short so I'm skimming when I should be reading in more depth.

jonom commented 8 years ago

All good, this issue has mutated over time with two separate bugs identified and only one remaining now, hard to follow I know.

I'd be in favour of deprecating the default argument to Boolean and any other DBFields that have them and move that to be managed solely by $defaults.

So as I understand it at the moment, values specified in $defaults are only applied to brand new objects when they are created, before being written to the database. It doesn't currently affect the actual database structure.

Whereas passing a default argument in a constructor in $db affects the structure of the column in the database and default used at the database level. I don't know if this has any practical effect due to the way the ORM works but it's interesting.

So if we drop support for setting custom default values in field constructors, would we instead use whatever is specified in $defaults as the database-level default value? Or would we force whatever default value is defined on the DBField in question - i.e. false for Boolean?

Anyway, doing that might reduce cross-over and confusion but it won't fix the underlying problem, which is that default values (however they're arrived at) aren't being applied to or read from objects that don't have a record in the sub-class table the field appears on. Many field types have default non-null values so even if we don't allow people to specify a custom default value, we want to ensure the regular default kicks in.

Take Currency for example. In the database NULL is not allowed and if you don't specify your own default the field will get a default value of 0.00. But you can see in test results in a previous comment that you get a NULL value when reading a currency field from an Image object that doesn't have a record in the Image table. A null value shouldn't be possible here.

@tractorcow's suggestion to combat this was to lazy-load in the default value when a record doesn't exist, so in the case of Currency you would get 0.00 (or whatever default you specified in the constructor) instead of NULL. That seems fairly solid, and certainly a lot better than what happens currently. My only concern is that that value isn't actually committed to the object until you call write() and generate a record for the sub-class table, and if you ever change the default value then the perceived value of that field on any objects that lack a sub-class record where the field is defined will change.

So if you'd set a default price on a Ticket object at $5, you might have a bunch of objects that appear to have a price of $5 and you're happy with that, but unbeknownst to you only half the objects have a record on the table that defines the price, the rest are being filled in on the fly. So when you change the default to $10 suddenly half of those records have their price changed from $5 to $10.

Maybe it's a silly concern and could be resolved with documentation - e.g. "once you set a default in your constructor in $db don't change it as it could affect the value of this field on existing objects". Just seems unintuitive to me and wouldn't be a problem if we could guarantee a full set of records in all sub-class tables.

tractorcow commented 8 years ago

My only concern is that that value isn't actually committed to the object until you call write()

Maybe a pre-flight check BEFORE lazy-loading that writes missing rows first? That way the lazy loading itself doesn't have to do anything special. It could be written in such a way that it detects missing rows without extra queries, so the performance hit only occurs during this exception (where there are missing rows in subtables, but not the root table).

jonom commented 8 years ago

Maybe a pre-flight check BEFORE lazy-loading that writes missing rows first?

That sounds like a great idea to me.

tractorcow commented 8 years ago

I guess we could have an option to enable it, in case older sites are more concerned about performance than stability. ;P

sminnee commented 8 years ago

Maybe a pre-flight check BEFORE lazy-loading that writes missing rows first?

Do you mean something in dev/build? That makes sense to me. The whole of SilverStripe is built on the assumption that if something goes wrong that would be fixed by a dev/build, then the fix is to run a dev/build.

A dev/build-based fix should be okay as long as these issues can only be introduced by a code-changes, not by regular usage of the ORM.

That way the lazy loading itself doesn't have to do anything special.

Yeah, the ORM is already too slow. We should avoid too many more complex IF() clauses in our queries.

dhensby commented 8 years ago

So basically... Add the rows on dev/build?

tractorcow commented 8 years ago

Do you mean something in dev/build? That makes sense to me. The whole of SilverStripe is built on the assumption that if something goes wrong that would be fixed by a dev/build, then the fix is to run a dev/build.

No, lazy-loading is done when the object is queried, and we don't normally iterate over all objects in dev/build.

tractorcow commented 8 years ago

I'm not attempting to fix THIS issue, but I've posted https://github.com/silverstripe/silverstripe-framework/pull/6005 which allows you to more easily specify db level defaults for db fields.

chillu commented 5 years ago

Sam rolled this up into a more generic issue: https://github.com/silverstripe/silverstripe-framework/issues/8908