yiisoft / yii2

Yii 2: The Fast, Secure and Professional PHP Framework
http://www.yiiframework.com
BSD 3-Clause "New" or "Revised" License
14.23k stars 6.91k forks source link

The engine should not attempt to force synchronization between relations and corresponding foreign keys #19788

Open PowerGamer1 opened 1 year ago

PowerGamer1 commented 1 year ago

Since PHP 8.2 has deprecated dynamic properties I had to switch to storing relation data in ActiveRecord model using ActiveRecord::populateRelation() method. This lead to surfacing of surprising problems in my code the root cause of which was tracked to a feature introduced in https://github.com/yiisoft/yii2/issues/13618. While attempting to make a fix in https://github.com/yiisoft/yii2/issues/19785 I studied and tested the implementation of #13618 extensively and provide my findings and conclusion below.

The concept.

Yii2 allows to define foreign key dependencies between corresponding database tables and conveniently fetch data from related tables using lazy or eager loading.

Before the #13618, once the relation was loaded the engine never attempted to synchronize the values of foreign key in ActiveRecord model with the data of corresponding relation. EXCEPT a very single isolated case which provided such rudimentary synchronization: calling ActiveRecord::refresh(). That call simply unset all loaded relations (if used again the relations would be lazy loaded from DB and if they changed the user will get the updated data). This mechanic already had negative consequences in some cases, for example:

$transaction = app()->getDb()->beginTransaction();
$project = Project::find()->where(['id' => 123])->with('manager')->one();
$project->name = 'New name';
$project->updated_at = new \yii\db\Expression('CURRENT_TIMESTAMP');
$project->update(false, ['name', 'updated_at']);
$project->refresh(); // Fetch the value of $project->updated_at generated by DB.
$transaction->commit();
// Now lets notify $project->manager that project name has changed.
var_dump($project->isRelationPopulated('manager')); // false - the relation is unset ???
// Why is that - relation data didn't change! The engine FORCES me to run ANOTHER query into DB to fetch the SAME data AGAIN!
// And this time OUTSIDE the carefully placed transaction!

Later, the functionality of #13618 (referred to as "magic" below) attempted to introduce proper automatic synchronization between the foreign key values of a model and corresponding relations in all other cases.

$project = Project::findOne(123);
$project->manager; // Load and use relation.
$project->manager_id = $newManagerId; // Change the value of ID to a different one.

// BEFORE the "magic":
var_dump($project->manager->id == $project->manager_id); // false - relation still has its originally loaded data.

// AFTER the "magic":
var_dump($project->manager->id == $project->manager_id); // true - relation has new data - the "magic" works!

The analysis.

Unfortunately, the implementation of the "magic" has failed miserably in its goal and a few instances where it actually works come with significant performance and memory price to ActiveRecord. The following examples illustrate cases where "magic" currently fails to work correctly, has negative consequences or will be impossible to implement properly at all.

Currently, the "magic" simply does not work in many cases were it should:

Example 1. Relations loaded through with() aren't "magically" synced.

$project = Project::find()->where(['id' => 123])->with(['manager'])->one();
$project->manager; // Use relation.
$project->manager_id = $newManagerId; // Change the value of ID to a different one.
var_dump($project->manager->id == $project->manager_id); // false - surprise, surprise - the "magic" no longer works.

Example 2. Relations set manually through populateRelation() aren't "magically" synced.

$project = Project::findOne(123);
$manager = Manager::findOne(456);
$project->manager_id = $manager->id;
$project->populateRelation('manager', $manager); // Set and use relation.
$project->manager_id = $newManagerId; // Change the value of ID to a different one.
var_dump($project->manager->id == $project->manager_id); // false - surprise, surprise - the "magic" no longer works.

or "works" where it shouldn't (by unsetting synchronized relations):

Example 3. If the foreign key value remains the same but changes its type from int to string (for ex., submitted form data comes as string) the "magic" considers relation to be de-synced (leading to extra query into DB to fetch the same data).

$project = Project::findOne(123); // manager_id value comes from DB as int.
$project->manager; // Load and use relation.
$project->manager_id = (string)$project->manager_id; // Convert to string with the same value (happens, for ex., when loading submitted form data).
var_dump($project->isRelationPopulated('manager')); // false - the relation is unset. If relation is used again the same data will have to be fetched from DB with extra query.

Example 4. On ActiveRecord::refresh() all relations are unset regardless of synchronization state (see #19785).

or has other issues due to poor implementation quality:

Example 5. In some cases the housekeeping data of the "magic" lingers even when relation is no longer loaded.

// Lets say we have a primary model and the related model with composite FK:
//  hasOne(RelatedModel::class, ['related_model.a' => 'primary_model.a', 'related_model.b' => 'primary_model.b']).
$primaryModel = PrimaryModel::findOne(123);

$primaryModel->relatedModel; // Incurring a "housekeeping" memory cost here (in _relationsDependencies).
unset($primaryModel->relatedModel); // OK - the relation is gone and housekeeping data is gone too.

$primaryModel->relatedModel;
$primaryModel->b = null; // NOT OK - the relation is gone but a part of housekeeping data remained (in _relationsDependencies).

Upon deeper consideration it becomes obvious that the concept of "forced synchronization" is flawed and unachievable. It is because some desynchronized states are perfectly fine and unavoidable.

Example 6. The enforcement of the rule (unsetting relation when foreign key value changes) upon which the "magic" implementation is based interferes with user intentions.

Setting the relation manually (which actually is often very useful) in general case involves two separate steps:

// DISCLAIMER: This example assumes a potential bug-free implementation of "magic" from scratch was attempted!
$project = Project::findOne(123);
$project->manager; // Load and use relation.
// Now lets manually change the manager to a different one ...
$newManager = Manager::findOne(456);
$project->populateRelation('manager', $newManager); // ... by manually setting relation first ...
$project->manager_id = $newManager->id; // ... followed by manually setting the value of foreign key.
// So what should happen here? The value of manager_id has changed so "magic" should go and frigging nuke perfectly valid data in $project->manager the user just assigned a moment ago ??? Come on ...  
var_dump($project->isRelationPopulated('manager')); // false - the relation is unset - what the hell ???

Example 7. User prepares new data of primary model and its relations for insertion into the database.

Another case where "desynchronized" state is very inherent and is OK.

// DISCLAIMER: This example assumes a potential bug-free implementation of "magic" from scratch was attempted!
$project = new Project();
$project->populateRelation('jobs', []);
for($i=0; $i<3; $i++)
    $project->jobs[] = new Job();
$project->load(Yii::$app->getRequest()->post());
Job::loadMultiple($project->jobs, Yii::$app->getRequest()->post());
if($project->validate() && Model::validateMultiple($project->jobs))
{
    // Up until now the relations can be considered synchronized, i.e. $project->jobs[0]->project_id == null and $project->id == null.
    $project->insert(false);
    // Notice how related jobs are inherently desynchronized after the call to `insert()` because $project just now acquired its primary key value.
    // I.e. $project->jobs[0]->project_id is null but $project->id is no longer null.

    $project->refresh(); // Lets fetch CURRENT_TIMESTAMP values first and afterwards deal with inserting jobs.

    // So what should happen here? The call to refresh() (or maybe the call to insert()?) is supposed to unset all currently desynchronized relations, yes?
    // So lets go and frigging nuke perfectly valid data in $project->jobs the user just carefully prepared a moment ago ??? Come on ...  
    var_dump($project->isRelationPopulated('jobs')); // false - the relation is unset - what the hell ???

    // Also, notice how THIS time, the nuked relation data cannot be silently lazy-loaded from DB again, like in other examples, because it never even made it into the database in the first place! 
}

Example 8. The synchronization the "magic" is currently tries to achieve is very one-sided.

$project = Project::findOne(123);
$project->manager; // Load and use relation.

$project->manager_id = $newManagerId; // Change the value of ID to a different one.
var_dump($project->manager->id == $project->manager_id); // true - synchronization is "magically" enforced!

$project->manager->id = $anotherManagerId;  // The other side.
var_dump($project->manager->id == $project->manager_id); // false - ??? Desynchronization here is suddenly OK ??? 

Example 9. How should synchronization look when used together with other existing features of ActiveRecord - for ex. markAttributeDirty()?

$project->markAttributeDirty('manager_id');
// Now what? The attribute value didn't really change, should we unset the relation or not?

Example 10. How should synchronization look when used together with other existing features of ActiveRecord - for ex. fetching a PARTIAL set of attributes from DB (when COMPOSITE foreign key is fetches partially)?

I am sure there are many more examples like this and this is just a tip of the iceberg.

The price.

Another side of the story that the examples above do not show is the PRICE paid to achieve the "magic" (obvious from implementation code):

  1. The "magic" increases memory usage of ActiveRecord instances that have relations loaded (by adding yet another array _relationDependencies). Anyone who tried to handle a few thousand ARs would NOT be welcoming another memory footprint increase of AR that could have been easily avoided.
  2. The "magic" comes at additional performance cost present even when NO relations are used at all! For ex., when doing something as SIMPLE as assigning a value to ActiveRecord attribute (like $project->name = 'New name';).
  3. The "magic" further increases complexity of already complex, hard to understand and support machinery of ActiveRecord and related classes.

Summary and solution.

So, lets see what we currently have:

  1. The "magic" is not even documented anywhere other than a mention in CHANGELOG.md and UPGRADE.md.
  2. The "magic" is not working at all as it was supposed to in a lot of cases (see examples above). This, along with absent documentation, leads to a situation where it is completely unclear what engine behaviour Yii2 users are supposed to rely upon. The "relations are automatically synced" guarantee the "magic" was supposed to give does not hold.
  3. In cases where the "magic" does work it can lead to the same data being fetched from DB again, or, at worst, to losing the data before it was inserted into DB (see examples above).
  4. The "magic" comes at very undesirable performance and memory costs (see "The price" above).
  5. There are situations where the "magic" becomes unachievable or interferes (in a bad way) with other perfectly valid uses of ActiveRecord (see the last four examples above).

Now, lets take a look on a situation before enforcing of synchronization was attempted. Was it really so bad?

# THE SOLUTION. Making "magic" work without "magic".
$project = Project::findOne(123);
$project->manager; // Load and use relation.
$project->manager_id = $newManagerId; // Change the value of ID to a different one.
unset($project->manager); // <-- The TRUE "magic".
var_dump($project->manager->id == $project->manager_id); // true - the whole "magic" is achieved in single line with ZERO performance/memory costs !!!

The WHOLE "magic" above could have been easily achieved by something as simple as a single line of unset() in a user code when user expects ActiveRecord's foreign key attributes to change !!!

The conclusion.

The only sensible solution is giving up on an attempt to enforce automatic synchronization between relation and corresponding foreign key attributes. This means that the only guarantee the engine should provide is: once set ActiveRecord relations are never automatically or silently changed/unset by the engine. Which means it is user responsibility to ensure his data stays synchronized to whatever extent user deems necessary and to explicitly call unset() on relations that are expected to change and need to be reloaded.

mtangoo commented 1 year ago

@rob006 @samdark can you provide some insights on this issue. Have not got time to go indepth in the code (I will find time to do that) but reading OP's explanation and example it seems like the best way is to remove the behavior and document how to achieve it. It seems to me (from the explanations) that it is hard to get "the right solution" and in this case my opinion would be offload that from Yii and tell users how to achieve it, if they want to do it.

What are you guys thinking, since you were involved in feature in discussion?

@yiisoft/reviewers Any thoughts on this?

rob006 commented 1 year ago

I think that most critics can be summarized "in some cases it does not work", and removing this feature would change it to "it never works", which is not an improvement. Also I think that some "holes", like ignoring this for eager loading or populateRelation(), are intentional, so user could skip this behavior when he wants more control over relations.

Behavior of ActiveRecord::refresh() was not affected by #13618 in terms of #19785 (resetting loaded relations in refresh() was a thing since the beginning of Yii 2.0). In fact, mechanism introduced in #13618 is the only way to fix #19785 in somewhat reliable way - refresh() can change FK value, so keeping old relations may put model in inconsistent state, when $model->manager_id == 1 && $model->manager->id == 2 returns true, thus relations also needs to be refreshed. We could avoid additional query if we compare value of FK and load relation only if FK has changed after refresh(). It is still debatable whether it is intended behavior - related models can be also outdated, so it could backfire in some cases.

As for performance implications - I would rather se some benchmarks on real examples, as overhead really looks negligible.

PowerGamer1 commented 1 year ago

I think that most critics can be summarized "in some cases it does not work",

More like "in MOST cases it does not work". Making design contradictory and confusing to say the least. Can you formulate what guarantee engine currently provides with regard of unsetting or not unsetting relations?

and removing this feature would change it to "it never works", which is not an improvement.

It is a SIGNIFICANT improvement, because the design becomes CONSISTENT, CLEAR and SIMPLE: "Engine guarantees: once set ActiveRecord relations are never automatically or silently changed/unset by the engine". That gives user full control and ability to solve all cases I mentioned in my original post.

Also I think that some "holes", like ignoring this for eager loading or populateRelation(), are intentional, so user could skip this behavior when he wants more control over relations.

I don't think this was intentional. I think it was a usual case when someone implemented feature for his own isolated use-case (which is lazy loading) without regards of how other parts of the engine he never used will be affected (or how overall design of the engine will be affected). Here is a recent example of the same shortsightedness: #13567 - a change to with() practically accepted without even thinking about how it will affect other very closely related parts of the engine like joinWith().

Besides, as I illustrated by my examples, these "holes" are insufficient to have full control in all cases. The only way to have such control is to make user responsible for the data synchronization.

Behavior of ActiveRecord::refresh() was not affected by #13618 in terms of #19785 (resetting loaded relations in refresh() was a thing since the beginning of Yii 2.0). In fact, mechanism introduced in #13618 is the only way to fix #19785 in somewhat reliable way - refresh() can change FK value, so keeping old relations may put model in inconsistent state, when $model->manager_id == 1 && $model->manager->id == 2 returns true, thus relations also needs to be refreshed. We could avoid additional query if we compare value of FK and load relation only if FK has changed after refresh(). It is still debatable whether it is intended behavior - related models can be also outdated, so it could backfire in some cases.

You talk about fixing #19785 as if that is the ONLY current place where engine fails to enforce synchronization, what about the rest of them (which are plenty)?

Also, if you read my original post carefully, you may have noticed that inconsistent state $model->manager_id == 1 && $model->manager->id == 2 CANNOT (and shouldn't) be avoided in all situations, so trying to avoid it in some isolated scenarios (like in refresh()) makes no sense (because, again, the engine design becomes inconsistent, confusing and unreliable).

As for performance implications - I would rather se some benchmarks on real examples, as overhead really looks negligible.

Here is a simple example on memory usage: ActiveRecord::_oldAttributes DOUBLES memory usage of each row of data fetched from database. Impact of ActiveRecord::_relationDependencies may be less or more depending on how many attributes the model has and how many relations the ActiveRecord model was loaded with (and don't forget that in general case each relation may be represented by COMPOSITE keys which increases memory usage of _relationDependencies fast).

As for the performance it may only be negligable in comparison with everything else ActiveRecord does (like the amount of other processing inside __get and __set), but then all the more reasons to start improving the performance.

rob006 commented 1 year ago

Can you formulate what guarantee engine currently provides with regard of unsetting or not unsetting relations?

I think that general rule of thumb is: if you rely on magic lazy loading, magic reload of these magically loaded relations also comes into play. If you explicitly configure loading relation either by eager loading or by using populateRelation(), it is also your responsibility to reload relations and "magic" does not work in these scenarios.

Also, if you read my original post carefully, you may have noticed that inconsistent state $model->manager_id == 1 && $model->manager->id == 2 CANNOT (and shouldn't) be avoided in all situations, so trying to avoid it in some isolated scenarios (like in refresh()) makes no sense (because, again, the engine design becomes inconsistent, confusing and unreliable).

I'm not sure which case you have in mind (you have a lot of examples in first post, some of them contradicts each other), but if your workflow relies on model being in inconsistent state, this is more of a problem with workflow than with AR.

Here is a simple example on memory usage: ActiveRecord::_oldAttributes DOUBLES memory usage of each row of data fetched from database.

AFAIK it is not really true - PHP uses COW for arrays, so memory consumption increases only with actual changes to model. _relationDependencies is also set only when relation is loaded, and it should be way lighter than actual relation. If you really want to be such nitpicky about performance, you probably should not use AR in the first place - by design it has a lot of overhead and _relationDependencies is probably one of the last things you should worry about.

You talk about fixing https://github.com/yiisoft/yii2/issues/19785

To be fair, I don't think that #19785 needs to be fixed. If you asked model to refresh, you have refreshed model. Relations are part of model, so they're refreshed too. Looks like perfectly valid behavior for me.

mtangoo commented 1 year ago

Thanks @rob006 and @PowerGamer1 for the civil and constructive discussion. As far as I see, we need to pick one of the options (Remove or stay with the status quo) and document it. So far documenting the current behavior, where magic work and where magic is not working and user must explicitly do it himself seems to be enough to remove uncertanities.

Any thoughts on that? @rob006 @PowerGamer1

rob006 commented 1 year ago

There are still things that could be improved in current solution:

  1. We could make setRelationDependencies() protected instead of private, so users could disable or control this behavior in their own models.
  2. While I think that current behavior of refresh() is correct, refreshing only attributes looks like a valid use case, so we could add another method for that (refreshAttributes()?).

I think we also need to verify some of the claims in this issue before taking any action. Based on description of this issue I assumed that refreshing relations does not work for populateRelation() and with(), but after quick look into code, I'm not sure why it would not work. So this could be a fixable bug or just false claim and there are no "holes".

PowerGamer1 commented 1 year ago

We could avoid additional query if we compare value of FK and load relation only if FK has changed after refresh(). It is still debatable whether it is intended behavior - related models can be also outdated, so it could backfire in some cases.

By the way, here you actually CONFIRMED the conclusion I came to - only the user HIMSELF can decide when to unset relations and which exactly.

I think that general rule of thumb is: if you rely on magic lazy loading, magic reload of these magically loaded relations also comes into play. If you explicitly configure loading relation either by eager loading or by using populateRelation(), it is also your responsibility to reload relations and "magic" does not work in these scenarios.

So just because user decided to use a more efficient eager loading (remember the "N+1" problem, for ex.) that makes him somehow not WORTHY of your "magic" behaviour?

Also, have you tried to NOT rely on lazy loading in Yii2? Unlike, for example, Eloquent ORM, which has option to DISABLE lazy loading completely (and error out if relation is not loaded) there is no such option in Yii2 and so avoiding lazy loading becomes a practically impossible nightmare. Anyway, in practice it is always a mix of the two (eager and lazy loading) and they should not behave so much different on a conceptual level.

I'm not sure which case you have in mind (you have a lot of examples in first post, some of them contradicts each other),

It's not the EXAMPLES that contradict each other - it's the ENGINE BEHAVIOUR in them.

but if your workflow relies on model being in inconsistent state, this is more of a problem with workflow than with AR.

No workflow relies on inconsistent state, the state may become inconsistent during transition from one consistent state to the next consistent state (study example 7 again). Due to that attempts by the engine to ALWAYS keep the state consistent are doomed to fail.

AFAIK it is not really true - PHP uses COW for arrays, so memory consumption increases only with actual changes to model.

You might be right here.

If you really want to be such nitpicky about performance, you probably should not use AR in the first place

Yeah, except there is no light and fast alternative to AR in the engine or do you suggest that people should go back to the stone age and use arrays instead of object classes? Regardless, performance is just an icing on the cake and NOT the main reason for the change (the poor implementation quality and inconsistency in the engine are).

As far as I see, we need to pick one of the options (Remove or stay with the status quo) and document it. So far documenting the current behavior, where magic work and where magic is not working and user must explicitly do it himself seems to be enough to remove uncertanities.

"Staying with the status quo" simply won't do because:

P.S. Can I please hear the opinions of other maintainers (esp. @samdark) other than rob006? Because I can already see that rob006 cares not about engine consistency - only about survival of his specific isolated use-case.

P.P.S. You may start with the statement if you agree that both lazy AND eager loading should work CONSISTENTLY with regard of "magic" synching. Because I don't want to waste tons of my time spelling obvious things for you such as:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!! THE BASIS FOR ANY FURTHER DISCUSSION !!!!!!!!!

As rob006 considers to be acceptable for "magic" to work on lazy loading ONLY. Imagine user had lazy loading in place and relied on "magic" synching and then for whatever VALID reason decides to switch to eager loading (such as realizing he has "N+1" problem and deciding to make his code more efficient; or ensuring specific relations are loaded inside the transaction rather than lazily outside) - and BAM - his code breaks because "magic" synching suddenly stops working for him ??? How can you even consider something like THAT to be acceptable ???

!!!!!!!!!! THE BASIS FOR ANY FURTHER DISCUSSION !!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

rob006 commented 1 year ago

It's not the EXAMPLES that contradict each other - it's the ENGINE BEHAVIOUR in them.

In example 2 you complain that relations loaded with populateRelation() are not refreshed. In example 6 you complain that relations loaded with populateRelation() are refreshed. That looks like mutually exclusive statements to me.

At that point I would rather see a unit tests that would confirm the "holes" in implementation. And then, if it is possible, we could focus on fixing them and making behavior consistent.

PowerGamer1 commented 1 year ago

It's not the EXAMPLES that contradict each other - it's the ENGINE BEHAVIOUR in them.

In example 2 you complain that relations loaded with populateRelation() are not refreshed. In example 6 you complain that relations loaded with populateRelation() are refreshed. That looks like mutually exclusive statements to me.

In example 2 I complain that "magic" behaviour is inconsistent (lazy loading vs populateRelation()). In example 6 I show what would happen IF someone attempts to implement "magic" synching CONSISTENTLY (in lazy loading, in eager loading and in populateRelation()) using the same naive method of unsetting relations. There I complain about negative consequences of such unsetting.

So, as you can see, no contradiction in my examples at all.

At that point I would rather see a unit tests that would confirm the "holes" in implementation. And then, if it is possible, we could focus on fixing them and making behavior consistent.

You may start with the fact that "magic" synching is not working for eager loading and populateRelation() at all. Unless you still consider the situation outlined in "THE BASIS FOR ANY FURTHER DISCUSSION" acceptable - if so then there is no point in discussing anything else with you (or anyone else who thinks so) at all.

mtangoo commented 1 year ago

currently you can't workaround in user code against "magic" unsetting of relations in all cases; the current implementation quality is very poor (example 5); the overall engine design becomes completely inconsistent and laughable (see P.P.S.).

If is not heavy on you, can you propose a PR with what you see as perfect/better implementation?

PowerGamer1 commented 1 year ago

If is not heavy on you, can you propose a PR with what you see as perfect/better implementation?

Such implementation is simple (see #19918): the rollback of #13618 with additional removal of relation unsetting in refresh(). The result: consistent behaviour and clear guarantee is achieved: the engine NEVER attempts to provide magic synchronization and NEVER automatically unsets loaded relations.

Yes, it will be a BC break, but you can think of it as undoing the BC break done by #13618 in the past. There is no other way around it.

As for the part with refresh() I am open to changes like adding additional boolean argument to it to unset ALL loaded relations if user chooses to do so. It might even make sense to add a separate method to AR to unset ALL relations at once when needed instead of manually listing them in built-in unset().