phalcon / cphalcon

High performance, full-stack PHP framework delivered as a C extension.
https://phalcon.io
BSD 3-Clause "New" or "Revised" License
10.79k stars 1.96k forks source link

[BUG]: Save, saves related data also (not changed) #16343

Closed niden closed 1 year ago

niden commented 1 year ago

I have interesting situation.... @niden can u check out that? A help!

        $contractor = \Models\Contractors::findFirst();
        $company = $contractor->getCompany();

        $contractor->save();

In that case, save will be executed in relation too. I mean object $contractor will be save and object $company too. Why is that?

Originally posted by @borisdelev in https://github.com/phalcon/cphalcon/discussions/16336#discussioncomment-5993602

borisdelev commented 1 year ago

More details here and description... lets say we have 2 Models: Companies & Contractors. Contractors connected Companies with belongsTo relation (alias Company).

Somewhere in controller action:

$contractor = \Models\Contractors::findFirst();
$company = $contractor->getCompany(); // Get related company

...

$contractor->save();

... this will invoke ->save method on $contractor but and ->save method in company (i added dump in beforeValidation method on Company model). Expected behaviour is to save only $contractor.

PhalconPHP 5.2.*

Not sure is it a feature or a bug... anyway. I want to help somehow.

andrewrbe commented 1 year ago

This is a bug. It appeared in 4.1 branch. To be precise now we are running 4.1.2. There is no such behavior in php7.4-phalcon_4.0.4-908. We even fixed this package in our Docker but now we had to update and got a lot issues.

Lets assume we have a code

class Spendings extends BaseModel {
....
public function initialize(): void {
    $this->hasOne('spendings_categories_id',
            SpendingsCategories::class,
            'id',
            [
                'reusable' => true,
                'alias'    => 'Category',
            ]);
$sp = Spendings::findFirst(627979);
echo  $sp->Category->name; //Relation
$sp->amount += 0.01;
sp->save();

Queries will be:

       START TRANSACTION
      UPDATE `spendings` SET `status` = 1, `.... WHERE `id` = 627979
(!!!) UPDATE `spendings_categories` SET `name` = 'test', `.... WHERE `id` = 12601
       COMMIT

On the one hand let it be, just additional update. But if we changed spendings_categories_id in spendings to another the second update will wrongly update a row in spendings_categories. For example you had spendings_categories: id | name 1 | category 1 2 | category 2 You change spendings.spending_category_id from 1 to 2, now you have 1 | category 2 2 | category 2

There is workaround with switches off relation update

public function save(): bool {
        $this->related = [];
        return parent::save();
    }

Do not hesitate to contact me if you need more details or tests.

rudiservo commented 1 year ago

TLDR: this happens in collectRelatedToSave()

Starts in the Model::__get

if typeof relation == "object" {
            /**
             * There might be unsaved related records that can be returned
             */
            if isset this->dirtyRelated[lowerProperty] {
                return this->dirtyRelated[lowerProperty];
            }

            /**
             * Get the related records
             */
            return this->getRelated(lowerProperty);
        }

if you fetch anything it does not get set in the dirtyRelated, I think the ideas was to only execute presaverelated() and postsaverelated() if hasRelated() and collectRelatedToSave()

Issue is in the collectRelatedToSave()

for name, record in related {
            if isset dirtyRelated[name] {
                continue;
            }

            if typeof record !== "object" || !(record instanceof ModelInterface) {
                continue;
            }

            record->setDirtyState(self::DIRTY_STATE_TRANSIENT);
            let dirtyRelated[name] = record;
        }

so the getRelated() will set this->related[lower_property] = related and the collectRelatedToSave() will iterate on the related But since the model is not in the this->dirtyRelated it will be set Transient with record->setDirtyState(self::DIRTY_STATE_TRANSIENT); So it can be executed in preSaveRelatedRecord with if record->dirtyState !== Model::DIRTY_STATE_PERSISTENT && !record->save()

Right now it's a feature because probably there "is" a bug So lets suppose you have C belongs to B and B belongs to A

if we A->B->C->setSomething(value); C will be in a transient state, but B will not So saving A will not save C unless B is also in a transient State

I can think of 3 ways to fix this:

  1. has a static "cache" of real dirtyModels to be executed, this could trigger a save on unwanted and unrelated models.
  2. cascade dirtyState to parents, this will consume a lot of cycles since you have to check if there is a loaded relation and cascade on it everytime you do a __set
  3. check all loaded relations recursively for any transient models on save, this could trigger infinite loops (my guess is it does already) so an EXECUTED and EXECUTED_TRANSIENT states will/might be needed to differentiate and avoid infinite save loops, this might be possible without a First Level Cache, I am not sure.

FYI to trigger an infinite loop ATM you just have to

A->B
B->setA(A)
A->save()

I am working currently on this in my repo while trying to add FirstLevelCache... not easy without breaking a few things, sorry.

rudiservo commented 1 year ago

Forgot to mention, my proposal for extra Executed States if not optional they might be a breaking change on model before and prepare Events with business logic. for example if someone as a beforesave or preparesave that B changes A, if A is already executed and B makes a change in A before update, that change will not be saved and might only happen on hasOne and hasMany relations.

noone-silent commented 1 year ago

This is weird.

We use Phalcon 4.1.3 and have this bug too. But in 4.1.3 in the collectRelatedToSave() doesn't contain the mentioned setDirtyState() call. Anyway, we tried to overwrite the collectRelatedToSave() with this:


    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            // We mimic the same behavior like without snapshots.
            $changed = true;
            if ($record instanceof Model === true) {
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

It completely resolved the issue! We use snapshots, else hasChanged() wouldn't work. So there is definitely something broken.

rudiservo commented 1 year ago

This is weird.

We use Phalcon 4.1.3 and have this bug too. But in 4.1.3 in the collectRelatedToSave() doesn't contain the mentioned setDirtyState() call. Anyway, we tried to overwrite the collectRelatedToSave() with this:

    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            // We mimic the same behavior like without snapshots.
            $changed = true;
            if ($record instanceof Model === true) {
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

It completely resolved the issue! We use snapshots, else hasChanged() wouldn't work. So there is definitely something broken.

It's a quick fix if you are only trying to save the "first ring" or first level of relations, if you go beyond that it is an issue.

If you have a second level or a second ring, A->B->C and changed only C and trigger A->save();

All ok here A checks for B hasChanged, nothing is changed on B so move on.

Problem is we only changed C and it will never be saved unless you call C->save somewhere in your code. If you have a really heavy business logic in the models, this is going to cause issues with models not being saved.

It's a quick fix for simple level 1 relations, anything above that is broken.

noone-silent commented 1 year ago

This is weird. We use Phalcon 4.1.3 and have this bug too. But in 4.1.3 in the collectRelatedToSave() doesn't contain the mentioned setDirtyState() call. Anyway, we tried to overwrite the collectRelatedToSave() with this:

    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            // We mimic the same behavior like without snapshots.
            $changed = true;
            if ($record instanceof Model === true) {
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

It completely resolved the issue! We use snapshots, else hasChanged() wouldn't work. So there is definitely something broken.

It's a quick fix if you are only trying to save the "first ring" or first level of relations, if you go beyond that it is an issue.

If you have a second level or a second ring, A->B->C and changed only C and trigger A->save();

All ok here A checks for B hasChanged, nothing is changed on B so move on.

Problem is we only changed C and it will never be saved unless you call C->save somewhere in your code. If you have a really heavy business logic in the models, this is going to cause issues with models not being saved.

It's a quick fix for simple level 1 relations, anything above that is broken.

We never used C level, because of the way getRelated() handles not found or null/false results. We also still use php 7.4 so we do not have ?->.

The solution to your case, would be, to call the same method on every $record:

    protected function collectRelatedToSave(): array
    {
        $related = $this->related;
        $dirtyRelated = $this->dirtyRelated;
        foreach ($related as $name => $record) {
            if (isset($dirtyRelated[$name]) === true) {
                continue;
            }

            if ($record instanceof ModelInterface === false) {
                continue;
            }

            $changed = true;
            if ($record instanceof Model === true) {
                $hasDirtyRelated = $record->collectRelatedToSave();
                if (\count($hasDirtyRelated) > 0) {
                    $record->save();
                }
                $changed = $record->hasChanged();
            }

            if ($changed === true || $record->getDirtyState() !== self::DIRTY_STATE_PERSISTENT) {
                $dirtyRelated[$name] = $record;
            }
        }

        return $dirtyRelated;
    }

That should solve C, D and WhatEverLevel.

rudiservo commented 1 year ago

@noone-silent no you can't do that! you have prepare, before and after that must me run on it's order of type of relation, that will trigger prepareSave and beforeSave no matter what type of relation there is, it is a breaking change, massive one.

rudiservo commented 1 year ago

@noone-silent the only way to fix it correctly is with state flags and first level cache (thread cache). Because you need to avoid save loops, run the whole relations to check for changed data and prepareSave might change data, and FLC to keep a single source of truth where if you find a model it will always return the same object instance.

This is something that Java Hibernate ORM has by default and you can't disable it.

https://vladmihalcea.com/jpa-hibernate-first-level-cache/

https://vladmihalcea.com/the-anatomy-of-hibernate-dirty-checking/

I wish it was an easier problem to solve, but it's not, all magic comes with a price, ORM magic comes with a heavy price of data flow complexity, consistency complexity and a lot of cycles to ensure that everything checks out.

rudiservo commented 1 year ago

To be honest I think prepareSave, while it has good intentions of enabling making a lot of changes just before the model gets saved, it is a bad practice if you rely on it for data consistently. Because the ORM can only run the model once, if devs depend on prepareSave for calculations or data consistency and are not using their own logic and functions before triggering a save(), it will lead to data not being changed because of the type and complexity of relations.

I have a very complex project that relations can go to 5 or 6 layers deep, don't like it but it's the way the customer needs it.

We could however have a dirty cache inside a thread cache that when a save gets triggered it runs normally then checks the cache for unsaved models, this adds more cycles and checks every time a model gets changed or save, and might even be unwanted behaviour, I don't recommend it like this.

So my proposal is dirty_state execution flags to avoid loops, resultset with added logic to append/add and persist models with a clear way for dev to program to make sure that it's get triggered by save(), and avoid magic has much has possible.

I am working on it has we speak, it's slow progress because I am trying not to break previous behaviour, i.e. saving hasMany that is an array instead of a resultset object, keeping collectRelatedToSave function. The hardest part of them all is when you delete or remove a related model, this is the one thing that will drive me nuts, because the model still exists in the database, but you do not want it to be used in a resultset because you deleted it, just not persisted it.

larsverp commented 1 year ago

@rudiservo Is this related to this comment: https://github.com/phalcon/cphalcon/issues/15148#issuecomment-1415894473

Because we are having massive performance issues in our application since the Phalcon 5 update :)

noone-silent commented 1 year ago

To be honest I think prepareSave, while it has good intentions of enabling making a lot of changes just before the model gets saved, it is a bad practice if you rely on it for data consistently. Because the ORM can only run the model once, if devs depend on prepareSave for calculations or data consistency and are not using their own logic and functions before triggering a save(), it will lead to data not being changed because of the type and complexity of relations.

I have a very complex project that relations can go to 5 or 6 layers deep, don't like it but it's the way the customer needs it.

We could however have a dirty cache inside a thread cache that when a save gets triggered it runs normally then checks the cache for unsaved models, this adds more cycles and checks every time a model gets changed or save, and might even be unwanted behaviour, I don't recommend it like this.

So my proposal is dirty_state execution flags to avoid loops, resultset with added logic to append/add and persist models with a clear way for dev to program to make sure that it's get triggered by save(), and avoid magic has much has possible.

I am working on it has we speak, it's slow progress because I am trying not to break previous behaviour, i.e. saving hasMany that is an array instead of a resultset object, keeping collectRelatedToSave function. The hardest part of them all is when you delete or remove a related model, this is the one thing that will drive me nuts, because the model still exists in the database, but you do not want it to be used in a resultset because you deleted it, just not persisted it.

If you find a solution, please patch it into phalcon 4 too. People still use it

larsverp commented 1 year ago

I have also replicated this issue in our application. We have installed ClockWork that allows us to see all database calls made in a call. I ran this code:

public function phalconAction()
    {
        /** @var EventTicket $ticket */
        $ticket = EventTicket::query()->andWhere('uuid = "93b006a8-188b-4c2b-bd3c-64f8c6f3101e"')->execute()->getFirst();

        $orderItem   = $ticket->orderItem;
        $orderStatus = $orderItem->order->status;

        if ($orderStatus === OrderStatus::COMPLETE) {
            $ticket->status = EventTicketStatus::RESERVED;
            $ticket->save();
        }
    }

That results in the following ClockWork result:

image

The EventTicket Query is not on the screenshot, but you can see that 2 UPDATE queries are done for Order & OrderItem. The data in the UPDATE query is not different from what is currently in the database.

The example code I ran is not too bad, but in our production application this creates a massive performance issue.

rudiservo commented 1 year ago

@rudiservo Is this related to this comment: #15148 (comment)

Because we are having massive performance issues in our application since the Phalcon 5 update :)

At first glance yes, do you have dynamic update? Just to know if it influences it.

larsverp commented 1 year ago

@rudiservo Pleas explain what you mean with dynamic updates? The code example I provided in my previous comment shows what the issue is :)

rudiservo commented 1 year ago

@rudiservo Pleas explain what you mean with dynamic updates? The code example I provided in my previous comment shows what the issue is :)

It checks the model for changed fields before making and update to the database. It uses more RAM because it keeps a snapshot of original values, and a few more CPU cycles to check for changes.

The increased load you are seeing is in the PHP or the database?

public function initialize()
    {
        $this->useDynamicUpdate(true);
    }
larsverp commented 1 year ago

@rudiservo Pleas explain what you mean with dynamic updates? The code example I provided in my previous comment shows what the issue is :)

It checks the model for changed fields before making and update to the database. It uses more RAM because it keeps a snapshot of original values, and a few more CPU cycles to check for changes.

The increased load you are seeing is in the PHP or the database?

public function initialize()
    {
        $this->useDynamicUpdate(true);
    }

Our database is the biggest issue. But that's understandable looking at my example. (https://github.com/phalcon/cphalcon/issues/16343#issuecomment-1633680665)

Every model select triggers a database update as well. That explains why our database has been having a hard time since the Phalcon 5 update.

rudiservo commented 1 year ago

@larsverp well dynamic updates will save writes to your DB, also they will not issue an update for all the fields of the model, only the ones that have changed, making the update query smaller. It's easier to scale up php workers than it is to scale up the DB, with Relational DB clusters the writing usually happens always in the same node, so pulling easing off the writing will improve the database by a lot.

Test it and give your feedback on how it went.

The solution for this without using dynamic updates, is to forget about "dirty state" on relations has you have to always check the relations for changes and just use the dirty state only for changed fields.

larsverp commented 1 year ago

@rudiservo Alright thanks I will do some testing with the Dynamic update. Just to be on the same line, the issue we're trying to solve is about Phalcon marking a model as dirty while no fields are updated , right? Is there a possibility for us to disable Phalcon setting a Dirty state on a model? I'd rather check it myself while it's not working.

larsverp commented 1 year ago

@rudiservo We already seem to use DynamicUpdates so this doesn't do anything for us. It still tries to update the entire model in the DB even no fields have been changed.

larsverp commented 1 year ago

@rudiservo I think I was able to write a workaround for our application.

I found the following code in the CPhalcon repo:

protected function collectRelatedToSave() -> array
    {
        var name, record;
        array related, dirtyRelated;

        /**
         * Load previously queried related records
         */
        let related = this->related;

        /**
         * Load unsaved related records
         */
        let dirtyRelated = this->dirtyRelated;

        for name, record in related {
            if isset dirtyRelated[name] {
                continue;
            }

            if typeof record !== "object" || !(record instanceof ModelInterface) {
                continue;
            }

            record->setDirtyState(self::DIRTY_STATE_TRANSIENT);
            let dirtyRelated[name] = record;
        }

        return dirtyRelated;
    }

Source: https://github.com/phalcon/cphalcon/blob/5.0.x/phalcon/Mvc/Model.zep#L1126

In our application all models extend from the ModelBase class (Which extends from the Phalcon Model class) in the ModelBase class I overwrote the collectRelatedToSave function as follows:

public function collectRelatedToSave(): array
{
    $related      = $this->related;
    $dirtyRelated = $this->dirtyRelated;

    foreach ($related as $name => $record) {
        if (isset($dirtyRelated[$name])) {
            continue;
        }

        if (!is_object($record) || !($record instanceof ModelInterface)) {
            continue;
        }

        if (empty($record->getChangedFields()) && empty($record->collectRelatedToSave())) {//<------ This is new
            continue;
        }

        $record->setDirtyState(1);
        $dirtyRelated[$name] = $record;
    }

    return $dirtyRelated;
}

What's your opinion? It seems to work for our application as far as I've tested it.

rudiservo commented 1 year ago

@larsverp it's a quick fix, probably not ideal, it may have a big performance impact, also needs to pass the tests.

rudiservo commented 1 year ago

@larsverp Will only work if Dynamic update is set, so you have to put it in the doLowUpdate function. Also there is a bit of an issue, with field types. In do lowUpdate you have to cast the values to compare in order to assert if the values are what you want.

i.e. "4" and "4.0" are not exactly the same for PHP, a lot of dev's do not cast the value on set, so this needs to be done.

So the getChangedFields, might not get all changed fields, the function needs a revamp and doLowUpdate needs a revamp to.

The weird part is where you say you have dynamicUpdates, and still saves the model, it should not unless you have an issue in your snapshots, or the type of data you have in your database and model.

Check this to see if there is something different about your model so we can do something about it. https://github.com/phalcon/cphalcon/blob/5.0.x/phalcon/Mvc/Model.zep#L3911

rudiservo commented 1 year ago

@borisdelev do you have dynamicUpdate set? because if you do and it is executing an update, that behaviour should not happen. Can you debug and tell us what are types of the fields getting updated, or is it all of them?

borisdelev commented 1 year ago

Nope, didnt set dynamic update. Its a little project. In my case all fields are updated... And is merged/saved with relations somehow.

larsverp commented 1 year ago

@rudiservo Thanks for the comment. I'll do some research to find out if this is the case for us. :)

noone-silent commented 1 year ago

@borisdelev do you have dynamicUpdate set? because if you do and it is executing an update, that behaviour should not happen. Can you debug and tell us what are types of the fields getting updated, or is it all of them?

We do have dynamicUpdate set and use snapshots. But this bug is saving the related model completely and totally ignores everything.

rudiservo commented 1 year ago

ok, so two different situations @borisdelev asked about the behaviour of saving and calling save on every relation.

@noone-silent and @larsverp have the dynamic update but the model get saved and all fields get saved on that update.

Is this right?

noone-silent commented 1 year ago

It's the same behavior, no matter if dynamicUpdates/Snapshots are activated or not. As soon there is access to a related model, the model is getting saved with all fields, even there was no change on the related model data.

It was just a guess, that if dynamicUpdates are enabled this bug would not trigger. But this bug triggers, regardless what settings have been made.

And again: This bug also appears in Phalcon 4, where there is no dirty state setting. So the real bug is somewhere else I guess

larsverp commented 1 year ago

@rudiservo I did some research to your last comment. This issue for us has nothing to do with dynamicUpdate. This issue is that the Phalcon Framwework is marking models as "dirty" while their fields have not changed.

An yes I guess the dynamicUpdate is also a bit broken then, since it should not store any fields. But my biggest concern is the Dirty state not being set correctly when fetching related models. For example this is now the case in our code base:

$someModel = SomeModel::query->andWhere('id = 1')->execute();

$someRelatedModel = $someModel->someRelatedModel;
$someOtherRelatedModel = $someRelatedModel->someOtherRelatedModel;

if ($someRelatedModel->status = 'VALID' && $someOtherRelatedModel->status = 'VALID') {
    $someModel->name = 'A new name';
    $someModel->save();
}

// What happens (looking at our database logging in combination with Clockwork:
// 1: someModel SELECT
// 2: someRelatedModel SELECT
// 3: someOtherRelatedModel SELECT

// So far so good, but this is where it's getting weird:
// someOtherRelatedModel UPDATE some fields, but no new information
// someRelatedModel UPDATE some fields, but no new information
// someModel UPDATE name field.

The 2 related models should not go to the database. They were only read and never written. I fixed the issue for us by overwriting the collectRelatedToSave function.

Hope that helps to clear up the confusion that's it's about Dynamic Updates. (It's not)

rudiservo commented 1 year ago

OK, so here is the thing, ORM has a lot of "magic" and that means it has to do some tasks regardless of change, so Model::save has to be called on all relations regardless because the ORM does not know how deep it goes and if it does have any changes.

If the ORM is saving the full model with dynamicUpdate on, then that is another issue that needs to be checked, and it can be just a missconfiguration or a bug.

Has for this issue of calling all relations and executing a save, I am sorry to say this but welcome to ORM magic 101, but we can brainstorm a solution. Just remember that all magic comes with a price.

@larsverp your solution wont work because you are only thinking of 1 layer, in your example if you do.

$someOtherRelatedModel->status = 'INVALID';
$someModel->save();

The someOtherRelatedModel will never get saved because someRelatedModel does not have changed fields. the save is working has a "fire and forget", make any changes permanent!

This type of magic is going to cost something, the only thing we can do is make it more efficient and cost less.

Some solutions to keep this "performant" is

  1. to have the Model keep a dirtyList of all the models that where changed
  2. the other one if force dynamicUpdate,
  3. another one I can think of is to have a clearRelations function (I think @noone-silent suggested) but this function is called by the dev before a save.

Caveats

  1. everytime you set a fields the model pushes himself into an array, that means if you are setting a lot of fields for the same model the cost of doing it is greater than checking the relations, also prepareSave and BeforeSave logic from other relations that are needed for business logic might not be called, IMO this might break more stuff then it fixes;
  2. not really a big issue but its going to make the PHP worker heavier and the DB ligher;
  3. you know your code so you know if and when you can call it so it does not break your business logic, low magic and gives control to the dev.

Any other ideas?

noone-silent commented 1 year ago

About the solutions:

  1. We have the Model\Manager for this already, just needs to have one additional array with changed models. But see point 2
  2. I seriously don't know why this isn't activated all the time. Just so we save some b/kb of RAM and 0.002% of CPU? This should be standard and not an option. ORM is heavy. With this always on, we can resolve this issue without greater problems. In the real world it is easier to spin up a new PHP worker, than an additional DB.
  3. If the dev needs to call something before, then why use relations at all?
rudiservo commented 1 year ago

I might have a solution for this issue, I will propose for it to be optional because it everyone has to add code.

Just for everyone to understand, there is no magical way for the system to know what model have changed properties in them not without either the dev marks that something has changed, or the system has to forcefully compare with prior data.

A save() will always trigger a cascading event of saves on all it's relations, because has I stated before, it does not know who was changed.

My proposal will be that devs that have relations that change have to set them

i.e.

$b = $a->b;
$b->name = "something";
$a->b = $b;

Having an array with changed Model breaks prepareSave().

this is extremely complicated.

lets say we do it

$a->markDirty();

And it's in a an array in the manager, then we say, manager->save(), so he goes to the array and saves them. Let say just the models in the array, then the relations do not get persisted unless you dictated all the affected relations, then we do not know who gets to persist first. It will cost more and will get us nowhere.

We have to follow the chain of relations, either mandatory or marking the relations for the save to follow, it's unfortunate, but it's how this works.

I will also propose to add a saveOne() function without any relation save trigger.

noone-silent commented 1 year ago

Just for everyone to understand, there is no magical way for the system to know what model have changed properties in them not without either the dev marks that something has changed, or the system has to forcefully compare with prior data.

But we have this already? Like I said, the default should be dynamicUpdates and keepSnapshots enabled. Then we KNOW what have changed in every single Model and saving can be skipped. If any of this is disabled, then what we have now as a bug, should be default behavior, because we DON'T KNOW the state of the models and just save them all

rudiservo commented 1 year ago

@noone-silent I can make those changes, but that is not for me to decide, project maintainers have to decide if it is a bug, then it is the default, otherwise it could be a breaking change and that would require a new major version. For what is worth, Java-Hibertante and PHP-Doctrine keep a snapshot and do dynamic updates mandatory, it's not even an option.

FYI Checking for dynamic updates costs a lot, I do think it should be the default, but if you want performance the developers should "tag" the model has dirty, that way it would only do the DU if it really is dirty.

Magic != performance.

niden commented 1 year ago

This has been one of the most painful areas of the framework to be honest. This particular discussion goes back to even v3. When to update? That is the question.

Personally, I don't let the ORM update things for me, I do the updates where I need them to, wrapping the whole operation in a transaction. But that is just me.

I like the idea from @noone-silent. We can go with that and set the behavior for updates once and for all.

rudiservo commented 1 year ago

@niden DynamicUpdates and snapshots by default or mandatory?

FYI, this is just quick fix, still I my advice would be to only run save on relations that we set i.e. $a->b = $b (WIP), I will create an issue for this.

niden commented 1 year ago

I would say default. Developers can then change the behavior if they need to.

niden commented 1 year ago

Resolved in https://github.com/phalcon/cphalcon/pull/16400

Thank you @rudiservo

noone-silent commented 1 year ago

Does it really fix it? This is just the dynamic update enabled by default. The check in collectRelatedToSave what we discussed before is still needed or not?

rudiservo commented 1 year ago

@noone-silent Yes.

The collectRelatedToSave will call the relation but the save will not trigger a SQL update. In order for the collectRelatedToSave not to grab all relations we need to agree on how will the dev flag that a specific relation has changes to be checked and only those will be returned by collectRelatedToSave. In my opinion this is needed for control and performance, but it should probably be new issue.

noone-silent commented 1 year ago

Then this issue is not fixed. This issue is about saving a model will also save all related models. Not about making a config setting as default. The config setting as default is just a side effect needed for this ticket.

@rudiservo You complicate things too much here. There have been made many supposed fixes for this already, all using the dynamic update setting. You have concerns about performance, but performance is not the main aspect someone uses ORM. We use ORM because of convenience. If we need pure performance or have hundreds or thousands of queries, we should act like @niden does and do transactions directly, or if we only work like this, don't use ORM at all.

Nobody is using 200 relations and all these relations also use relations of the hundreds. Yes, then performance and even resources would be a problem. But not in those simple use cases we have here.

rudiservo commented 1 year ago

@noone-silent it's not fixed? OK, sorry but I do not understand what is you expectation of it being fixed? You want for the ORM to check for changedFields in the collectRelatedToSave() or not to collect them at all? Because checking for fields on collectRelatedToSave or in a save() is about the same performance cost!

I can not really just bluntly check for changed fields on the first level of relations because the ORM does not know how deep in the chain relation anyone made a change to one model. If and that is a big IF, this change was made, how do you fix the problem of the ORM not checking the whole relation chain for changes? Because it will break someone's code!

You got $A->B->C, a change was made in C but you save A, C does not get saved, how do you fix it honestly?

@rudiservo You complicate things too much here.

Thank you I guess? Yeah I do that. Honestly I do think you oversimplify things too much here also, the ORM internals are hard by default, right now it's complex and has a lot issues.

Simple != Easy Complex != Hard

Nobody is using 200 relations and all these relations also use relations of the hundreds.

Do not get this the wrong way, but think about it from the maintainers point of view, is your statement a fact, personal observation or a guess?

I would love to make changes on what I think I need and no one else is using, I can't propose those changes without a really good justification.

BTW sometimes I use 20 relations with 5 relations with 3 more relations and 1 more at the end, why? Bureaucracy and complex business logic that the customer needs, that if I where to manually do it in SQL statements it would cost me more time to develop without any real gains in performance, I did try, and to make matters worse sometimes the customer can be unforgiving in deadlines.

Look if you have any idea how to fix it, do a PR like I do, the maintainers would very much appreciate that, it's actually quite fun and a good learning experience.

You can look at Doctrine, Hibernate and Propel for ideas and how they solved their problems, it's a lot of code but it's fun to read.

Also would advice you to read this blog post from a Zig maintainer, it is quite insightful. https://kristoff.it/blog/simple-not-just-easy/

noone-silent commented 1 year ago

@noone-silent it's not fixed? No it's not. Does the save work now like before or is it still saving all related models even without change? It is still saving all related models. And this issue/ticket is about that.

OK, sorry but I do not understand what is you expectation of it being fixed? You want for the ORM to check for changedFields in the collectRelatedToSave() or not to collect them at all? Because checking for fields on collectRelatedToSave or in a save() is about the same performance cost!

I want that save only is saving stuff that needs to be saved. Like before. This is what this issue is about. The performance impact now is much bigger than any change of checking the chain of responsibility will ever create!

I can not really just bluntly check for changed fields on the first level of relations because the ORM does not know how deep in the chain relation anyone made a change to one model. If and that is a big IF, this change was made, how do you fix the problem of the ORM not checking the whole relation chain for changes? Because it will break someone's code!

But you have to. How else you want to check the chain? And why would it break the code. The code is breaking right now, because it does not work like people used it to be or expect it to be. You check the first level and give it down the relation chain and in the end you have all models you need to save. $A->B->C->D->...->Z would be solved with this.

You got $A->B->C, a change was made in C but you save A, C does not get saved, how do you fix it honestly?

You do not stop on the first level. You give it down the relation chain. It's a chain of responsibility, triggered with the command save and needs to go down the whole chain and call save on every model that is part of this relation.

@rudiservo You complicate things too much here.

Thank you I guess? Yeah I do that. Honestly I do think you oversimplify things too much here also, the ORM internals are hard by default, right now it's complex and has a lot issues.

It was not an attack or anything. Don't get me wrong.

Simple != Easy Complex != Hard

Nobody is using 200 relations and all these relations also use relations of the hundreds.

Do not get this the wrong way, but think about it from the maintainers point of view, is your statement a fact, personal observation or a guess?

I fixed my code with this. Every model is now calling collectRelatedToSave() with the getChangedFields method and guess what? The performance is back to normal.

I would love to make changes on what I think I need and no one else is using, I can't propose those changes without a really good justification.

BTW sometimes I use 20 relations with 5 relations with 3 more relations and 1 more at the end, why? Bureaucracy and complex business logic that the customer needs, that if I where to manually do it in SQL statements it would cost me more time to develop without any real gains in performance, I did try, and to make matters worse sometimes the customer can be unforgiving in deadlines.

Yes, I do that too. We have a billing and invoicing system. A lot of relations there.

Look if you have any idea how to fix it, do a PR like I do, the maintainers would very much appreciate that, it's actually quite fun and a good learning experience.

The fix of it has been proposed many times here already. People implement it already by themself, because this issue is a breaking one. If your code was saving in 250ms before it is now up to 2s. That is breaking.

I will look into it.

rudiservo commented 1 year ago

@noone-silent It's not working has before because someone submitted an issue, v4 broke behaviour that was on v3 and prior, and I've using Phalcon since v1 and honestly skipped v4.

Here is the issue. https://github.com/phalcon/cphalcon/issues/15148

DynamicUpdate is better than getChangedFields, look at the code! In fact I don't even know if getChangedFields is reliable if you compare the code, I have to recheck what is going on. it should have always been in the first place to save the DB from getting hammered.

I want that save only is saving stuff that needs to be saved. Like before. This is what this issue is about. The performance impact now is much bigger than any change of checking the chain of responsibility will ever create!

We can't have it both ways, save all relations and not save all relations without any input from the dev that triggers either.

What you are proposing is a breaking change, that means a v6, and honestly if we are to make a v6 this is definitely not the way to fix this because it is also flawed.

In Example, the way Doctrine 2 works is opt-out with a readOnly flag, that does not mean it wont go rampant all the way on all relations, yes it's heavy!

You have to opt-in opt-out, it can not be none and things just work out of magic.

I on the other hand think that it should be opt-in, you flag that a relation chain has changes, readOnly flag is if you only want to check relations and skip any update checks, more work for the dev, but far more control over the dataflow.

IMO this is the better solution, it's really good for performance and I stand by it, if you didn't care about performance this would not be an issue for out.

This is what I have been trying to explain to you and why I needed to know what was your expectation.

This change has to be well thought for ramifications, it has to pass the tests and create new ones, it's a ton of work.

Now look, the old behaviour has flaws, the current behaviour has flaws, honestly both are bad, either we can agree on a new behaviour that really fixes the issue once and for all and hopefully the maintainers agree to be the default behaviour, otherwise, it's a setting, not that bad IMO.

You can check my proposal, the relations get flagged with changes WHEN you set them or by another type of opt-in behaviour.

It was not an attack or anything. Don't get me wrong.

I didn't take it like one, you just come out like you don't seem to understand the ramifications and caveats of a complex system like the ORM, I understand you want it fixed for your use case and it's frustrating, but the team can't do it without screwing up someone else, concessions have to be made to make this work otherwise it's not a reliable ORM and we can't please both sides with magic.

But you have to. How else you want to check the chain? And why would it break the code. The code is breaking right now, because it does not work like people used it to be or expect it to be. You check the first level and give it down the relation chain and in the end you have all models you need to save. $A->B->C->D->...->Z would be solved with this.

You really didn't understand, if you don't return B, B never calls C, C never calls D... Z, it wouldn't honestly, I don't how it did before without some opt-in or an overload function to make it work.

You do not stop on the first level. You give it down the relation chain. It's a chain of responsibility, triggered with the command save and needs to go down the whole chain and call save on every model that is part of this relation.

That is what is doing right now, it did not do before!

Your solution of not returning that model in the collectRelatedToSave, actually breaks that, think about it! if you don't change B and set it, B never calls C.

I fixed my code with this. Every model is now calling collectRelatedToSave() with the getChangedFields method and guess what? The performance is back to normal.

It's a BAD FIX, IF you are not setting the relation or trigger save() on relations in the middle of the model, specially the belongsTo and hasOne, you are going to get unsaved data and a lot of issues, either way, IT'S A BAD UNRELIABLE FIX and it will come back to bite you, I've been there!

I will propose my opt-in solution anyway, it probably has to be a new issue because it's going to be a long conversations with a lot decisions and it's needed for the management of the project, also the maintainers have to accept it. Sorry but it is what is it, no magic bullet, just really hard software engineering.