Deep Diff checking is not working

ragingdave commented 4 years ago

When using a package like spatie's own schemaless-attributes, the extra attributes are logged as being different every touch of the model, which not only generates useless logs even when using $logOnlyDirty but just creates confusion in the activity log itself. While specifying specific json attributes is supported, that functionality isn't carried over into the dirty checks to confirm that what's going into the activity log is only specifically changed json attributes. Additionally, the reverse filtering that normally works isn't applied either (being able to set logAttributes = *, and then exclude attributes from ever being logged).

It seems that there was quite a bit more that needed to be done to add json attribute logging that what's currently there. On top of that, in enabling that functionality or really handling any deep diffing is completely unsupported.

Gummibeer commented 4 years ago

Hey,

could you add some code examples - so like what you fill in your model, how you've configured (want to configure) the activitylog and what you would expect to be logged.

I agree that the JSON support isn't as complete as the base attributes.

With knowing about your expectations it will be easier to add/support (some of) them.

ragingdave commented 4 years ago

Specifically I am using this in combination with Laravel Nova and custom JSON field which ends up having to update all the attributes of the extra_attributes that is the schemaless-attributes package.

In this way the extra_attributes I presume is made dirty, but normally in this case, the standard attributes are dismissed from the activity log using the array_diff_uassoc call in this package to handle the log only dirty functionality.

I also have an import from data files for this so the call is basically just:

Location::updateOrCreate(
    ['id' => 123],
    ['extra_attributes' => <ALL CURRENT AND UPDATED EXTRA ATTRIBUTES AS ARRAY>]
)

SO more concretely: Location Model configured with

$logOnlyDirty = true;
$logAttributes = ['extra_attributes->phone', 'extra_attributes->details'];

And then usage like:

// extra_attributes before the update
// ['details' => '', 'phone' => '1231231234']
Location::updateOrCreate(
    ['id' => 123],
    ['extra_attributes' => ['details' => 'Something new', 'phone' => '1231231234']
)

This will also set the whole extra_attributes as being "changed" when it's potentially really not changed at all. There are more things stored there than just a single property but I figured this being a relatively simple case would be easiest.

Gummibeer commented 4 years ago

Okay, so if you define the JSON sub-keys in the logAttributes and enable logOnlyDirty you would expect two things:

the main attribute is only in the logged attributes if one of the JSOn sub-keys has changed.
the main attribute only contains the defined sub-keys even if it holds more keys

$submitEmptyLogs = false;
$logOnlyDirty = true;
$logAttributes = ['extra_attributes->phone', 'extra_attributes->details'];

$location = Location::create([
  'name' => 'Hamburg', 
  'extra_attributes' => ['details' => '', 'phone' => '1231231234']
]);
// the log will contain both keys because it's a new record
// * extra_attributes->phone 
// * extra_attributes->details

$location->update(['name' => 'London');
// no log because the name is not tracked

$location->update(['extra_attributes' => ['details' => 'new details']]);
// the log will contain only the details sub-key because nothing else is changed
// * extra_attributes->details

$location->update(['extra_attributes' => ['phone' => '1231231234']]);
// no log because the phone has not changed

$location = $location->update([
  'name' => 'New York', 
  'extra_attributes' => ['details' => 'new details', 'phone' => '987654321']
]);
// the log will contain only the phone sub-key because nothing else is changed or tracked
// * extra_attributes->phone

Do these examples describe your expected behavior?

ragingdave commented 4 years ago

That seems to sound correct yes. I would also just tack on that if logAttributes was set to * and still logOnlyDirty, then it should still have the same effect as you've described with the only addition. I'm assuming that's an included case, but just figured I would make sure it was taken into consideration as well.

Gummibeer commented 4 years ago

Because a full deeplevel comparison and possible different expectations I will exclude this case. Reason: the attribute has changed and there's not a more precise scope to log only a dirty subkey. The goal is to log the full, loggable, attribute if it has changed. Otherwise we will have to handle too much edgecases. To demonstrate a similar case: we log the whole datetime/carbon and not only the changed part (hour, minute ...) of it. But I will check what we can do. It could be that it will get into a new major version because the whole logging, detecting is super complex already and most names aren't perfect.

ragingdave commented 4 years ago

While I understand the number of random edge cases that could come up, wouldn't the same logic apply whether the logAttributes were set to * vs just the field that's json? Additionally, then if you wanted to log all the json attributes, then you would have to specify all the json attributes in the logAttributes, making the feature less than ideal. If you are talking about just subkeys for now and then handling all in a future release then cool, but if it's just overall excluding the * case, then that's a different story.

Your example of using Carbon as a change while I see how non-trivial the detection of when to break out edge cases could occur. However it doesn't have to be an all or nothing scenario...take for example tracking all json attributes like an all, perhaps a mid point can be needing to specify:

$logAttributes = ['*', 'extra_attributes.*'];

which would trigger an object level deep diff check without needing to cover every single deep diff case, and would be at least better than having to list off like 10 attributes all prefixed with the base json attribute like extra_attributes->attribute1, extra_attributes->attribute2, etc.

nagi1 commented 3 years ago

Okay based on @Gummibeer replay above

And a kind explanation from him I quote:

So the real request is that

we already support (endless) deep level JSON attribute definition for the loggable attributes. But these sub-keys aren't applied for the "only dirty" configuration. So, if possible, these sub-keys should be applied to filter the real attribute.

I see A LOT of edge cases and it's also very likely breaking - or better it as is the underlying behavior changes. The edge/test cases are primary about the typical conflicting values in PHP like 0, null, [], false and so on.

I tend to agree more with @ragingdave about meeting in mid point and only enables deep diff checking only and only when explicitly define it on $logAttributes attribute.

i.e.

$logAttributes = ['*', 'extra_attributes.*'];

@ragingdave if you can spare some time I'll be working on this issue this week end :)

Gummibeer commented 3 years ago

The wildcard is a wildcard and will/should log EVERY and FULL changed attribute. So extra_attributes.* is redundant in the above example.

The deep checking should only happen for explicit keys. So even *, address->street will log the whole address when the zip code changes as there's a wildcard.

ragingdave commented 3 years ago

The wildcard being a wildcard sure, I get that. What about when you have logOnlyDirty set to true? Would that enable deep diffing or would it just be like it is now and log the whole json attribute? If it will just log the whole thing, then the friction for the dev to have to specify a whitelist of keys to log, because otherwise it won't log the diff in the json, then it's not really worth it.

I would think that the wildcard would apply hierarchically, so that in the spirit of the wildcard + logOnlyDirty, it would recursively log everything, but only things that changed. So if there's no changes to the keys of the json property, then it wouldn't log it.

Maybe we are just talking about different cases though because that cut and dry where it sounds kind of terrible from a DX perspective for wildcard is wildcard.

Gummibeer commented 3 years ago

The dirty check would be on the listed attributes. Running deep diff per Default to the deepest possible level would lead to a lot of unexpected results.

If the deep diff only on listed attributes isn't what you want or what would solve your issue we can close this issue. Since Laravel 7 custom object casts are a thing this would be even more complicated and except of the mentioned schemaless attributes scenario an attribute is one attribute as a whole. As it's about the mentioned package: this could be a fit for a new label and things we think about. Other packages compatibility bridges.

nagi1 commented 3 years ago

I suggest we introduce a new attribute to enable deep diff checking on only JSON attributes, set aside any other custom object casts since there's no demand for it at least for now.

Because I believe that logging only changed attributes on a JSON would be very helpful in so many situations like in #647.

The new attribute would called something like $logJsonSubKeys, $enableDeepDiffForJson or $jsonSubKeysDeepDiff, of course we still have to respect $logAttributes and $logOnlyDirty.

If the no json attribute(s) found in $logAttributes - respecting the * wildcard - it wont be logged even if the $logJsonSubKeys is enabled.
If $logOnlyDirty and $logJsonSubKeys is enabled only dirty json sub keys will be logged.

That would limit the random edgecases and make a reasonable amount of testing for such behavior. As for the wildcards being a wildcards, sure, we can get around this by introducing a new attribute (flag) I would say to only enables deep diff checking only when needed.

ragingdave commented 3 years ago

I actually had a completely different approach here, that might end up resolving the issue (maybe?) much simpler than recursion or having to denote nested keys.

Instead of treating json as json, why not just convert the properties of the model to dot notated versions, then run the diff check as is, then un-dot them. This would effectively treat the entire property set as a single level key object and prevent any recursion.

So for example, a model using extra attributes would have an array of properties on top of the model properties: (yes I know this is bad practice and you would never use schemaless attributes in this way because oh boy it's bad, but off the top of my head here it is)

Book: 'pages' => 42 'title' => 'The Book'; 'isbn' => 12345678900, 'extraAttributes' => [ 'price' => 43, 'author' => [ 'name' => 'Mark Twain', 'age' => 31, ], ]

So normally you'd potentially have to recursively check diff on 3 levels to actually achieve what is being wanted here. Instead using Arr::dot, you'd wind up with a single level property set like: Book: 'pages' => 42 'title' => 'The Book'; 'isbn' => 12345678900, 'extraAttributes.price' => 43, 'extraAttributes.author.name' => 'Mark Twain', 'extraAttributes.author.age' => 31,

After the diff checking, you just undot the properties and bob's your uncle! In this way the only sort of trouble added is just to dot and undot the properties before and after comparison.

Here's an example undot macro as well:

        Arr::macro('undot', function ($arrayDot) {
            $array = [];
            foreach ($arrayDot as $key => $value) {
                Arr::set($array, $key, $value);
            }

            return $array;
        });

I think it's a rather elegant solution if I don't say so myself ;)

There's probably something I'm not thinking of, but this way would make potentially a non-breaking change, and would potentially require no additional properties or configuration to put in place.

Gummibeer commented 3 years ago

@ragingdave that's a good idea to dot the array! 🚀 @nagi1 already works/has nearly finished v4 which includes a plugin/hook/pipeline way to do whatever you want with the changes. We won't provide official pipelines (yet) but add some to the docs as ideas. The idea is to manipulate the changes however you want in a pipe. That way we can also add support for packages like laravel translatable and other packages with more complex data structure.

nagi1 commented 3 years ago

Hey @ragingdave, I just love your elegant solution; I think it'll be a great example to add to the test suite and documentation!

As @Gummibeer has mentioned above the new pipeline approach will provide so much flexibility to create even more elegant solutions for so many problems.

In the mean time here you'll find an example of json diff, I know it's so basic, would you please PR an example with this elegant solution using pipeline to include it in the test suite and docs.

Thanks!

Gummibeer commented 3 years ago

I will close this issue even if v4 isn't released yet. But the task itself is done and I want to check which tasks are really open. Please keep an eye on #787

amrshakya commented 2 years ago

@Gummibeer Is deep checking available in v4 release? I did not see it in the documentation.

Gummibeer commented 2 years ago

@nagi1 already works/has nearly finished v4 which includes a plugin/hook/pipeline way to do whatever you want with the changes. We won't provide official pipelines (yet) but add some to the docs as ideas. The idea is to manipulate the changes however you want in a pipe. That way we can also add support for packages like laravel translatable and other packages with more complex data structure. https://github.com/spatie/laravel-activitylog/issues/692#issuecomment-816261183

@amrshakya we do and we don't. 😂 There's no official/simple way/config. But using the new pipeline system you can manipulate the logged changes however you want.

spatie / laravel-activitylog

Deep Diff checking is not working #692