PUT on entity with self links causes changes to be overwritten with old values [DATAREST-238]

spring-projects-issues commented 10 years ago

Thomas Ferris Nicolaisen opened DATAREST-238 and commented

When PUTting the following JSON (a "Workload" entity):

{
  "version" : 1,
  "weight" : 123,
  "handlingMinutes" : 2,
  "articleCount" : 3,
  "address" : {
    "line1" : null,
    "line2" : null,
    "line3" : null,
    "postCode" : "53123",
    "city" : "Bonn          ",
    "countryCode" : null,
    "street" : "Bonner Bonnweg",
    "houseNumber" : "125"
  },
  "links" : [ {
    "rel" : "self",
    "href" : "http://127.0.0.1:8000/api/workloadDao/1060"
  }, {
    "rel" : "area",
    "href" : "http://127.0.0.1:8000/api/workloadDao/1060/area"
  }, {
    "rel" : "product",
    "href" : "http://127.0.0.1:8000/api/workloadDao/1060/product"
  } ]
}

.. the incoming JSON is converted all fine and well, until it runs into the links/rel/self/href thing, and hits this line: [1]

This causes the converted Workload entity from the PUT call to be overwritten with the old values. End-effect: A PUT looks like it is successful, but no changes are stored.

[1] https://github.com/spring-projects/spring-data-rest/blob/2.0.0.M1/spring-data-rest-webmvc/src/main/java/org/springframework/data/rest/webmvc/json/PersistentEntityJackson2Module.java#L135

Affects: 2.0 M1 (Codd)

Reference URL: https://gist.github.com/tfnico/37d24e841e63af2397ee

Issue Links:

DATAREST-143 PUT of entity with links fails.. ("is duplicated by")

Referenced from: pull request https://github.com/spring-projects/spring-data-rest/pull/130, and commits https://github.com/spring-projects/spring-data-rest/commit/ef1ee10940a17f3bd1e30e39cf054ab95f53f47e

spring-projects-issues commented 10 years ago

Nick Weedon commented

Its interesting that this happens. I was reworking the code in that area (in a private fork of SDR) and I couldn't make sense of the code I saw. I figured that this bug that you see would be the result.

I am just in the process of adding unit tests for a fix that will address this issue (as well as allowing Jackson annotations to be 100% compatible with SDR).

Once i submit the pull request, it might take some time for the fix to be reviewed though since I know that the SDR developers are pretty swamped with features and issues at the moment.

Assuming we are not both somehow mistaken, its good to know I wasn't going crazy by thinking that the existing code was a bit broken in this area :)

spring-projects-issues commented 10 years ago

Greg Turnquist commented

I've tracked down the issue. Essentially, I had to cut out that deserializer. Deserializing a JSON object should simply ignore the _links given that they are read only data and the default deserializers can do it with an adjustment to the configuration

spring-projects-issues commented 10 years ago

Greg Turnquist commented

Fix for this issue

spring-projects-issues commented 10 years ago

Nick Weedon commented

I actually already have a fix for this, only my fix actually does not ignore the _links section, instead, the linked associations are included in the deserialized object.

The self link is a special case and the ID is extracted from the self link and used as the id field of the object (an exception is thrown if an ID is already provided). This is consistent with the GET behavior since the self link can be used as the ID in this case.

The way that my deserializer works is as follows:

Create a token buffer
Recursively parse the incoming request using JsonParser
Add all non '_links' tokens to the token buffer.
Convert _links entries into json token format (using existing code to locate the correct repository and call findOne) and add to the token buffer as json tokens.
Take the token buffer and deserialize this as a regular JSON object. This allows all Jackson annotations in the domain class to still be honored.

This recursive parser works with link arrays and _links sections within inlined objects.

For the serializer side, I have also performed changes (i won't go into detail here) to allow the Jackson annotations in the domain class to be honored when serializing.

I should hopefully have it submitted today, I am just finishing off some unit tests.

spring-projects-issues commented 10 years ago

Nick Weedon commented

Here is the fix i mentioned (https://github.com/spring-projects/spring-data-rest/pull/133). Note that this uses the 'draft-kelly-json-hal-06' IETF draft Hal style (the style that appears to be used in GETs in the latest SDR release).

spring-projects-issues commented 10 years ago

Thomas Ferris Nicolaisen commented

@Nick Do you want me to build/test on that PR?

spring-projects-issues commented 10 years ago

Nick Weedon commented

You certainly can be but beware that it has not yet been reviewed by the SDR developers and so it may not make it into a release.

I am basically just putting forward a proposed 'fix'.

As long as you are potentially prepared to repatch with these changes then go for it. Otherwise I would wait until it gets reviewed

spring-projects-issues commented 10 years ago

Greg Turnquist commented

Thanks @Nick for your effort.

The fix I have attached in a link above in a separate PR basically ignores the _links attribute when doing any sort of an update. That allows the update to focus purely on the data structure. This also has the beneficial side effect that it doesn't matter if its called _links, links, _foobar, or whatever. Any attributes that we add on when rendering the JSON that are additive are essentially ignored when you turn around and push it back to update/replace the attribute. And since we can glean id numbers and or property name based on URIs, we don't really need to fetch such URI components from the _links section.

If there are other parts of PR 133 that needs to be evaluated, can you comment on their respective JIRA issues so we don't lose track of that?

spring-projects-issues commented 10 years ago

Nick Weedon commented

Oliver has kindly explained to me the reasons behind moving away from processing the _links sections, particularly in the area of updating existing relations. Understanding this now, i can see why you ignore the _links section.

I have gone through the Jira issues and commented on those that i believe my fix is applicable to. You can now of course ignore my comments on Jira issues that relate to _links processing.

The focus of my change was supposed to be focused on allowing JSON serialization and deserialzation to work without ignoring Jackson annotations on the domain class. I kind of got a bit side tracked on the _links functionality when I saw some existing behaviour there. My fix does still address Jackson annotations however. I believe that the fix you have made will also allow Jackson annotations to work since the entire domain object will now be processed by the default jackson deserializer.

I would recommend against ignoring all unknown data fields however since silent failure can have dire consequences, particularly when dealing with data. Consider the case where a REST client is updated with a new data field but the REST server has not been updated. In such a case data will be lost if this new field is ignored and it is better to reject such a request with a http error. Ignoring known decorative fields such as a _links 'self' field is safe because the intent of the update is still understood. By ignoring all fields on the other hand, there is an opportunity for data corruption to occur.

spring-projects-issues commented 10 years ago

Greg Turnquist commented

@nick

I took your advice. Check PR 130 and see how I replaced ignoring ALL unknowns with a handler to only ignore _links

spring-projects-issues commented 10 years ago

Nick Weedon commented

That looks good. Looks like a nice clean fix for the issue at hand.

Thinking forward on this issue (and what I am about to say really belongs in a different but related ticket), i am wondering why the _links field was previously being ignored instead of allowing an error to occur? Is that for reasons of backwards compatibility or some Hal client perhaps? I would think that if anything other than a 'self' link is provided then the result would probably not be what the REST client expected. Maybe a 'developer friendly' exception should be raised in such a case, explaining that Hal fields are not support in POSTs/PUTs?

Anyhow, like I said, these thoughts probably belong in another ticket. I think your fix addresses the issue well

spring-projects-issues commented 10 years ago

Oliver Drotbohm commented

The PR generally looks good to me. I'd vote to simply ignore all unknown properties for now as a dedicated ignoring has to be media type specific (_links might be a perfectly valid property in non-HAL representations), which we currently cannot take into account.

Also, in a REST model, the server drives the interaction and defines which data it will accept. Hence the client being "newer" than the server is a pretty uncommon (if not disallowed) scenario

spring-projects-issues commented 10 years ago

Nick Weedon commented

Looking at the old code again, it actually looks like the code was in fact previously ignoring all fields already. The funny thing is that the checks that are done to see if the property name is "links" or "rel" are completely redundant since the field is ignored if there is no persistent property of this name. In fact the only time that the if("links".equals) would even execute would be in the case where you actually have a persistent property named links??? I think that this broken code tricked me into thinking that it was only ignoring these fields.

Anyhow, that aside, given that this was the previous behavior, I agree that it is better to maintain this behavior since changing it does not relate to this ticket. I therefore apologize for the bum steer (Australian expression :] ).

For any final/releasable version however, my advice still holds. Although you mention that the client being "newer" than the server is pretty uncommon, in my experience this has happened quite frequently. The situation often rears its ugly head when you have a server product that acts as a client to another server, particularly in large enterprise systems that span multiple time zones and that are therefore often governed by different groups of people. In such a setting it is quite frequent that one group will upgrade to a newer version of the product before another group and in a totally chaotic and haphazard manner :) It is also quite frequent in the case of rolling upgrades when dealing with high availability (i.e. 24/7) servers where the servers are exchanging information between one another (such as when replicating data through REST due to the servers using heterogeneous data storage, for instance one being integrated into Active Directory via LDAP and the other using an Oracle database).

There are also various permutations of this problem involving new or removed fields emerging in new client or server versions of a product.

When dealing with data corruption, particularly when the victim is a large international company such as HSBC or Deutsche Bank for instance (i.e. one that can bend you over with a large gang of lawyers), telling them that it is a disallowed scenario doesn't really help anyone. Data corruption is the worst kind of bug and it should be avoided at all costs, even if the situation is uncommon.

Sorry for the large sermon but I just wanted to elaborate on some of the kind of scenarios where this can occur.

spring-projects / spring-data-rest

PUT on entity with self links causes changes to be overwritten with old values [DATAREST-238] #621