scandihealth / lpr3-docs

https://scandihealth.github.io/lpr3-docs/
MIT License

PROD: Race condition error: Prior Document Contact Already Marked as Successful #319

Open Marc-Petersen opened 5 years ago

Marc-Petersen commented 5 years ago

The following is an example of a race condition that causes an error on the service endpoint:

  1. AMS 27996604 (version 9) is built stating that it is replacing document version 8. This was an update to the Pathway Element (system HAR 3800000009280).

  2. AMS 27996609 (version 10) is built, also stating that it is replacing document version 8. This document was for HAR 3800000009279. This is a result of the race condition described in an earlier post.

  3. DXC responds to version 10 with a success status in AMS 27997072.

  4. DXC responds to version 9 also with a success status in AMS 27997074.

  5. AMS 28381801 (version 11) is built stating that it is replacing document version 10, which is the most recent successful DXS record contact per Epic.

  6. DXC responds to version 11 (in AMS 28381802) with an INTEGRITY_CHECK error stating that the most recent document version should be version 9 (presumably because the version 9 document was processed by the service endpoint after the version 10 document).

Resolution path: In step 4, we would expect the service endpoint to reject document version 9 with an error stating that the correct document to replace is version 10, not version 8 (which has already been replaced by version 10).
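The rule the reporter expects in step 4 can be sketched as a parent-version check: an update is accepted only if it replaces the version that is current at the moment it is processed. The Java below is a minimal in-memory sketch, not LPR's implementation; `DocumentSet`, `submit`, and the message text are hypothetical, and the model tracks nothing but the current version number. Replaying the messages in the order the endpoint processed them (version 10, then 9, then 11) shows version 9 being rejected and version 11 accepted.

```java
// Hypothetical model of one document set: it only remembers which
// document version is currently the latest accepted one.
class DocumentSet {
    private int currentVersion;

    DocumentSet(int initialVersion) {
        this.currentVersion = initialVersion;
    }

    // Accept newVersion only if it replaces the version that is current right now;
    // otherwise report an INTEGRITY_CHECK error naming the expected parent.
    String submit(int newVersion, int replacesVersion) {
        if (replacesVersion != currentVersion) {
            return "INTEGRITY_CHECK: version " + newVersion + " must replace version "
                    + currentVersion + ", not version " + replacesVersion;
        }
        currentVersion = newVersion;
        return "SUCCESS";
    }

    int currentVersion() {
        return currentVersion;
    }
}

class ExpectedBehaviourDemo {
    public static void main(String[] args) {
        DocumentSet set = new DocumentSet(8);
        // Messages in the order the endpoint processed them in the issue.
        System.out.println(set.submit(10, 8));  // SUCCESS
        System.out.println(set.submit(9, 8));   // INTEGRITY_CHECK: version 9 must replace version 10, not version 8
        System.out.println(set.submit(11, 10)); // SUCCESS
    }
}
```

Because the check runs against whatever is current when each message arrives, the outcome no longer depends on which message the sender built first.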

If further documentation is needed in the form of examples or otherwise, please post a request.

TueCN commented 5 years ago

I am not familiar with the terms AMS and HAR, but I assume the AMS numbers carry no semantic meaning? (I don't see how they are related.)

Anyway, you are correct that LPR should respond with an INTEGRITY_CHECK error at step 4.

I have reproduced the issue and I am looking into it.


Severity: My initial analysis suggests that this bug cannot result in data loss. Due to database transaction isolation, two concurrent updates cannot both successfully commit changes to the same rows. The database rolls back whichever transaction commits last, and LPR returns a SOAP fault saying: javax.persistence.OptimisticLockException: Row was updated or deleted by another transaction.
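The optimistic locking behind the quoted `OptimisticLockException` can be sketched as a compare-and-set on a per-row version counter. In the sketch below, `VersionedRow` and `StaleRowException` are hypothetical stand-ins (the latter for `javax.persistence.OptimisticLockException`); a JPA provider typically achieves the same effect with an `UPDATE ... WHERE version = ?` and treats an update count of zero as a conflict.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for javax.persistence.OptimisticLockException.
class StaleRowException extends RuntimeException {
    StaleRowException(String message) {
        super(message);
    }
}

// A single row guarded by a version column, as used for optimistic locking.
class VersionedRow {
    private final AtomicLong version = new AtomicLong(0);

    // The version a transaction remembers when it first reads the row.
    long read() {
        return version.get();
    }

    // Commit succeeds only if nobody else has committed since readVersion was taken.
    void commit(long readVersion) {
        if (!version.compareAndSet(readVersion, readVersion + 1)) {
            throw new StaleRowException("Row was updated or deleted by another transaction");
        }
    }
}

class OptimisticLockDemo {
    public static void main(String[] args) {
        VersionedRow row = new VersionedRow();
        long txA = row.read(); // both transactions read version 0
        long txB = row.read();
        row.commit(txA);       // first commit wins; version becomes 1
        try {
            row.commit(txB);   // second commit expected 0 but finds 1
        } catch (StaleRowException e) {
            System.out.println("rolled back: " + e.getMessage());
        }
    }
}
```

The losing transaction is rejected rather than overwriting the winner, which is why concurrent changes to the same rows cannot silently lose data.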

The scope of this issue therefore appears to be limited to LPR not being compliant, in that it does not fully respect the rules outlined in https://scandihealth.github.io/lpr3-docs/aspects/index.html#documents-and-versioning: when processing document updates concurrently, it allows non-conflicting appends to documents that are no longer the current version.
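The gap described here can be sketched as a check-then-act race. The model below is hypothetical: it assumes each accepted document is appended as a new row and that the "current" document is the most recently appended one, which is consistent with, but not confirmed by, the version 9 expectation in step 6. Because the integrity check and the insert are separate steps and no existing row is modified, both requests pass the check and there is nothing for an optimistic lock to catch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only model: every accepted document is a new row, and the
// "current" document is assumed to be the most recently appended one.
class AppendOnlySet {
    // One stored row: its own version and the version it claims to replace.
    static final class Doc {
        final int version;
        final int parentVersion;

        Doc(int version, int parentVersion) {
            this.version = version;
            this.parentVersion = parentVersion;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    AppendOnlySet(int initialVersion) {
        docs.add(new Doc(initialVersion, -1)); // -1: no parent
    }

    int current() {
        return docs.get(docs.size() - 1).version;
    }

    // Step 1 of an unserialized update: the integrity check.
    boolean check(int replacesVersion) {
        return replacesVersion == current();
    }

    // Step 2: insert a new row. No existing row changes, so a version
    // check on existing rows has nothing to detect.
    void append(int version, int replacesVersion) {
        docs.add(new Doc(version, replacesVersion));
    }

    // How many stored documents claim to replace the given version.
    int childrenOf(int parentVersion) {
        int count = 0;
        for (Doc d : docs) {
            if (d.parentVersion == parentVersion) {
                count++;
            }
        }
        return count;
    }
}

class RaceDemo {
    public static void main(String[] args) {
        AppendOnlySet set = new AppendOnlySet(8);
        // Both requests run their check before either one has committed.
        boolean v9Ok = set.check(8);
        boolean v10Ok = set.check(8);
        if (v10Ok) set.append(10, 8); // version 10 commits first
        if (v9Ok) set.append(9, 8);   // version 9 commits second
        System.out.println("current=" + set.current() + ", children of 8=" + set.childrenOf(8));
        System.out.println("version 11 replacing 10 passes check: " + set.check(10));
    }
}
```

With this interleaving the program prints `current=9, children of 8=2` and then `version 11 replacing 10 passes check: false`, mirroring steps 3 to 6: version 8 ends up with two successors, and an update that correctly builds on version 10 is the one that gets rejected.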

We will look into making the service return the intended INTEGRITY_CHECK error in this situation instead.

TueCN commented 5 years ago

RESOLUTION: LPR now serializes concurrent updates to the same set. Whichever concurrent update first "locks" the set wins, and the other requests must wait until that update is complete before they can continue.

As a result, any subsequent update that did not account for the first update (as in this issue) will fail with an INTEGRITY_CHECK RegistryError: PARENT_DOCUMENT_ID_MISMATCH.
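Serializing updates per set can be sketched with one lock per set ID, so that the integrity check and the write happen atomically with respect to other updates of the same set. `SerializedRegistry`, the set ID `"set-1"`, and the returned strings are hypothetical; the thread does not say which mechanism LPR uses (a pessimistic database lock such as `SELECT ... FOR UPDATE` would be one option).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical registry that serializes updates per document set: the check and
// the write run under one lock, so no other update to the same set can slip in between.
class SerializedRegistry {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, Integer> currentVersion = new ConcurrentHashMap<>();

    void create(String setId, int version) {
        currentVersion.put(setId, version);
    }

    String update(String setId, int newVersion, int replacesVersion) {
        ReentrantLock lock = locks.computeIfAbsent(setId, id -> new ReentrantLock());
        lock.lock(); // later requests for the same set wait here
        try {
            int current = currentVersion.get(setId);
            if (current != replacesVersion) {
                return "INTEGRITY_CHECK: PARENT_DOCUMENT_ID_MISMATCH";
            }
            currentVersion.put(setId, newVersion);
            return "SUCCESS";
        } finally {
            lock.unlock();
        }
    }

    int current(String setId) {
        return currentVersion.get(setId);
    }
}

class SerializedDemo {
    public static void main(String[] args) throws Exception {
        SerializedRegistry registry = new SerializedRegistry();
        registry.create("set-1", 8);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Both requests claim to replace version 8, as in the issue.
        Future<String> v9 = pool.submit(() -> registry.update("set-1", 9, 8));
        Future<String> v10 = pool.submit(() -> registry.update("set-1", 10, 8));
        System.out.println("v9: " + v9.get() + ", v10: " + v10.get());
        pool.shutdown();
    }
}
```

Whichever request acquires the lock first succeeds; the other waits, then finds that version 8 is no longer current and fails with the mismatch error. Which of the two wins depends on thread scheduling, matching the "first to lock wins" rule described above.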