lgessler / glam

(WIP) a webapp for language documentation
Eclipse Public License 2.0
40 stars 3 forks source link

Support safe concurrent mutation processing #5

Closed lgessler closed 7 months ago

lgessler commented 3 years ago

Mutations currently need to be handled one-at-a-time globally (still to be implemented at time of writing) because of race conditions that might occur: some mutations (example) perform reads before submitting a transaction. I've attempted to mitigate this by including matches, but I'm not ready to say something bad couldn't happen if concurrent mutations were allowed. The impact is "only" on performance (viz., write throughput), but this does set a hard upper limit on one node for production deployments.

To address this, first some more work is needed to understand the failure modes of concurrent mutation processing. The solution may involve making more use of subqueries.

lgessler commented 2 years ago

deftx provides a suitable solution for this and this issue could be closed if all dangerous operations were rewritten with it. The global mutation lock (which is currently providing write-concurrency safety) could then be removed.

lgessler commented 7 months ago

This basically should be fixed with deftx adoption across all mutations. A remaining concern though is that deftx functions do not perform all the data model validations (such as existence checks for entities) that are necessary. Additionally, because of the way XTDB transaction functions work, it's not easy to propagate exceptions out of them.

In light of this, it's probably best to just keep the current approach and leave the global write lock in place. It's hard to imagine throughput ever becoming a serious problem for this app.

lgessler commented 7 months ago

One example of an unsafe path without the global lock: https://github.com/lgessler/glam/blob/ddece77eabc31c5adb3d4b5ce5543cead12d6578/src/main/glam/models/span_layer.cljc#L60-L61

lgessler commented 7 months ago

In the future, it might be ideal to move everything into deftx. A really hacky way to propagate error messages out of there might go like this:

  1. Make a wrapper around submit-tx that inserts a tx op to the beginning of the tx vector that contains a nonce ID, e.g. {:xt/id 1334910247 :glam/nonce true}, and then deletes it at the end of the tx.
  2. Set up a global atom that maps from nonce IDs to error information.
  3. If a deftx encounters an error, it will swap the data into the atom.
  4. On tx failure, the code that invoked the transaction will know where to look.
  5. On tx success, the transient nonce document will have been deleted at the end of the transaction.
  6. The atom is cleaned up when the lifecycle of the request is done.