oslc-op / oslc-specs

OSLC OP specifications and notes
https://open-services.net/specifications/
25 stars 10 forks source link

TRS WD 2021-05-19 Part 1 review comments #513

Open berezovskyi opened 3 years ago

berezovskyi commented 3 years ago

Part 1

ndjc commented 3 years ago

I will start addressing these comments in a new branch b-trs30-draft3, as discussed in the meeting on May 20th. I do not expect all to be resolved in that new draft, so I've added a checklist above, and will mark off the mainly editorial actions as I make them. I will probably leave some open to be discussed later - the topic of discovery, root services, etc. is one of those.

berezovskyi commented 2 years ago

@jamsden do you still want to go over these items?

jamsden commented 2 years ago

Yes

jamsden commented 2 years ago

We are about to publish trs-v3.0-ps01. @berezovskyi, are the any of these items that should be addressed in ps01?

berezovskyi commented 2 years ago

Jim, I am on vacation this week. The most important issue is to remove the language that allows items to appear out of order, e.g. 5 7 8 9 6 (ie 5 on the first request, 5 7 on the second, then 5 7 8, then 5 7 8 9 and finally 5 7 8 9 6). This may be issues 21, 27, 30 but I am not sure unless I cross-check with the draft.

Our client triggers a full rebase every time it encounters such a SNAFU. Removing text from the spec that condones such behavior would allow users to go their vendors and file bug reports asking for events not to be published out of order but at the same time clients will still have the code that works around this.

berezovskyi commented 2 years ago

Note that this has nothing to do with another issue I have that 5 7 8 9 10 is allowed. Better to have skips like 5 7 8 9 10 than OOOE like 5 7 8 9 6.

berezovskyi commented 2 years ago

Some systems deal with this through watermarking, see https://databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html but we don't want to go there. At the very least, we need to ensure that the order of trs:order matches the TRS log append order (no pun intended).

I recall David and Nick arguing that it's "hard" to do in a clustered environment. As I said before, AtomicLong allows to generate 1MM+ TRS event order numbers per second and Hazelcast/Redis allow at least 300k atomic INCR commands across multiple nodes in a cluster on a decent server HW. I am pretty sure a SQL DB with an AUTO INCREMENT key will do the trick, even if 10x slower. I am 100% confident we can enforce both constrains but here specifically we need to deal with out-of-order appends.

jamsden commented 2 years ago

re: "language that allows items to appear out of order", I don't see any language that allows or implies that. Can you point to that language?

berezovskyi commented 2 years ago
  1. CC-25
  2. CC-23 is suspicious (it talks about changes, but it's not clear if we are talking about change time or changelog append time; the reorder of the former is fine but not the latter)
  3. CC-17 talks about "recency" of the events, not their append order. In its form it directly contradicts CC-25.

We should define two separate terms: TR event time and TRS event append time. And then restate those CCs, such that two TR events may appear out of order in the TRS (CC-23), state that TR events may be delayed from being appended to the TRS (CC-25) but once a TRS Client fetches the TRS log and observes a set of TRS events with a trs:order of up to n, no subsequent TRS log responses may contain appended TRS events with a trs:order < n that were missing from a previous response (CC-17). This would mean that if CC-23/25 happen, they will have TRS event order numbers as they are appended to the log, not according to the original TR event time order.

berezovskyi commented 2 years ago

For now the only spelled out guarantee is in CC-36, which says that when you get a new page, it will not contain events with an order number lower or equal to the max order number from a previous page.

jamsden commented 2 years ago

CC-25: states under what conditions a TRS server must report a change event, it doesn't address the ordering of the events in the change log.

CC-23: indicates that change log ordering is independent for different resources, the change log order is only relevant for changes to a specific resource - in oder to ensure the state of the resource is accurately reflected over time.

CC-14, CC-17: indicate that the trs:order number sequence must correspond to the actual sequence of changes to the trs:changed resource. But does not say anything about the order of these events in the change log. They might for example appear out of order if members of clustered servers appended to the log in an asynchronous manner.

The client however can always rely on the trs:order numbers, and should not rely on their order in the change log. If they want to process them in the right order, one way would be to sort them before processing them.

So I'm still not sure any change in the spec is needed.