Allow serializing SM state

singpolyma commented 7 months ago

It is useful in some applications to get a snapshot of the stream management state for storage outside the process, in order to recover from crashes and other things. The get_sm_state already present is not suitable for this because it returns a live object tied in to the current context and such, and containing much unneeded internal-api data.

Introduce a new get_sm_serializable_state that creates instead a fixed snapshot, handing full memory ownership to the caller, and a dual set_sm_serializable_state which takes in the bare minimum needed values in order to resume and sets up a new sm_state based on these values.

sjaeckel commented 7 months ago

Thanks for the PR.

Exposing library internals is against the self-set rules of this library and therefore not mergeable. By exposing the struct members, this struct is from then on part of the API&ABI and changing anything in it would require a major version increase.

IMO it would be better if we were exporting everything for the user, maybe into a binary blob!? An API could look something like:

/* This takes ownership of `sm_state`. It allocates a new blob buffer,
serializes all the data contained within the `sm_state` and returns
an opaque buffer `blob` of len `blen`.
After serialization it first zeroizes, then free's `sm_state` completely.

The data can IMO be in an arbitrary format of your choosing, feel
free to propose something. It should be simple enough and robust.

The only feature that format requires from my side is "versioning". */
int xmpp_sm_state_serialize(xmpp_sm_state_t *sm_state, void **blob, size_t *blen);
/* Reverse the serialization */
int xmpp_sm_state_deserialize(xmpp_ctx_t *ctx, const void *blob, size_t blen, xmpp_sm_state_t **sm_state);

This doesn't mean that this blob must not be "text editor compatible", but my target is to take the burden away from the user to care about handling that struct. Doing that might be OK for you, but it won't be OK for others.

Regarding the "protocol" of the exported data I'm not sure what we should do ... Maybe use the SSH wire format? c.f. RFC4251 Ch.5, it's simple enough to be re-implemented and pretty robust, also it supports all features we require if I'm not mistaken. IMO we don't have to care about low-level details like e.g. endianness, we should expect this data to only be read on the machine that it was written from.

What do you think? Let's discuss this, before diving into another implementation.

singpolyma commented 7 months ago

Exposing library internals is against the self-set rules of this library and therefore not mergeable

That's why I wrote a new struct instead of re-using any of the internals.

As you say though, I don't really care how I get this data, so it doesn't have to be a struct. So long as I can get the 4 values I need: current value of sent counter, current value of received counter, current sm id, and currently queued stanzas.

I guess what you're saying is that technically I could structure my code to not know about those 4 values. Looking at what I'm doing I guess that's true since I always just store them as-is and retrieve them as-is. I can't really think of a reason I would need to introspect except debugging so maybe that's fine.

This takes ownership of sm_state

I definitely need to be able to serialize without taking ownership, since strophe needs to keep ownership in order to stay connected and keep updating the state as it goes. I call this a lot (on every sm state change, though I don't have events for quite all of them in strophe yet but it's on my todo list).

Regarding the "protocol" of the exported data I'm not sure what we should do ... Maybe use the SSH wire format? c.f. RFC4251 Ch.5

This could work. I propose instead using CBOR (RFC8949) -- it will be a few more bytes, but not more complex to generated, and can be parsed by more tooling and has an official encoding for lists of strings, which a main part of the data here. We would of course only accept our own output as input and not a full arbitrary CBOR parser. So the format could be something like:

version sent (Big Endian)           received (Big Endian)   id (max length 32)  queue (max length 32)
\x01    \x1a\x00\x00\x00\x01        \x1a\x00\x00\x00\x01    \x62id              \x81\x59\x00\x03</>

sjaeckel commented 7 months ago

I propose instead using CBOR (RFC8949)

It looks like we can trim this down to an acceptable complexity.

queue (max length 32)

Why would you limit to 32? I could understand to 24, but 32 requires to implement two types, 0x80..0x97 the array Major Type with the length included + 0x98 the array Major Type with "one-byte uint8_t for n".

From the cost of implementation complexity I would prefer to go with a "always non-preferred" way of encoding all required types as their 0x*a variant "four-byte uint32_t (follows|for n)". This would then basically be a type-tagged version of the SSH wire format, but I see your point of tooling to read that data.

Your example would then look as follows:

version                 sent (Big Endian)           received (Big Endian)   id
\x1a\x00\x00\x00\x01    \x1a\x00\x00\x00\x01        \x1a\x00\x00\x00\x01    \x7a\x00\x00\x00\x02id

queue
\x9a\x00\x00\x00\x01\x7a\x00\x00\x00\x03</>

FYI I've corrected the type of the queue entry from \x5* to \x7*, since we're always dealing with UTF-8 strings.

I definitely need to be able to serialize without taking ownership, since strophe needs to keep ownership in order to stay connected and keep updating the state as it goes.

I'm not sure whether I like that idea. The current way of only allowing the retrieval of the sm_state while disconnected is intentional, since otherwise we can't be sure whether that sm_state is valid or whether it's a stale one that had been updated internally already.

Maybe @pasis wants to add something to that discussion?

singpolyma commented 7 months ago

I'm not sure whether I like that idea. The current way of only allowing the retrieval of the sm_state while disconnected is intentional, since otherwise we can't be sure whether that sm_state is valid or whether it's a stale one that had been updated internally already.

So I'll give you a better overview of what I'm doing (currently doing this in js with xmpp.js but planning to add a native target that uses libstrophe):

Every time a stanza comes in, or goes into the queue, or gets confirmed by an ack, or is known to have failed after a resume, I snapshot the sm state and serialize it, storing in persistent storage. This allows me to hydrate the state and attempt a resume even after a full crash or other unceremonious termination. In most cases this allows me to resume saving lots of time and keeping the stream intact. In the worst case, something has diverged and the resume fails pushing me back to a vanilla reconnect that I would have needed anyway without this.

I do understand that as a consumer of this API it's my job to make sure I snapshot every time there is a meaningful change, otherwise my serialized state will be useless.

sjaeckel commented 6 months ago

Thanks for the explanation.

I guess we have to give up something for your use-case then, and I thought whether it wouldn't be better if it were more interactive? I'm thinking of a callback mechanism you can register which gets called each time we update our SM state.

My first idea looked like follows (NB: I'm not sure whether that's complete, I may have missed something :) )

enum xmpp_event {
   /* `h` == 0 -> enabled
    * `h` != 0 -> resumed
    * `data` == NULL
    */
   XMPP_SM_ENABLE,
   /* `h` == `sm_handled_nr`
    * `data` == NULL
    */
   XMPP_UPDATE_H,
   /* `h` == 0
    * `data` == stanza
    */
   XMPP_Q_ADD,
   /* `h` == 0
    * `data` equals to `data` of a previous `Q_ADD` event
    */
   XMPP_Q_DROP,
   /* `h` != 0
    * `data` equals to `data` of a previous `Q_ADD` event
    */
   XMPP_SM_Q_MOVE,
   /* `h` != 0
    * `data` equals to `data` of a previous `SM_Q_MOVE` event
    */
   XMPP_SM_Q_DROP,
};

typedef int (*xmpp_sm_callback)(xmpp_conn_t *conn, void *ctx, const char *id, enum xmpp_event ev, uint32_t h, const char *data);

int xmpp_sm_state_set_callback(xmpp_conn_t *conn, xmpp_sm_callback cb, void *ctx);
int xmpp_sm_state_restore(xmpp_conn_t *conn, enum xmpp_event ev, uint32_t h, const char *data);

The library calls the callback each time it changes the SM state with what happened. To restore, the user has to replay the sequence of all events that haven't been DROPped resp. the last value of UPDATE_H.

This would be bandwidth friendly, but handling the data and restoring is less user friendly.

The complexity of the handling&restore, even though it's not so super complex, brought me back to what you basically proposed. Handing over the complete state via the callback, but in a serialized form.

typedef int (*xmpp_sm_callback)(xmpp_conn_t *conn, void *ctx, const void *sm_state, size_t sm_state_len);

int xmpp_sm_state_set_callback(xmpp_conn_t *conn, xmpp_sm_callback cb, void *ctx);
int xmpp_sm_state_restore(xmpp_conn_t *conn, const void *sm_state, size_t sm_state_len);

The library calls the callback each time it changes the SM state passing over the serialized state. To restore the user has to pass the last state in.

What do you think?

Another question: what do you currently do with the "regular send queue"?

singpolyma commented 6 months ago

What do you think?

Since that kind of callback is exactly what I need to simulate on my side either way, it seems pretty much ideal to me if we can get it as you say.

Another question: what do you currently do with the "regular send queue"?

I'm not sure what you mean here. Is there a second queue in use that is not tied to SM state?

sjaeckel commented 6 months ago

What do you think?

Since that kind of callback is exactly what I need to simulate on my side either way, it seems pretty much ideal to me if we can get it as you say.

I guess the question should've been clearer then :D

What do you think of the two approaches? Could you live with the second one?

Another question: what do you currently do with the "regular send queue"?

I'm not sure what you mean here. Is there a second queue in use that is not tied to SM state?

Yes, there's the regular send queue which buffers stanzas before they get sent out.

xmpp_send*() puts the (rendered) stanza in the send queue
xmpp_run_once() tries to send the "next" element of the send queue
after a stanza has been sent completely it is put in the SM queue

There's a mechanism of stanza counting involved to determine what was received on the server side

we request the servers' h value by sending an <r>(equest) stanza after we put the first stanza into the send queue
we drop all stanzas that are marked as received by the server when receiving the <a>(nswer) which contains the servers' current h

You can retrieve the regular send queue elements via xmpp_conn_send_queue_drop_element(), which will pop them off the list. This is as it is, since the initial use case was not what you're intending to do, but a flaky connection that maybe is too slow to transmit all data and the user wants to remove elements from the send queue if it goes above a certain limit of queued stanzas.

FYI I've found two issues after re-reviewing the implementation. This only matters if you drop send queue elements and I will provide a fix soon.

singpolyma commented 6 months ago

What do you think of the two approaches? Could you live with the second one?

The second one being where there is an event that fires every time the SM state changes and it includes the serialized payload? Yes, this is ideal for me.

Yes, there's the regular send queue which buffers stanzas before they get sent out.

Hmm, that's unexpected. I would consider this part of the SM state really (on my side) since in the event of a reconnect these are still things that need to get re-tried (even if they were never tried once). So I guess if I save what you think of as SM state without this there will be a data loss race condition.

Would it be crazy from your side to include the contents of this queue in the new "state changed" event and restore it with the same restore call?

sjaeckel commented 6 months ago

Would it be crazy from your side to include the contents of this queue in the new "state changed" event and restore it with the same restore call?

No, I also think this should be included in the serialized data.

The serialized format could then look like this:

uint32           version
array_of_string  send_queue
uint32           sent
uint32           received
string           id
map_of_string    sm_queue

Where the sm_queue map uses the h value as key and the stanza as value.

Am I missing something?

singpolyma commented 6 months ago

I'm not sure why sm_queue needs to be a map? But I'm happy to encode it that way if it's better for you

sjaeckel commented 6 months ago

I'm not sure why sm_queue needs to be a map?

Because we need to keep track of the h<->stanza relation. In the resumed response from the server, when trying resumption, we receive the last h the server processed and then we have to drop all stanzas with smaller h that are maybe still in the queue, but have to re-send all stanzas with higher h.

sjaeckel commented 6 months ago

We could also calculate the h value from sent, that's true. I thought it's maybe easier&cleaner as proposed. If you think it's sufficient or maybe even better as an array, we can also do it like that.

singpolyma commented 6 months ago

I'm happy to do it as a map if that's easier. From outside I just get a blob and hand it back so doesn't matter too much to me

singpolyma commented 3 months ago

I have pushed a new draft based on our discussion, see if this matches what you thought

strophe / libstrophe

Allow serializing SM state #239