laurencelundblade / QCBOR

Comprehensive, powerful, commercial-quality CBOR encoder/decoder that is still suited for small devices.

Provide public API to determine decoder item offsets #213

Closed BrianSipos closed 3 months ago

BrianSipos commented 5 months ago

I would like to be able to extract the encoded form of a CBOR item from within a (possibly deep) structure. One way to do this is to determine the offset into the input buffer before and after recursively consuming the item. Unfortunately, there is currently no way to inspect the UsefulInputBuf from within a QCBORDecodeContext or to tell the cursor position directly from a QCBORDecodeContext.

For example, in the structure [1, 2, {3: "hi", 4: 'oh'}, 5] after reading the first two array items, I want to extract the encoded string A20362686904426F68 representing the third item.

In pseudocode it would look something like the following, assuming either that inbuf were visible from the decoder or that there were a _Tell() function for the decoder itself.

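QCBORDecodeContext decoder;
QCBORItem item;
...
size_t start = UsefulInputBuf_Tell(inbuf); /* offset of the next item */
QCBORDecode_VGetNextConsume(&decoder, &item); /* consume the item and everything nested in it */
size_t end = UsefulInputBuf_Tell(inbuf); /* offset just past the item */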

from which the range [start, end) could be extracted from the input data.
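
To make the before/after offset idea concrete without relying on any QCBOR internals, here is a minimal standalone sketch. It is not QCBOR code: cbor_skip_item() and CBOR_SKIP_ERROR are hypothetical names invented for this illustration, and the sketch assumes well-formed, definite-length CBOR only (indefinite lengths and the break code are rejected). It plays the role of "Tell, consume, Tell" by returning the offset just past the item that starts at a given offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CBOR_SKIP_ERROR ((size_t)-1) /* hypothetical error marker */

/* Read one CBOR head at buf[*pos]. On success, advance *pos past the head,
 * store the major type and argument, and return 0. Return -1 on truncation
 * or on additional-info values 28..31 (reserved / indefinite length). */
static int read_head(const uint8_t *buf, size_t len, size_t *pos,
                     unsigned *major, uint64_t *arg)
{
    if (*pos >= len) return -1;
    uint8_t initial = buf[(*pos)++];
    unsigned info = initial & 0x1Fu;
    *major = initial >> 5;
    if (info < 24) { *arg = info; return 0; }
    if (info > 27) return -1;
    size_t nbytes = (size_t)1 << (info - 24); /* 1, 2, 4 or 8 bytes */
    if (len - *pos < nbytes) return -1;
    uint64_t value = 0;
    for (size_t i = 0; i < nbytes; i++) value = (value << 8) | buf[(*pos)++];
    *arg = value;
    return 0;
}

/* Return the offset just past the complete item (including all nested
 * items) that starts at `start`, or CBOR_SKIP_ERROR if it cannot be parsed. */
size_t cbor_skip_item(const uint8_t *buf, size_t len, size_t start)
{
    assert(buf != NULL);
    size_t pos = start;
    uint64_t remaining = 1; /* items still to be consumed */

    while (remaining > 0) {
        unsigned major;
        uint64_t arg, children = 0;
        if (read_head(buf, len, &pos, &major, &arg) != 0) return CBOR_SKIP_ERROR;
        remaining--;
        size_t avail = len - pos;
        switch (major) {
        case 2: case 3: /* byte or text string: skip the payload */
            if (arg > avail) return CBOR_SKIP_ERROR;
            pos += (size_t)arg;
            avail = len - pos;
            break;
        case 4: /* array: arg items follow */
            if (arg > avail) return CBOR_SKIP_ERROR;
            children = arg;
            break;
        case 5: /* map: arg key/value pairs follow */
            if (arg > avail / 2) return CBOR_SKIP_ERROR;
            children = 2 * arg;
            break;
        case 6: /* tag: exactly one item follows */
            children = 1;
            break;
        default: /* integers, simple values, floats: the head is the item */
            break;
        }
        /* Every pending item needs at least one byte of input. */
        if (children > avail || remaining > avail - children) return CBOR_SKIP_ERROR;
        remaining += children;
    }
    return pos;
}
```

For the example above, the whole structure encodes as 8401 02 A20362686904426F68 05 (13 bytes). With start = 3, the position after the array head and the first two items, cbor_skip_item() returns 12, so the range [3, 12) is exactly A20362686904426F68.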

laurencelundblade commented 5 months ago

Hi Brian,

Maybe take a look at #117? I know it is not working, but I think it might do what you want. If it does, I might get back to work on it soon. Had another similar request a few days ago too.

Offsets might be workable if it is only Tell() and pulling out ranges, but Seek() is a very different matter because that means recomputing the internal nesting tracking state, which is complicated. You only want Tell(), right?

BrianSipos commented 5 months ago

Yes, just exposing a Tell interface with an appropriate explanation of what the offset represents (the position of the start of the next item, break, or EOF depending on what's already been decoded) is enough to be able to use the offset to extract encoded bytes from the original data. A simple Tell would allow the user to decide whether to extract one item, a sequence, etc.

BrianSipos commented 5 months ago

Having a Tell interface for the streaming encoder would also be valuable for certain uses. Specifically, RFC 9171 block encodings contain CRC byte-string items that would be useful to manipulate in place after encoding a placeholder (zero-valued) byte string. A similar encoder Tell function would allow generating the CRC value and replacing the byte-string contents without re-encoding all of the preceding items (because the items haven't changed, only the CRC value).
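
A rough standalone sketch of that in-place patching, independent of QCBOR: crc32c() and patch_crc32c() are hypothetical helper names, and crc_off stands in for the offset that an encoder Tell would report just after the byte-string head of the placeholder. It assumes the RFC 9171 convention as I read it: CRC-32C computed over the whole encoded block with the CRC field temporarily zeroed, then stored in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
 * initial value and final XOR of 0xFFFFFFFF. Slow but table-free. */
uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Overwrite the 4-byte CRC field at block[crc_off..crc_off+3] in place.
 * crc_off is the offset of the byte-string contents, not of its head. */
void patch_crc32c(uint8_t *block, size_t block_len, size_t crc_off)
{
    assert(block != NULL && crc_off <= block_len && block_len - crc_off >= 4);

    /* The CRC covers the whole block with the CRC field set to zero. */
    for (size_t i = 0; i < 4; i++) block[crc_off + i] = 0;

    uint32_t crc = crc32c(block, block_len);

    /* Store big-endian (network byte order); the items before the field
     * are untouched, so nothing needs to be re-encoded. */
    block[crc_off + 0] = (uint8_t)(crc >> 24);
    block[crc_off + 1] = (uint8_t)(crc >> 16);
    block[crc_off + 2] = (uint8_t)(crc >> 8);
    block[crc_off + 3] = (uint8_t)crc;
}
```

With a toy block such as 82 182A 44 00000000 (an array holding the integer 42 and a 4-byte placeholder), the byte-string contents start at offset 4, so patch_crc32c(block, 8, 4) fills in the CRC without touching the first four bytes.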

laurencelundblade commented 5 months ago

OK. Understood. Seems reasonable and not too hard. Will look at it in the next few weeks (I have a queue of stuff...)

laurencelundblade commented 5 months ago

Also, create_tbs_hash() in https://github.com/laurencelundblade/t_cose/blob/master/src/t_cose_util.c may be worth a look. It uses incremental hashing to create the hash of a potentially large CBOR structure without duplicating it in memory. Maybe there is a parallel with what you are doing (or maybe not).

Appreciate your request. Helps to make QCBOR better.

laurencelundblade commented 3 months ago

Fixed by #223