ubjson / universal-binary-json

Community workspace for the Universal Binary JSON Specification.

Strings seem to have a length defined by [i] instead of [U] #99

Open SeanSolberg opened 5 years ago

SeanSolberg commented 5 years ago

When writing a string, many of the examples show something like [S][i][5][hello]. However, [i] means the following byte is a signed 8-bit integer, which ranges from -128 to 127. What does it mean when a negative value is sent? Is it supposed to be interpreted as a [U], which is an unsigned byte? Can we use [U] when the string length is between 0 and 255? Or must we use [I] (16-bit) for lengths from 128 to 255? I realize there are no unsigned types larger than the single one defined by [U], but it does seem that a negative number should never be used for a string length. Bottom line: can we use [U] for the string length byte? It makes sense to do so, but the spec is not clear on this issue.
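
The encoding rule under discussion can be sketched in Python. This is a minimal sketch, assuming the UBJSON Draft 12 integer markers ([i] int8, [U] uint8, [I] int16, [l] int32, [L] int64) with big-endian byte order; the helper names `encode_length` and `encode_string` are hypothetical. It picks the smallest marker that fits, preferring [i] as the examples do and falling back to [U] for lengths 128 to 255 instead of jumping to [I]. Whether every reader accepts [U] in this position is exactly the open question.

```python
import struct

# Assumed UBJSON Draft 12 integer markers: (marker, big-endian struct format, min, max).
# [i] is tried first to match the examples; [U] covers 128..255 in a single byte.
LENGTH_MARKERS = [
    (b"i", ">b", -(2**7), 2**7 - 1),
    (b"U", ">B", 0, 2**8 - 1),
    (b"I", ">h", -(2**15), 2**15 - 1),
    (b"l", ">i", -(2**31), 2**31 - 1),
    (b"L", ">q", -(2**63), 2**63 - 1),
]


def encode_length(n: int) -> bytes:
    """Hypothetical helper: encode a non-negative length with the smallest marker that fits."""
    if n < 0:
        raise ValueError("length must be non-negative")
    for marker, fmt, lo, hi in LENGTH_MARKERS:
        if lo <= n <= hi:
            return marker + struct.pack(fmt, n)
    raise OverflowError("length does not fit in int64")


def encode_string(s: str) -> bytes:
    """Hypothetical helper: [S] marker, then the length, then the UTF-8 payload."""
    payload = s.encode("utf-8")
    return b"S" + encode_length(len(payload)) + payload
```

With this policy, `encode_string("hello")` yields `b"Si\x05hello"`, a 200-byte string starts with `b"SU\xc8"`, and a 300-byte string starts with `b"SI\x01\x2c"`.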

rkalla commented 5 years ago

This is a good callout and falls into the category of 'undefined behavior', which is both good and bad.

int8 has a uint8 counterpart, but int16/32/64 do not, so declaring that only "positive values" are supported wasn't possible.

The [i] marker isn't specific to the [S] type; [S] allows any integer type to follow it, and that flexibility allows potential nonsense like a negative value. At face value this is probably nonsensical, but it was left in intentionally to allow some flexibility for custom behavior in certain UBJSON implementations, for example using negative numbers to signal information or error cases.

The logical side of my brain hates this fuzziness, but the practical side of my brain says, "Meh, smarter people than me will find something interesting to do with this flexibility."

At least that is what I tell myself so I can sleep at night.

SeanSolberg commented 5 years ago

Thanks for the response. Should we (or can we) use [U] where possible? The spec isn't clear about that, since it uses the term "int num val". Does that mean only the signed integer types are allowed for the string length, or can we use unsigned int8 ([U]) for the string size? Being allowed to use [U] would save a byte for strings that are 128 to 255 bytes long. I think the spec needs a small tweak to be explicit about this. My fear is that I will produce [U], other systems will not be written to read a [U], and we will end up with compatibility problems. I think this is likely because all the examples use [i], so I can see people assuming their readers need not support [U] for string sizes.
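
The one-byte saving is easy to check. A minimal sketch, assuming big-endian length encodings as in UBJSON Draft 12; the variable names are illustrative only.

```python
import struct

# Illustrative comparison for a 200-byte string: the [S] marker plus its
# length prefix, with [U] allowed versus the next signed type, [I] (int16).
n = 200
with_u = b"S" + b"U" + struct.pack(">B", n)     # [S][U][200]
without_u = b"S" + b"I" + struct.pack(">h", n)  # [S][I][0][200]
print(len(with_u), len(without_u))  # 3 4
```

Any length from 128 to 255 gives the same one-byte difference, since [U] carries one length byte and [I] carries two.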

..Sean

rkalla commented 5 years ago

You are free to use any integer type. I tend to agree with you: if UBJ had unsigned variants of all sizes, requiring them for S would be a good direction.

But we don't, and so it doesn't.

shelacek commented 5 years ago

Hello, the spec says "1-byte + int num val + string byte len" for high-precision numbers and strings. For optimized containers, the spec defines the count as an Integer Numeric Value. As far as I understand it, that means any of int8, uint8, int16, int32, int64. Signed or unsigned, they are still integers, so [U] is also allowed, and I believe a "spec compliant" decoder MUST accept [U].

I currently throw an exception in case of negative length/count in my decoder.
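
The decoder policy described above (accept every integer marker, including [U], and reject negative lengths) can be sketched as follows. This assumes the UBJSON Draft 12 marker letters and big-endian byte order; `read_length`, `read_string`, and `decode_string_bytes` are hypothetical names, not taken from any existing UBJSON library.

```python
import io
import struct

# Assumed UBJSON Draft 12 integer markers mapped to big-endian struct formats.
INT_FORMATS = {
    b"i": ">b",  # int8
    b"U": ">B",  # uint8
    b"I": ">h",  # int16
    b"l": ">i",  # int32
    b"L": ">q",  # int64
}


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes or fail on truncated input."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of input")
    return data


def read_length(stream) -> int:
    """Hypothetical helper: read a length/count written with any integer marker."""
    marker = read_exact(stream, 1)
    fmt = INT_FORMATS.get(marker)
    if fmt is None:
        raise ValueError(f"expected an integer marker, got {marker!r}")
    (value,) = struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))
    if value < 0:
        # Policy choice: negative lengths are rejected rather than reinterpreted.
        raise ValueError(f"negative length {value}")
    return value


def read_string(stream) -> str:
    """Hypothetical helper: decode one [S] value from a binary stream."""
    if read_exact(stream, 1) != b"S":
        raise ValueError("expected [S] marker")
    return read_exact(stream, read_length(stream)).decode("utf-8")


def decode_string_bytes(data: bytes) -> str:
    """Convenience wrapper for decoding an [S] value held in memory."""
    return read_string(io.BytesIO(data))
```

Under this sketch, `b"Si\x05hello"`, `b"SU\x05hello"`, and `b"SI\x00\x05hello"` all decode to `"hello"`, while `b"Si\xff..."` (an int8 length of -1) raises `ValueError`.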