Closed rickb777 closed 11 years ago
Rick, any ambiguity in the spec should be eradicated with prejudice :)
Can you let me know where in the Type Reference (updated section of the spec) the ambiguity around HIGH PRECISION (aka "HUGE") is? http://ubjson.org/type-reference/value-types/#string
As to the suggestion of Base-36: it is certainly appealing that it is more compact, but I would reject it for two reasons:
Let me know if I missed something!
The ambiguity is that it is not clear whether the string is a base-ten representation of the decimal value of the huge number, although this is probably what would be assumed. It is also unclear whether 'huge' is a decimal number that allows a fractional part and/or an exponent part, or just a big integer.
The obvious assumption would be that 'huge' is like Java's BigDecimal class (an unlimited precision decimal number) converted to a decimal string and encoded as a UTF8 byte sequence.
However, it would be equally valid to assume that 'huge' is instead like Java's BigInteger (an unlimited size integer) converted to a decimal string and encoded as a UTF8 byte sequence.
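To illustrate the two readings, here is a minimal sketch (the payload value is made up for the example): both interpretations accept a plain digit string, but only the BigDecimal reading accepts a fraction or exponent, which is exactly where the ambiguity bites.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class HugeAmbiguity {
    public static void main(String[] args) {
        String payload = "31415926535897932384626433832795028841971693993751";

        // Reading 1: 'huge' is an unlimited-precision decimal (BigDecimal-like)
        BigDecimal asDecimal = new BigDecimal(payload);

        // Reading 2: 'huge' is an unlimited-size integer only (BigInteger-like)
        BigInteger asInteger = new BigInteger(payload);

        System.out.println(asDecimal);
        System.out.println(asInteger);

        // This payload is legal under reading 1 but would throw
        // NumberFormatException under reading 2:
        System.out.println(new BigDecimal("3.14e2"));
    }
}
```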
Unfortunately, representing 'huge' as a string means a significant expansion in the space required over the binary form. For example, a 32-bit integer takes up to ten ASCII characters - an expansion from 4 to 10 bytes.
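The 4-to-10-byte expansion is easy to demonstrate:

```java
public class DecimalExpansion {
    public static void main(String[] args) {
        // A 32-bit int occupies 4 bytes in binary, but its decimal
        // string can need up to 10 ASCII bytes (11 with a minus sign).
        String widest = Integer.toString(Integer.MAX_VALUE); // "2147483647"
        System.out.println(widest.length()); // prints 10
    }
}
```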
So this could be mitigated by supporting both huge and a new intAny. An example use-case for intAny might be cryptography, and the representation would be
Meanwhile, hugeDecimal would simply be the decimal (i.e. base-10) representation with an optional fraction and/or exponent.
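The intAny idea could look something like the sketch below. Note that the issue does not spell out the proposed representation, so the length-prefixed big-endian two's-complement layout here (and the `IntAnySketch`/`encode` names) are purely illustrative assumptions, not the actual proposal.

```java
import java.math.BigInteger;

public class IntAnySketch {
    // Hypothetical layout: 4-byte big-endian length prefix, then the
    // big-endian two's-complement bytes of the integer. This is just one
    // possible binary encoding; the issue elides the real one.
    public static byte[] encode(BigInteger value) {
        byte[] magnitude = value.toByteArray(); // big-endian two's complement
        byte[] out = new byte[4 + magnitude.length];
        out[0] = (byte) (magnitude.length >>> 24);
        out[1] = (byte) (magnitude.length >>> 16);
        out[2] = (byte) (magnitude.length >>> 8);
        out[3] = (byte) magnitude.length;
        System.arraycopy(magnitude, 0, out, 4, magnitude.length);
        return out;
    }

    public static void main(String[] args) {
        // A crypto-sized integer: 2^2048 - 1.
        BigInteger n = BigInteger.ONE.shiftLeft(2048).subtract(BigInteger.ONE);
        // Binary form is roughly 257 payload bytes; the decimal string
        // of the same number runs to over 600 characters.
        System.out.println(encode(n).length);
        System.out.println(n.toString(10).length());
    }
}
```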
(Aside - the base36 suggestion is rather tongue in cheek - good idea perhaps but not really what people expect. It's very easy to do in Java, for a bit of fun!)
So, to summarise my suggestion:
Base-36 is just a string codec. Why not use something better, like lzma? (:
To be serious, base-36 is not acceptable, since a HIPREC value should follow the JSON number type specification, which allows values like -1.93E+190,
and I don't feel that it's rational to apply additional transformations to them.
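That JSON-shaped value parses directly as an unlimited-precision decimal, with no transformation step needed - a quick check:

```java
import java.math.BigDecimal;

public class JsonNumberShape {
    public static void main(String[] args) {
        // JSON's number grammar allows a sign, a fraction and an exponent,
        // so a spec-following HIPREC payload can look like this:
        BigDecimal v = new BigDecimal("-1.93E+190");
        System.out.println(v.precision()); // prints 3 (significant digits)
        System.out.println(v.scale());     // prints -188
    }
}
```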
Base36 was a joke, man!
WONTFIX
@rickb777 I understand your point, but I believe this to be an optimization for a very small corner case of the UBJSON specification. Use of high precision values (aka "huge") is expected to be very infrequent. As you pointed out, there are more optimized ways to store these values, but I don't want to clutter the spec for a 3% use-case optimization.
As for the clarification of the specification itself, the spec for HIGH PRECISION literally says that the format follows JSON's spec requirement for the number type -- whatever JSON dictates, we dictate here. Ambiguity might be an unfortunate side effect of this (e.g. base36 vs base10 -- using your previous joke), but I am not going to try and reduce that scope in the UBJSON spec.
'huge' would appear to be a base-10 representation of the number as a string.
Being awkward, I might prefer base-36 because it is more compact, and could thereby produce a spec-compliant but incompatible implementation.
The representation of 'huge' needs to be less ambiguous.