well-typed / cborg

Binary serialisation in the CBOR format
https://hackage.haskell.org/package/cborg
191 stars 86 forks source link

fromFlatTerm impl of PeekTokenType is insufficently precise #324

Closed dcoutts closed 1 year ago

dcoutts commented 1 year ago

The implementation of PeekTokenType in fromFlatTerm only makes fairly coarse token type distinctions:

https://github.com/well-typed/cborg/blob/41498491994c709091b82b799a76a71295b3186e/cborg/src/Codec/CBOR/FlatTerm.hs#L564-L567

tokenTypeOf (TkInt n)
    | n >= 0                = TypeUInt
    | otherwise             = TypeNInt
tokenTypeOf TkInteger{}     = TypeInteger

In particular, it doesn't ever us TypeUInt64 or TypeNInt64:

data TokenType
  = TypeUInt
  | TypeUInt64
  | TypeNInt
  | TypeNInt64
  | TypeInteger

This creates a problem for some decoders that use peekTokenType and only look for the Type(U|N)Int(64) tokens and do not consider TypeInteger. Such decoders will not correctly round-trip via FlatTerm, where the will correctly round-trip via the binary CBOR representation.

The solution is to return TypeUInt64 and TypeNInt64 when the integer fits within the range, e.g.

tokenTypeOf (TkInteger n)
  | n >= 0 && n <  2^64 = TypeUInt64
  | n <  0 && n > -2^64 = TypeNInt64
  | otherwise           = TypeInteger

Alternatively, we could change the representation of FlatTerm to preserve more token type information, e.g. by splitting TkInt into TkUInt, TkNInt, TkUInt64, TkNInt64.