pre-srfi / variable-length

Variable-length integers and strings
0 stars 0 forks source link

ASN.1 DER and BER integer encodings #1

Open johnwcowan opened 3 years ago

johnwcowan commented 3 years ago

[DER encoding]

The general scheme is this:

Type byte: 01

Length bytes (short form): 00-7F is the length of the following number in bytes.

Length bytes (long form): 82-88 specifies the length of the length (offset 128), immediately followed by the length as a big-endian byte sequence. The smallest meta-length possible is used. (It's possible to go past 88, but that represents an integer >= 2^(8(2^63)) or 2^73786976294838206464, which is not even storable*.)

Data bytes: The signed numeric value as a big-endian byte sequence.

Note that you can read big-endian non-negative numbers by folding over them: you multiply the seed (initially 0) by 256 and then add the next byte.

lassik commented 3 years ago

Here's some old ASN.1 BER code from Alex:

(define (read-ber-integer . opt)
  (let-params* opt ((port (current-input-port)))
    (let loop ((acc 0))
      (let ((byte (read-byte port)))
        (cond
          ((eof-object? byte) byte)   ; fail on eof
          ((< byte 128) (+ acc byte)) ; final byte is < 128
          (else
           (loop (arithmetic-shift (+ acc (bitwise-and byte 127)) 7))))))))

(define (write-ber-integer number . opt)
  (assert (integer? number) (not (negative? number)))
  (let-params* opt ((port (current-output-port)))
    (let loop ((n (arithmetic-shift number -7))
               (ls (list (bitwise-and number 127))))
      (if (zero? n)
        (for-each (lambda (b) (write-byte b port)) ls)
        (loop (arithmetic-shift n -7)
              (cons (+ 128 (bitwise-and n 127)) ls))))))

That looks like a big-endian version of the Google/Go/Protobuf varints that I implemented for this pre-SRFI.

lassik commented 3 years ago

The thing I like about the Go varints is that they are simpler both conceptually (no length bytes, just a run of bytes with the high bit set) and implementation-wise (they are little-endian, so we don't need to figure out the digits backwards before writing them).

However, ASN.1 (and the other things listed at https://en.wikipedia.org/wiki/Variable-length_quantity) are also in use in the wild, which vouches for them.

The current implementation of this SRFI is almost 200 lines, which is getting to be a bit suspect to my taste considering how simple its job is (in layman's terms - it just passes some integers and strings). How to figure out the right scope?

johnwcowan commented 3 years ago

The current implementation of this SRFI is almost 200 lines, which is getting to be a bit suspect to my taste considering how simple its job is (in layman's terms - it just passes some integers and strings). How to figure out the right scope?

I think we have two scope choices:

  1. Provide one way to do it and expect that everyone will use it unless they need compatibility. In that case the Google varints and varstrings are simplest.

  2. Provide several approaches that are already in use, as we do for hexstring and base64 in SRFI 207. We are already specifying varint, signed varint, varstring, and netstring. Adding ASN.1 int and string don't grow it that much conceptually. (Strings have the same format except that the type byte is

I'm pretty sure Alex's code is a result of using (by mistake or on purpose) the length coding instead of the complete integer coding.

lassik commented 3 years ago

Is the DER encoding you talk about in the topmost post the one we need for the core ASN.1 subset you recommend as the schemepersist binary interchange format?

johnwcowan commented 3 years ago

DER always uses byte counts even for complex objects, so I abandoned it in favor of "DER for byte objects, CER (00 00 termination) for compound objects".

By the way, I changed the name of the textual/binary encoding from Arentheis to Twinjo. You can pronounce the J however you like.