jer: Pure hexadecimal JSON OCTET STRINGs

v0-e commented 2 months ago

As per X.697 25.3 2021, a JER OCTET STRING must be composed of only hexadecimal characters (inside quotation marks). This PR makes the JER decoder reject OCTET STRINGs which have whitespaces/similar characters. Tests for JER OCTET STRINGs are also added.

mouse07410 commented 2 months ago

Are you sure you want to tighten the decoding part? I'd be happy to make the encoder stricter in what it outputs - but doesn't this run against the Internet wisdom of "Be liberal wrt. what you receive, and conservative wrt. what you send"?

v0-e commented 2 months ago

I don't dislike that philosophy :) However the decoder was allowing stuff like newlines which are invalid in JSON strings, e.g.,

"abc
def"

Would you prefer:

allow only JSON and JER compliant encodings (e.g., strict hex chars, as in the current PR);
allow JSON-compliant and non-JER encodings (e.g., reject newlines \n and allow whitespaces );
allow non-JSON and non-JER encodings (e.g., allow any hex and non-hex char (4. or just allow the "whitespace" chars previously allowed before the proposed changes) in OCTET STRINGs and just parse the hex chars)

mouse07410 commented 2 months ago

However the decoder was allowing stuff like newlines which are invalid in JSON strings . . . . . . Would you prefer:

allow only JSON and JER compliant encodings (e.g., strict hex chars, as in the current PR);

allow JSON-compliant and non-JER encodings (e.g., reject newlines \n and allow whitespaces );

allow non-JSON and non-JER encodings (e.g., allow any hex and non-hex char

or just allow the "whitespace" chars previously allowed before the proposed changes) in OCTET STRINGs and just parse the hex chars

My main (only?) concern is breaking currently-working code by refusing to support what it currently consumes successfully.

Do we know how much of non-compliant encodings are floating around now?

From your list, the most tempting is (4), maybe with newlines allowed too in order to support breaking very long strings into "displayable"/"editable" shorter lines...

What is your opinion?

v0-e commented 2 months ago

My biggest concern would be a possible disparity between decoders leniency. For example asn1c is able to decode some JSON object but X commercial JER-decoder/Y open-source JER-decoder, or even a plain JSON validator fail to parse it.

Ultimately in this aspect, I guess it is a choice between using asn1c as a validator or as a "generous" decoder.

My main (only?) concern is breaking currently-working code by refusing to support what it currently consumes successfully.

Our JER decoder implementation is somewhat recent, probably there isn't much code using it :smile:.

Do we know how much of non-compliant encodings are floating around now?

I'd say there are more "JER encoders" out there than any other encoding rules. Given the simplicity of creating a JSON in your typical JS/TS environment, developers may be more willing to create their own (and non-compliant) JER encoder/decoder implementations. Plus considering this is a text-based encoding, there is a higher degree it can be incorrectly (manually) modified.

mouse07410 / asn1c

jer: Pure hexadecimal JSON OCTET STRINGs #193