rusticata / der-parser

BER/DER parser written in pure Rust. Fast, zero-copy, safe.
Apache License 2.0
85 stars 28 forks source link

`BerObject::as_pretty()` prints universal tag names for non-universal classes #54

Closed lilyball closed 2 years ago

lilyball commented 2 years ago

Not a Contribution

BerObject::as_pretty() prints universal tag names when printing BerObjectContent::Tagged and BerObjectContent::Unknown contents. As of 5f76d92762770893dcc901e4fa1f9a105f6c346d (see #47) it includes the class name for BerObjectContent::Unknown, but it didn't change anything about how tags are printed.

Examples

DER: [80, 4, de, ad, be, ef]
Output: Unknown(ContextSpecific,EndOfContent,[de, ad, be, ef])

DER: [42, 4, de, ad, be, ef]
Output: Unknown(Application,Integer,[de, ad, be, ef])

If I parse tagged content, I get similarly busted output:

-- Type1 ::= VisibleString
-- Type2 ::= [APPLICATION 3] IMPLICIT Type1
-- Type3 ::= [APPLICATION 2] Type2
-- Type4 ::= [1] Type3
DER: [a1, 9, 62, 7, 43, 5, 4a, 6f, 6e, 65, 73]
Output: ContextSpecific [CONTEXT-SPECIFIC Boolean] {
  ContextSpecific [APPLICATION Integer] {
    VisibleString("Jones")
  }
}

There's two issues here. The first is of course that it's printing universal tag names for non-universal classes. The second is that it always prints ContextSpecific for tagged objects even for other classes.

Parsing private tags is also curious:

DER: [c3, 4, de, ad, be, ef]
Output: Private(c:PRIVATE s:0 t:3): [de ad be ef]

It's no longer trying to print the tag name, but instead it prints "Private" twice with this odd formatting. It also puts the bytes outside of the parens, whereas Unknown puts it inside.

Expected behavior

I would expect BerObjectContent::Unknown and BerObjectContent::Private to print similarly. For that matter, I don't even know why Private gets its own variant whereas ContextSpecific and Application share an "Unknown" one. I would have expected either all three to get their own variants, or all three to share a variant (the latter being simpler of course).

In any case, I would have expected something like Unknown(ContextSpecific(0),[de, ad, be, ef]) and Unknown(Application(2),[de, ad, be, ef]) (note how the tag name isn't interpreted), or possibly instead formatting similar to Private. And for tagged vales, I might expect it to say something like Tagged [CONTEXT-SPECIFIC 1] instead.

Musings on underlying issues

I think the core issue here is really the fact that tags and classes are separate, and so BerTag(2) prints automatically as Integer regardless of the class. I would expect this to be a problem for parsing too. For example, my parser for the tagged content looks like

parse_der_tagged_explicit(1, |i| {
    parse_der_tagged_explicit(2, |i| {
        parse_der_tagged_implicit(3, parse_der_content(DerTag::VisibleString))(i)
    })(i)
})

Notice here how my parser doesn't mention classes anywhere. It parsed [a1, 9, 62, 7, 43, 5, 4a, 6f, 6e, 65, 73], but it would have happily parsed [61, 9, a2, 7, 43, 5, 4a, 6f, 6e, 65, 73] as well despite the fact that the change in class means the tags might have very different meanings.

I really think this needs to be changed such that the class and tag are a single value. This also fixes the naming issue, as ASN.1 uses "tag" to mean "class and number". So we can use something like the following:

pub struct BerTag(pub BerClass, pub u32);

It can then have a Display impl that prints like "{class}({number})" (Debug can just be derived). (EDIT: Or maybe it should print like [{class} {number}], optionally printing ContextSpecific like [{number}] to match ASN.1 notation for tagged types.)

This is of course a rather significant change to the crate.

Side note about the parser

If you look at my tagged parser above, you'll notice the use of |i| parse_fn(…)(i). This feels redundant, but it's necessary right now because parse_der_tagged_explicit (and similar functions) take a F: Fn but return an impl FnMut. This means they can't be nested without the closure wrappers. My expectation is they could be modified to take F: FnMut rather trivially (just by adding a mut keyword) and the fact that they don't is likely an oversight due to the fact that none of the built-in parsers use them (and the doc examples don't nest them).

lilyball commented 2 years ago

Regarding BerTag display, I forgot about the fact that universal tags will likely want to be printed as their name, like they do today. So the display impl will likely want to turn BerTag(BerClass::Universal, 2) into Integer as it does now, but do the ASN.1 notation-like printing for other classes, and for unknown universal tags.

Also, for pretty-printing BerObjectContent::Tagged, universal tags will want to be printed as e.g. [UNIVERSAL 2] instead of as Integer, because if it's explicitly-tagged then the tag is no longer expected to have the same meaning (not that we should be seeing universal explicitly-tagged values).

lilyball commented 2 years ago

One more thought: pretty-printing doesn't show implicitly-tagged values. I don't know if we really care, but maybe it's worth showing for non-universal classes? For the above example, this would make the output look something like the following:

[CONTEXT-SPECIFIC 1] {
  [APPLICATION 2] {
    [APPLICATION 3] IMPLICIT VisibleString("Jones")
  }
}
lilyball commented 2 years ago

Another quirk I noticed: Unknown prints data with {:x?} and so it shows like [1, a, 14, 1e]. Private prints data with a debug::HexSlice helper which prints the same data like [01 0a 14 1e]. I think the latter format is better here, and so should be used for both.

lilyball commented 2 years ago

Regarding the BerObjectContent::Unknown vs BerObjectContent::Private split, I also don't know why Private carries a copy of the BerObjectHeader wheras Unknown carries a copy of the class and tag. Carrying the class and tag seems primarily useful in the case of an implicitly-tagged value where the tag used for the content is not present in the header, but I don't know what use the rest of the header has for interpreting the contents. Surely the structured and len fields only matter for the DER/BER encoding and not the interpretation of the data. I don't know enough about BER to know how raw_tag applies, but that shouldn't matter for implicitly-tagged values.

My recommendation here is to merge these two together into a single BerObjectContent::Unknown(BerClass, BerTag, &'a [u8]) type. So basically, just remove the special handling of BerClass::Private from the parsing. The class/tag pair are useful for interpreting implicitly-tagged contents, and the rest can be read from the header if needed.

chifflier commented 2 years ago

Hi, All the raised issues should be now fixed in the merged PR #60 and the referenced commits above, so I'm closing this bug. If there is anything still present or not fully addressed, please reopen issues. Thanks!