rusticata / der-parser

BER/DER parser written in pure Rust. Fast, zero-copy, safe.
Apache License 2.0

Unable to get the exact form of the tag #25

Closed ndusart closed 4 years ago

ndusart commented 4 years ago

Hello,

We are using your library to parse ASN.1 files from EMV cards. It works pretty well but we have a problem for some tags.

In EMV standards, tags like 0x8F and 0x9F0F mean different things even though both are a "context-specific, primitive tag of value 15". It is therefore not possible to differentiate them using the BerObject tag property.
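
To make the collision concrete, here is a minimal, self-contained sketch of BER identifier decoding. It is not der-parser's code: `Class`, `DecodedTag` and `decode_tag` are hypothetical names used only for illustration. Both 0x8F and 0x9F 0x0F start with class bits 10 (context-specific in BER) and carry tag number 15, so only the raw bytes tell them apart.

```rust
/// Class bits of a BER identifier octet (bits 8 and 7).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Class {
    Universal,
    Application,
    ContextSpecific,
    Private,
}

/// Decoded identifier plus the exact bytes it was read from.
/// Hypothetical type for illustration, not part of der-parser.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodedTag<'a> {
    pub class: Class,
    pub constructed: bool,
    pub number: u32,
    pub raw: &'a [u8],
}

/// Decode one BER identifier from the start of `input`.
/// Returns `None` on truncated input or if the tag number overflows `u32`.
pub fn decode_tag(input: &[u8]) -> Option<DecodedTag<'_>> {
    let first = *input.first()?;
    let class = match first >> 6 {
        0 => Class::Universal,
        1 => Class::Application,
        2 => Class::ContextSpecific,
        _ => Class::Private,
    };
    let constructed = first & 0x20 != 0;
    let low = u32::from(first & 0x1f);
    if low != 0x1f {
        // Short form: the tag number sits in the low five bits.
        return Some(DecodedTag { class, constructed, number: low, raw: &input[..1] });
    }
    // Long form: base-128 digits, high bit set on every byte except the last.
    let mut number: u32 = 0;
    for (i, &b) in input[1..].iter().enumerate() {
        // Reject numbers that would not fit in 32 bits after the shift.
        if number > (u32::MAX >> 7) {
            return None;
        }
        number = (number << 7) | u32::from(b & 0x7f);
        if b & 0x80 == 0 {
            // `i` indexes into input[1..], so the identifier spans i + 2 bytes.
            return Some(DecodedTag { class, constructed, number, raw: &input[..i + 2] });
        }
    }
    None
}
```

Decoding `[0x8F]` and `[0x9F, 0x0F]` with this sketch gives the same class, the same constructed flag and the same number; only the `raw` slices differ.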

Would it be possible to add a way to access the exact representation of the tag (as a byte slice, for example)? That would be very useful, at least for non-universal tags.

chifflier commented 4 years ago

Hmm, I did not suspect some protocols would assign different meanings to different encodings of the same value. If that's indeed used, I'm in favor of adding a way to get the value. The problem I see is that BerTag only contains the tag value, and changing it would break a lot of code.

One possible solution would be to add this to BerObjectHeader; another would be to add it to BerObject (or to both). This means adding a lifetime to the header. Also, maybe it needs to be stored in an Option or Cow type, in case serialization is added in the future.
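
As a rough sketch of what the Option/Cow idea could look like (this is a hypothetical `Header` type written for illustration, not der-parser's actual `BerObjectHeader`, and the method names are assumptions):

```rust
use std::borrow::Cow;

/// Hypothetical header type; NOT der-parser's `BerObjectHeader`.
#[derive(Debug, Clone, PartialEq)]
pub struct Header<'a> {
    pub class: u8,
    pub constructed: bool,
    pub tag: u32,
    /// Exact identifier bytes when known; `None` for headers built by hand.
    pub raw_tag: Option<Cow<'a, [u8]>>,
}

impl<'a> Header<'a> {
    /// Header built in code, with no original encoding to refer to.
    pub fn new(class: u8, constructed: bool, tag: u32) -> Self {
        Header { class, constructed, tag, raw_tag: None }
    }

    /// Attach bytes borrowed from the parsed input (no copy).
    pub fn with_raw_tag(mut self, raw: &'a [u8]) -> Self {
        self.raw_tag = Some(Cow::Borrowed(raw));
        self
    }

    /// Detach from the input buffer by copying the raw tag, if any.
    pub fn into_owned(self) -> Header<'static> {
        Header {
            class: self.class,
            constructed: self.constructed,
            tag: self.tag,
            raw_tag: self.raw_tag.map(|c| Cow::Owned(c.into_owned())),
        }
    }
}
```

The `Option` keeps hand-built headers cheap, while the `Cow` lets a parsed header borrow from the input and still be turned into an owned value later, for example before serialization. The cost is the lifetime parameter on the header, which is the breaking change mentioned above.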

Any thoughts or preferred methods?

I'll experiment a bit around these ideas.

chifflier commented 4 years ago

Additionally, the BerObjectContent::ContextSpecific and ::Unknown variants may need to store the original tag.

ndusart commented 4 years ago

Thanks for the response.

Personally, having that in BerObject only would be sufficient (even under any sort of wrapper) if adding the lifetime parameter could break some code and prevent adding it to the header.

But I really don't know whether it's possible to implement this in BerObject without changing BerObjectHeader too: from_header_and_content would still need to construct a valid BerObject, and I don't know whether it's possible to reference the u32 of the BerTag as a &[u8].

A solution would be to copy the original tag into the BerObject, but I understand it is not a great solution, as everyone would have to pay for that copy even if they don't need to access the tag in its original form.

Maybe just a variant of parse_ber/parse_der which returns a BerResult<(&[u8], BerObject)>, with the first element of the tuple referencing the original tag?
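
A minimal sketch of that idea, independent of der-parser: `split_raw_tag` and `emv_name` are hypothetical helpers, and the sketch returns the remaining input in place of a parsed BerObject. The two element names are given as I understand the EMV data element list and should be checked against the specification.

```rust
/// Split `input` into (raw identifier bytes, remaining bytes).
/// Hypothetical helper; returns `None` if the identifier is truncated.
pub fn split_raw_tag(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let first = *input.first()?;
    if first & 0x1f != 0x1f {
        // Short form: the identifier is a single byte.
        return Some(input.split_at(1));
    }
    // Long form: the identifier ends at the first byte with the high bit clear.
    let end = input[1..].iter().position(|&b| b & 0x80 == 0)?;
    Some(input.split_at(end + 2))
}

/// Dispatch on the exact encoding rather than on the decoded tag number.
pub fn emv_name(raw_tag: &[u8]) -> &'static str {
    match raw_tag {
        [0x8F] => "Certification Authority Public Key Index",
        [0x9F, 0x0F] => "Issuer Action Code - Online",
        _ => "unknown",
    }
}
```

With a tuple like this, callers that care about the encoding can match on the slice, and callers that do not can ignore it at no extra cost.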

It would not be available through parse_der_sequence_defined, but (I don't use this API, so I am presuming here) that function is used when the sequence of tags is known in advance, so the "truncation" of the tag values does not seem like much of a problem in that case.

chifflier commented 4 years ago

This is now partially implemented in b02b2df (another commit will follow for BerObject) by adding an optional raw_tag field.

In terms of use, this commit does not really change things (except that it requires adding all raw tags to the tests).

chifflier commented 4 years ago

Rest of implementation in 56aabdb

There may be some places where the raw tag is not kept, but it should currently work.

I intend to tag a version for an alpha release of der-parser 4.0 with the current features soon.

chifflier commented 4 years ago

der-parser 4.0.0-alpha1 has been published to crates.io, containing (amongst other changes) the ability to get the raw tag. Since the feature was added, I'm closing this issue, but since this is an alpha version, please test it and open issues if you find anything (or would like to discuss the API).

sharksforarms commented 4 years ago

Hey guys, any reason why der_obj.tag.0.to_le_bytes() wouldn't be sufficient?

chifflier commented 4 years ago

> Hey guys, any reason why der_obj.tag.0.to_le_bytes() wouldn't be sufficient?

If the encoding is canonical (like in DER), it should be equivalent. However, in BER, you can have different encodings for the same value (for example, by adding leading zeros). I did not imagine at first that any specs would be crazy enough to use that, but clearly I was too optimistic! So accessing the raw tag is the only way to get the original encoding.
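
To illustrate why the tag number alone cannot give back the original bytes, here is a small sketch of a shortest-form identifier encoder. `encode_tag_minimal` is a hypothetical helper, not a der-parser function. Re-encoding from the number always yields the canonical form, so it matches the input only when the input was canonical to begin with.

```rust
/// Encode a BER identifier in its shortest (canonical) form.
/// `class_bits` is the two-bit class (0 to 3); hypothetical helper for illustration.
pub fn encode_tag_minimal(class_bits: u8, constructed: bool, number: u32) -> Vec<u8> {
    let constructed_bit: u8 = if constructed { 0x20 } else { 0 };
    let lead = ((class_bits & 0x03) << 6) | constructed_bit;
    if number < 31 {
        // Short form: the number fits in the low five bits of the first octet.
        return vec![lead | number as u8];
    }
    // Long form: collect base-128 digits, least significant first.
    let mut bytes = vec![(number & 0x7f) as u8];
    let mut n = number >> 7;
    while n > 0 {
        // Every digit except the last one emitted carries the continuation bit.
        bytes.push((n & 0x7f) as u8 | 0x80);
        n >>= 7;
    }
    // The leading octet has all five low bits set to announce the long form.
    bytes.push(lead | 0x1f);
    bytes.reverse();
    bytes
}
```

For class 2 (context-specific), primitive, number 15, this produces the single byte 0x8F, so an input of 0x9F 0x0F cannot be reconstructed from the number. Separately, `15u32.to_le_bytes()` is the four-byte little-endian form of the number itself and does not include the class or constructed bits.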