v1.5.1 CMSAttributes does not return a DER from dump() making message digest from a CMS confusing

sorensF5 commented 11 months ago

Problem

When dump()'ed, CMSAttributes produces a binary block that is not the DER encoding of a SET or SET OF as defined by ASN.1/DER. I was wondering whether this is intentional, because it makes the message digest portion of the RFC harder to calculate. This appears to affect version 1.5.1.

Expected Behavior

When loading a CMS block and computing the digest of its CMSAttributes for signature verification, cms_attributes.dump() alone should be enough, without having to replace the first byte with \x31, as follows:

from asn1crypto import cms
import hashlib
from oscrypto import asymmetric

with open("message.msg", "rb") as source:
    message_smime = source.read()
loaded_cms = cms.ContentInfo.load(message_smime)
extracted_attributes = loaded_cms["content"]["signer_infos"][0]["signed_attrs"].dump()
digester_of_message = getattr(
    hashlib,
    loaded_cms["content"]["signer_infos"][0]["digest_algorithm"]["algorithm"].native,
    hashlib.sha256,
)
# Digest over the signed attributes; this is the value the RSA signature is computed over
digested_message = digester_of_message(extracted_attributes).digest()
# certificates[0] is a CertificateChoices; .chosen is the contained x509.Certificate
loaded_cert = asymmetric.load_certificate(loaded_cms["content"]["certificates"][0].chosen)
# rsa_pkcs1v15_verify() hashes its data argument itself using "sha256",
# so it is given the encoded attributes rather than digested_message
asymmetric.rsa_pkcs1v15_verify(
    loaded_cert,
    loaded_cms["content"]["signer_infos"][0]["signature"].native,
    extracted_attributes,
    "sha256",
)

Observed Behavior

When running the above, the signature is rejected with an exception unless I add the following step:

--- diff_set_start  2023-12-18 10:01:09
+++ diff_set_end    2023-12-18 10:02:15
@@ -6,6 +6,7 @@
     message_smime = source.read()
 loaded_cms = cms.ContentInfo.load(message_smime)
 extracted_attributes = loaded_cms["content"]["signer_infos"][0]["signed_attrs"].dump()
+extracted_attributes = b"\x31" + extracted_attributes[1:]
 digester_of_message = getattr(
     hashlib,
     loaded_cms["content"]["signer_infos"][0]["digest_algorithm"]["algorithm"].native,

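This changes the leading identifier octet of the dumped CMSAttributes from \xa0 (context-specific [0], constructed) to \x31 (universal SET/SET OF, constructed): \xa0\x81\xca0\x1c... becomes \x31\x81\xca0\x1c... (values truncated for brevity). Without this change, an error stating that the signature does not match is raised.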
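The two byte strings quoted above differ only in their first (identifier) octet; the length octets \x81\xca are the same. The sketch below decodes that header to make the difference visible. It is a standalone illustration, not asn1crypto API: parse_der_header and DerHeader are hypothetical names, and it only handles low-tag-number identifiers, which is all these two encodings need.

```python
# Minimal DER header decoder (low-tag-number form only), written to show
# that b"\xa0\x81\xca..." and b"\x31\x81\xca..." differ only in the tag.
from typing import NamedTuple

CLASS_NAMES = ("universal", "application", "context-specific", "private")


class DerHeader(NamedTuple):
    tag_class: str
    constructed: bool
    tag_number: int
    length: int
    header_length: int


def parse_der_header(data: bytes) -> DerHeader:
    """Decode the identifier and length octets at the start of a DER value."""
    if len(data) < 2:
        raise ValueError("need at least an identifier and a length octet")
    identifier = data[0]
    tag_class = CLASS_NAMES[identifier >> 6]  # top two bits: tag class
    constructed = bool(identifier & 0x20)  # bit 6: primitive/constructed
    tag_number = identifier & 0x1F  # low five bits: tag number
    if tag_number == 0x1F:
        raise ValueError("high-tag-number form is not handled in this sketch")
    first_length = data[1]
    if first_length < 0x80:
        # Short form: the octet is the length itself
        return DerHeader(tag_class, constructed, tag_number, first_length, 2)
    count = first_length & 0x7F
    if count == 0 or len(data) < 2 + count:
        raise ValueError("indefinite length (not allowed in DER) or truncated header")
    # Long form: the next `count` octets hold the big-endian length
    length = int.from_bytes(data[2:2 + count], "big")
    return DerHeader(tag_class, constructed, tag_number, length, 2 + count)


if __name__ == "__main__":
    print(parse_der_header(b"\xa0\x81\xca0\x1c"))
    # DerHeader(tag_class='context-specific', constructed=True, tag_number=0, length=202, header_length=3)
    print(parse_der_header(b"\x31\x81\xca0\x1c"))
    # DerHeader(tag_class='universal', constructed=True, tag_number=17, length=202, header_length=3)
```

Tag number 17 is the universal SET/SET OF tag, which is why the \x31 substitution matched what the signature was computed over.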

Request

Either a "fix" so that dump() returns proper DER for a SET/SET OF, or an explanation of why it behaves this way.

Solution

Per the comments below, the answer is to call .untag().dump() on the CMSAttributes object when extracting the DER. This produces the untagged (universal SET OF) encoding that CMS requires as the message digest input for signed attributes, per RFC 5652, section 5.4:

The IMPLICIT [0] tag in the signedAttrs is not used for the DER encoding, rather an EXPLICIT SET OF tag is used.

Thus:

 with open("message.msg", "rb") as source:
     message_smime = source.read()
 loaded_cms = cms.ContentInfo.load(message_smime)
-extracted_attributes = loaded_cms["content"]["signer_infos"][0]["signed_attrs"].dump()
+extracted_attributes = loaded_cms["content"]["signer_infos"][0]["signed_attrs"].untag().dump()
 digester_of_message = getattr(
     hashlib,
     loaded_cms["content"]["signer_infos"][0]["digest_algorithm"]["algorithm"].native,
     hashlib.sha256,
 )

This change is the solution.

Thanks to @MatthiasValvekens for the prompt solution!

MatthiasValvekens commented 11 months ago

The signed_attrs field is a context-specific tagged field ([0] IMPLICIT), and .dump() returns the encoding with that tag. Digest calculation, on the other hand, requires the universally tagged (SET OF) encoding. You can easily obtain it by calling .untag() before .dump(). :)
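Because IMPLICIT tagging only replaces the identifier octet and leaves the length and contents octets untouched, the manual \x31 workaround from earlier in this thread is byte-for-byte the universally tagged encoding for this field. The stdlib-only sketch below spells that out and hashes the result. signed_attrs_digest_input and signed_attrs_digest are hypothetical helper names for illustration, and the sample input is placeholder bytes, not real attributes; .untag().dump() remains the approach recommended here.

```python
# Hypothetical helpers reproducing the "\x31" workaround: swap the
# IMPLICIT [0] identifier octet (0xA0) for the universal SET OF octet (0x31),
# keep the length and contents octets, then hash the result.
import hashlib

CONTEXT_0_CONSTRUCTED = 0xA0  # [0], context-specific, constructed
SET_OF = 0x31  # universal tag 17 (SET / SET OF), constructed


def signed_attrs_digest_input(tagged_encoding: bytes) -> bytes:
    """Return the SET OF encoding that CMS feeds to the message digest."""
    if not tagged_encoding or tagged_encoding[0] != CONTEXT_0_CONSTRUCTED:
        raise ValueError("expected an encoding starting with the [0] tag (0xA0)")
    return bytes([SET_OF]) + tagged_encoding[1:]


def signed_attrs_digest(tagged_encoding: bytes, hash_name: str = "sha256") -> bytes:
    """Hash the retagged signed attributes with the named hashlib algorithm."""
    return hashlib.new(hash_name, signed_attrs_digest_input(tagged_encoding)).digest()


if __name__ == "__main__":
    # Toy [0]-tagged value with two placeholder content bytes
    tagged = b"\xa0\x02\x05\x00"
    print(signed_attrs_digest_input(tagged).hex())  # 31020500
    # Hashing the tagged bytes directly gives a different digest
    print(signed_attrs_digest(tagged) != hashlib.sha256(tagged).digest())  # True
```

The byte swap only works because both tags fit in a single identifier octet; .untag() avoids relying on that detail.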

sorensF5 commented 11 months ago

Thanks for the prompt reply, @MatthiasValvekens! I will definitely try this out. I've been sidetracked with the workaround of substituting \x31 in place and have time today to circle back.

Update/Resolution

It works fine with @MatthiasValvekens's suggestion:

extracted_attributes = loaded_cms["content"]["signer_infos"][0]["signed_attrs"].untag().dump()

Thanks for the information!

sorensF5 commented 11 months ago

Closing, since asn1crypto already has a built-in solution that I simply missed, and it matches the requirement in RFC 5652, section 5.4:

The IMPLICIT [0] tag in the signedAttrs is not used for the DER encoding, rather an EXPLICIT SET OF tag is used.

wbond / asn1crypto