CMS Enveloped Data creation failure

balena commented 8 years ago

I am trying to create a CMS Enveloped Data using asn1crypto as a substitution of the Google's certificate-transparency source code I forked on my project https://github.com/balena/python-smime. I had the idea of using asn1crypto after having some trouble when parsing indefinite-length attributes generated by certain E-Mail managers (like Thunderbird).

While asn1crypto is great for parsing all kinds of CMS data, I am getting some trouble when generating them. I have an unit test where the PKCS7 output is passed as input to command-line OpenSSL in order to make sure the implementation is compatible. But all I get is an error from OpenSSL when parsing ASN.1 tags (I don't know exactly which one).

There is a single function responsible for returning a CMS ContentInfo structure as follows:

def __get_enveloped_data(pubkey_cipher, sym_cipher, x509_cert,
                         encrypted_key, iv, encrypted_content):
    return cms.ContentInfo({
        'contentType': cms.ContentType(u'enveloped_data'),
        'content': cms.EnvelopedData({
            'version': u'v0',
            'recipient_infos': cms.RecipientInfos([
                cms.RecipientInfo(
                    name='ktri',
                    value=cms.KeyTransRecipientInfo({
                        'version': u'v0',
                        'rid': cms.RecipientIdentifier(
                            name='issuer_and_serial_number',
                            value=__get_issuer_and_serial_number(x509_cert)
                        ),
                        'key_encryption_algorithm': cms.KeyEncryptionAlgorithm({
                            'algorithm': pubkey_cipher.oid,
                            'parameters': core.Null()
                        }),
                        'encrypted_key': encrypted_key
                    })
                )
            ]),
            'encrypted_content_info': cms.EncryptedContentInfo({
                'content_type': cms.ContentType(u'data'),
                'content_encryption_algorithm': cms.EncryptionAlgorithm({
                    'algorithm': sym_cipher.oid,
                    'parameters': iv
                }),
                'encrypted_content': cms.OctetString(
                    encrypted_content, tag=0, tag_type='implicit')
            })
        }, tag=0, tag_type='explicit')
    })

The object returned by the above function is encoded using:

encoded_content = __encode_in_base64(enveloped_data.dump())

And the __encode_in_base64 is just a pretty printer function to convert the DER output into BASE64.

When I execute the OpenSSL function:

$ openssl smime -decrypt -in tmp -inkey private_key.pem

All I get is the following:

Error reading S/MIME message
140702704453280:error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag:tasn_dec.c:1338:
140702704453280:error:0D06C03A:asn1 encoding routines:ASN1_D2I_EX_PRIMITIVE:nested asn1 error:tasn_dec.c:852:
140702704453280:error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error:tasn_dec.c:772:Field=type, Type=PKCS7
140702704453280:error:0D0D106E:asn1 encoding routines:B64_READ_ASN1:decode error:asn_mime.c:193:
140702704453280:error:0D0D40CB:asn1 encoding routines:SMIME_read_ASN1:asn1 parse error:asn_mime.c:528:

Do you have any idea what could be the problem? Is there any caveat when encoding asn1crypto structures like the above?

wbond commented 8 years ago

By default openssl smime expects a MIME message. I would try adding -inform der and using the raw data from asn1crypto's dump() method. That should rule out any issue with the DER data being wrapped.

wbond commented 8 years ago

Also, you shouldn't need to add the tag or tag_type kwargs since these are known structures, although I am not at my computer right now to test.

balena commented 8 years ago

Dear Will,

Actually the tag and and tag_type parameters were added as an attempt to overcome the OpenSSL parsing issue. If I remove them, the OpenSSL error message is still the same.

And not, -inform does not make any sense for the command openssl smime as the file tmp indicated in the command line has to be in MIME-1.0 format, as follows:

To: "Alice" <alice@foo.com>
From: "Bob" <bob@bar.com>
Subject: A message from python
MIME-Version: 1.0
Content-Type: application/pkcs7-mime; smime-type=enveloped-data; name=smime.p7m
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=smime.p7m

MII...base64-encoded-CMS-ContentInfo...

wbond commented 8 years ago

The man page for openssl mime says you can pass a raw, DER-encoded PKCS#7 structure using openssl -inform der -in tmp. This would at least rule out any encoding issues adding the problem, although my guess is that is unlikely.

It is certainly possible there is an error in one of the CMS structure definitions. I've used some via oscrypto because .p12 files use PKCS#7 to encrypt key data and certificates, but I'm sure there are some variations that haven't been hit before.

What may be helpful in debugging is to pass a known good file into asn1crypto and use the .debug() function to dump the complete structure and all of the encoding details. Comparing this to what is being constructed in Python would probably be one of the fastest paths to figuring it out.

wbond commented 8 years ago

Actually, looking at your traceback more closely and making an educated guess:

Error reading S/MIME message
140702704453280:error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag:tasn_dec.c:1338:
140702704453280:error:0D06C03A:asn1 encoding routines:ASN1_D2I_EX_PRIMITIVE:nested asn1 error:tasn_dec.c:852:
140702704453280:error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error:tasn_dec.c:772:Field=type, Type=PKCS7
140702704453280:error:0D0D106E:asn1 encoding routines:B64_READ_ASN1:decode error:asn_mime.c:193:
140702704453280:error:0D0D40CB:asn1 encoding routines:SMIME_read_ASN1:asn1 parse error:asn_mime.c:528:

I think it is having trouble decoding the type field of a PKCS7 structure. That is likely the content_type field of cms.ContentInfo(). This would be the first field of the first structure, which makes me think something is funky in terms of one of the encoding steps after DER.

wbond commented 8 years ago

The only other comment is that once you figure out what is going on here, you can omit many of the explicit class constructors in constructing the cms.ContentInfo() object. If you pass nested dict/list objects with appropriate keys, it will figure it out from there since it already has all of the expected field types encoded.

balena commented 8 years ago

Dear William,

Thanks for your suggestions, and you were correct: the problem was indeed in the ContentInfo field with wrong name ("contentType" instead of "content_type").

Some improvements would make implementors' life easier:

In core.py:424, the debug function uses %s. When debugging CMS output, it often raises an exception when trying to print binary values contained in OctetStrings (such as in EncryptedContentInfo, field encrypted_content). I suggest you to use %r instead.
The Choice implementation does not permit a "pythonistic" construction (dict/list). As fair as I have understood, it requires always explicit class constructor with name and value parameters. As suggestion, you may check if the name is a dict, if so, it should have a single attribute, which the key corresponds to the name parameter and the value corresponds to the value parameter.
When the user passes a name which does not name any of the specified fields, the library should raise an exception. Reason: I suppose the problem with wrong names (typos) will be usual when converting code from pyasn1, or when reading directly from ASN.1 specifications.

wbond commented 8 years ago

Can you provide an example backtrace so I can understand the exception?
The Choice ASN.1 construct is strange because it allows any number of different values to be represented as long as they have distinct tags, so the parsing side can disambiguate. To construct one "pythonically", you would use a dict with name and value keys: Choice({'name': 'the_choice_name', 'value': {'field': 'val'}}).
Yes, this is a side effect of some of the overloading of different classes, and I agree it should be fixed to prevent typos in this way. One of the most frustrating things is to try to debug someone else's code that uses **kwargs, so I will work on improving that.

balena commented 8 years ago

This is the stack trace:

Traceback (most recent call last):
  File "asn1cmp.py", line 58, in <module>
    d1.debug()
  File "/usr/local/lib/python2.7/dist-packages/asn1crypto/core.py", line 3289, in debug
    child.debug(nest_level + 3)
  File "/usr/local/lib/python2.7/dist-packages/asn1crypto/core.py", line 3289, in debug
    child.debug(nest_level + 3)
  File "/usr/local/lib/python2.7/dist-packages/asn1crypto/core.py", line 3747, in debug
    child.debug(nest_level + 1)
  File "/usr/local/lib/python2.7/dist-packages/asn1crypto/core.py", line 422, in debug
    self.chosen.debug(nest_level + 2)
  File "/usr/local/lib/python2.7/dist-packages/asn1crypto/core.py", line 3289, in debug
    child.debug(nest_level + 3)
  File "/usr/local/lib/python2.7/dist-packages/asn1crypto/core.py", line 424, in debug
    print('%s    Native: %s' % (prefix, self.native))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0: ordinal not in range(128)

The above happens for an EncryptedContentInfo, with "encrypted_key" containing 0x94264288c68bb67820ff82572a4198d168083a04a7cee486d8327dfd8de129aa0f74713f1fbc85b2d0b7d8b761c33071aa7e6a23a4caa8cc6a54f4dd5502d1704e1aff87608dc70599bf685cf1a2af98397016b39c6519ead9c1339cf63e806252b8a97be7ced479490f13063973cad9b857bce0f931ab9c654d401370e3b130 (this is not unicode neither ascii; this is an encrypted content).

Instead of:

return Container({"something": WhateverChoice(name="foo", value="bar")})

the same would accept:

return Container({"something": {"foo": "bar"}})

Poddster commented 8 years ago

Hi,

I'm having a similar issue. OctetString doesn't like to be .debug()'d. Here's a simplified example.

def test_octect_string():
        import asn1crypto.core
        from binascii import unhexlify

        der_octetstring = unhexlify("0402CCE9")
        a = asn1crypto.core.OctetString.load(der_octetstring)

        print("!")
        print(a.dump())
        #print(a.debug())

        b = asn1crypto.core.OctetString(unhexlify("CCE9"))
        assert b.dump() == a.dump()
        #b.debug()

This code works fine as-is, but if you uncomment any of the .debug calls then it explodes with:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 2: ordinal not in range(128)

Now, I'm not sure if I'm using OctentString here correctly, but unfortunately I'm working with an ASN.1 schema and data source that I don't control, and it's using it like this:

KeyBlock ::= Sequence
{
    KeyBlock version = 1 (INTEGER)
    Keys ::= Set
    {
        KeyInfo ::= Sequence
        {
            TR31Key = (PRINTABLESTRING)
            keyType = (INTEGER)
            ksn = (OCTET STRING)
            keySlot = (INTEGER)
            keyName = (PRINTABLESTRING)
            keyCheck = (OCTET STRING)
        }
    }
}

with 0xCCE9 for keyCheck, which causes problems on debug().

(Though this guide says OctectStrings are any 8bit data values, not necessarily ascii-printable ones)

wbond commented 8 years ago

My guess is that you are using Python 2 and your stdout is ASCII or something of the like. A full backtrace would help in confirming (I'm away from my computer for the week).

Poddster commented 8 years ago

Yes, unfortunately this project is stuck using python2 :(

I didn't post the backtrace before as it's not much use, and the faulting line is the same as @balena found.

Using the code from my last post (minus the function definition) results in:

$ python2 temp.py 
!
��
  asn1crypto.core.OctetString Object #139984550301200
    Header: 0x0402
      primitive universal tag 4
    Data: 0xcce9
Traceback (most recent call last):
  File "temp.py", line 9, in <module>
    print(a.debug())
  File "/usr/local/lib/python2.7/dist-packages/asn1crypto/core.py", line 424, in debug
    print('%s    Native: %s' % (prefix, self.native))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 0: ordinal not in range(128)

(ps: I've also now seen this issue when using .debug() on a certificate file originating in the same place as this OctetString )

wbond commented 8 years ago

As of afd3cac4b33d56aaf8ddcce2c333b60828350c3d, Python 2 should no longer raise an exception when debugging a core.OctetString.

As of 2260ee300972ae3e0cd7d31710d21432466773e0, core.Sequence will raise a ValueError when an invalid field is specified.

@balena I implemented your suggestion of being able to pass a dict to a core.Choice constructor, but it felt like trying to pound a square peg through a round hole. I ended up stashing it in case I want to revisit later. What about being able to pass a 2-element tuple of (name, value)?

balena commented 8 years ago

@wbond, regarding the dict as a core.Choice, you may have felt bad with it, but it is actually the most natural way...

Let's focus on an example:

# Current construction:
'recipient_infos': [
    cms.RecipientInfo(
        name='ktri',
        value={
            'version': u'v0',
            'rid': cms.RecipientIdentifier(
                name='issuer_and_serial_number',
                value=issuer
            ),
            # ...
        }
    )
]

I see a core.Choice as a special form of core.Sequence, where a single "attribute" exists (the attribute name defines the type of the value). Maybe the possible confusion is that when creating a dict for a core.Choice the developer may end up specifying more than one attribute, which in such a case you may raise an exception.

# Using dict:
'recipient_infos': [
    { 'ktri': {
        'version': u'v0',
        'rid': { 'issuer_and_serial_number': issuer },
        # ...
    }
]

The code below is not as fluent to me as the above, but it is certainly better than the original. Similarly, the developer may pass a tuple with more than 2 elements, so it may be a good idea to prevent such situations raising an exception:

# Using tuple:
'recipient_infos': [
    ( 'ktri', {
        'version': u'v0',
        'rid': ( 'issuer_and_serial_number', issuer ),
        # ...
    )
]

Whatever way you choose will be much better than now.

wbond commented 8 years ago

I do understand what you mean about the dict seeming more fluent, but it feels wrong from a data perspective to use a data structure designed to contain multiple keys when only exactly one key can ever be used. I think this is why I proposed using a tuple since it is the natural data type for a fixed-length sequence.

Certainly both forms can be error checked.

Also, the current implementation can be used in the following way:

'recipient_infos': [
    cms.RecipientInfo('ktri', {
            'version': u'v0',
            'rid': cms.RecipientIdentifier('issuer_and_serial_number', issuer),
            # ...
    })
]

The tuple version allows you to just omit the cms.RecipientInfo syntax.

balena commented 8 years ago

... omit the cms.RecipientInfo and cms.RecipientIdentifier syntax.

Any of dict or tuple will certainly work pretty fine. When making such language mappings, there's never a single way, maybe not a better way...

All I still think is that dict is more natural to type than tuple in this case (a programmer will almost certainly try dict first, get an exception and then use a tuple).

wbond commented 7 years ago

@balena I ended up implementing support for both a single-element dict and a two-element tuple in 9d6ca0c8b7651b69797fa9c1a564f0999f519fa1.

balena commented 7 years ago

Looks great! Thanks @wbond

wbond / asn1crypto

CMS Enveloped Data creation failure #18