zenhack / simp_le

Simple Let's Encrypt client
GNU General Public License v3.0
223 stars 38 forks source link

Make simp_le work with ACME v2 endpoints #119

Closed buchdag closed 5 years ago

buchdag commented 5 years ago

This is a work in progress aiming to switch simp_le from ACME v1 compatibility to ACME v2 compatibility.

At the moment it is half working : you can obtain a cert from an ACME v2 endpoint, but if you try again with an existing ACME account key, it will fail with the ACME v2 enpoint returning

{
  "type": "urn:ietf:params:acme:error:malformed",
  "detail": "No Key ID in JWS header",
  "status": 400
}

I believe it has something to do with the fact that in its current form simp_le only persist the account private key but I did not understand what should be done yet (either by walking through RFC 8555 again or looking at the way certbot does this). @cpu I might need your help there.

Once this is fixed, the test suite should pass minus one or two pylint issues.

In the current form of this PR there is no backward compatibility with ACME v1 for the following reasons:

However adding backward compatibility with ACME v1 endpoints should be entirely doable (and I did most of the work on a previous draft of this PR).

cpu commented 5 years ago

@buchdag :wave: I haven't read through your diff but I think I can give some high level feedback.

I believe it has something to do with the fact that in its current form simp_le only persist the account private key but I did not understand what should be done yet (either by walking through RFC 8555 again or looking at the way certbot does this). @cpu I might need your help there.

In ACME v1 there was only one way to compose a JWS for an ACME request: You signed a JSON body and you put your whole public JWK into the protected headers of the JWS.

In RFC 8555 there's also a different way, used for the majority of requests. The idea is that after you've created an ACME account with newAccount the server knows your JWK already, so there's no reason to send it all the time. Instead of embedding the JWK in the JWS you just put a kid (key ID) in the protected header to identify yourself. The ACME server can look up the JWK for the account by the kid. For ACME your kid is the account ID returned in the location header for a newAccount request.

So in summary, RFC 8555 has two sorts of JWS:

  1. The old kind, with the embedded JWK, mostly used for newAccount requests (and one kind of revocation, key rollover).
  2. The new way, with the key ID, used for basically all other requests.

Based on your error it looks like your code is probably using the old kind of JWS for requests types expecting the new kind.

RFC 8555 Section 6.2 has most of the gory details: https://tools.ietf.org/html/rfc8555#section-6.2

If it helps the Let's Encrypt server-side code for validating request JWS is mostly located here: https://github.com/letsencrypt/boulder/blob/3de2831c329932a58814110102df884d3d576e5f/wfe2/verify.go

I'm not aware of a lot of other ACME implementation and I'm not sure that they use v1 (@cpu ?)

Unfortunately there are a handful of ACME implementations I know of that targetted "ACME v1". As one example at the time of writing BuyPass's production ACME endpoint is not RFC 8555 and instead targeted Certbot compatibility.

It's up to you whether that matters to simp_le or not. From my perspective I think ACME v1 should be deprecated with the most haste your users are comfortable with. There is no specification for ACME v1. It doesn't match up to any single draft from the RFC 8555 standardization process. It's very hard to ensure any kind of interoperability with ACME v1 CAs because there is no document to point to that can provide implementation guidance.

Hope that helps! Happy to answer more q's if you have them.

cpu commented 5 years ago

In the current form of this PR there is no backward compatibility with ACME v1 for the following reasons:

Oh! One other nice advantage to being ACME v2 only: You can implement integration tests using Pebble instead of having to deal with the complexity of Boulder or using the staging environment and real validations. Pebble has no ACME v1 implementation so you would either have untested code or need to use a full Boulder stack.

buchdag commented 5 years ago

For ACME your kid is the account ID returned in the location header for a newAccount request.

Okay I think I got the kid stuff, my issue is understanding how this is structured in acme-python.

Let say I register my new account this way :

net = acme.ClientNetwork(key=key, args.user_agent)
directory = messages.Directory.from_json(net.get(args.server).json())
client = acme.ClientV2(directory, net=net)

reg = messages.NewRegistration.from_data(email=args.email)
reg = reg.update(terms_of_service_agreed=True)

client.new_account(reg)

If I understood acme-python's ClientNetwork class right, the kid will be then stored in

client.net.account

This should be stored somewhere when the account is persisted to disk then passed along to ClientNetwork when simp_le is re-using an existing account ?

cpu commented 5 years ago

This should be stored somewhere when the account is persisted to disk then passed along to ClientNetwork when simp_le is re-using an existing account ?

Yup! That sounds right to me. Unfortunately my experience with the acme-python code is limited to how we use it in chisel2.py and we don't need to maintain account state between sessions. I suspect pickling or serializing the object somehow is the right approach. Checking how Certbot handles that is probably a good call :+1:

buchdag commented 5 years ago

Ok, so just to be sure I'm on the right track, this is from a real account I just created on the Let's Encrypt ACME v2 staging endpoint :

>>> print(client.net.account.json_dumps_pretty()) 
{
    "body": {
        "contact": [
            "mailto:admin@somedomain.com"
        ],
        "key": {
            "e": "AQAB",
            "kty": "RSA",
            "n": "[...]"
        },
        "status": "valid"
    },
    "terms_of_service": "https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf",
    "uri": "https://acme-staging-v02.api.letsencrypt.org/acme/acct/9582881"
}

The kid you are referring to is the uri string, right ?

cpu commented 5 years ago

The kid you are referring to is the uri string, right ?

Yup!

zenhack commented 5 years ago

Thanks for taking the initiative on this.

I'm in favor of switching over to v2-only; I don't think it's worth the trouble to support both protocols.

I'll have a closer look at the patch this evening.

buchdag commented 5 years ago

Okay, I just added persistence of the ACME v2 account registration info by using / extending the existing IOPlugin system

Reusing a previously persisted ACME account_key.json + account_reg.json now appears to work but I haven't tested obtaining a cert yet.

Tests haven't been updated yet, I'll take a look at that next.

zenhack commented 5 years ago

Aha, that makes sense.

buchdag commented 5 years ago

I confirm that after ee66ae6 cert issuance against a v2 endpoint still works and that calling simp_le from a directory already containing an account now work as intended.

The revocation doesn't work yet:

ACME server returned an error: urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization :: JWK embedded in revocation request must be the same public key as the cert to be revoked

The only other part I still have issue with is the ExternalIOPlugin() class and the corresponding tests. To be very honest I hate this stuff, I think it's coded in a very unclear way, the feature itself is almost undocumented and I doubt anyone ever really used it. If it was just up to me I'd happily just trash it.

simp_le --test with the ExternalIOPlugin() parts removed and unit tests updated works ok.

zenhack commented 5 years ago

Quoting Nicolas Duchon (2019-06-13 16:17:50)

The only other part I still have issue with is the ExternalIOPlugin() class and the corresponding tests. To be very honest I hate this stuff, I think it's coded in a very unclear way, the feature itself is almost undocumented and I doubt anyone ever really used it. If it was just up to me I'd happily just trash it.

I 100% agree; if you want to do the leg work to strip it out I'm totally on board. The only reason I haven't is that since I picked it up there hasn't been enough need to change things for it to be worth the trouble.

buchdag commented 5 years ago

@cpu given the following from RFC8555:

7.3.1. Finding an Account URL Given a Key

If the server receives a newAccount request signed with a key for which it already has an account registered with the provided account key, then it MUST return a response with status code 200 (OK) and provide the URL of that account in the Location header field. The body of this response represents the account object as it existed on the server before this request; any fields in the request object MUST be ignored. This allows a client that has an account key but not the corresponding account URL to recover the account URL.

should we allow users to not persist the account object and just fetch the kid / uri from the location header field when simp_le does a newAccount request with an existing account key ?

buchdag commented 5 years ago

Revocation is fixed, ExternalIOPlugin() has been removed, both unit tests and integration tests are passing, but the linting test still fails and the Travis CI output isn't helping.

edit: simp_le.py:286:9: E128 continuation line under-indented for visual indent

yeah sure bro

cpu commented 5 years ago

should we allow users to not persist the account object and just fetch the kid / uri from the location header field when simp_le does a newAccount request with an existing account key ?

@buchdag You could, but I don't know that it's especially worthwhile. You have to persist the account key to be able to do that and so I'd probably be inclined to persist the whole account object. It also makes it slightly easier for an end-user to know their ACME account ID (checking an on-disk config somewhere). It's occasionally useful to know that (e.g. for requesting server-side rate limit adjustments).

buchdag commented 5 years ago

@cpu what I had in mind was more a way to still be able to use an existing account key if the persisted account object somehow got deleted. simp_le is already catching the ConflictError from acme-python when the account already exists, so I thought we might as well use it to recover the uri if needed.

Does it make sense this way ?

simp_le would still try to persist the whole private key + account object anyway.

cpu commented 5 years ago

@buchdag Ahhh! That makes sense, sorry I misunderstood. Yes, recovering the key ID in that scenario makes sense.

buchdag commented 5 years ago

@zenhack I'm still stuck with pylint on the following:

R:1393, 0: Too many local variables (18/15) (too-many-locals)

For the following function, already refactored to use fewer local variables:

def persist_new_data(args, existing_data):
    """Issue and persist new key/cert/chain."""
    roots = compute_roots(args.vhosts, args.default_root)
    logger.debug('Computed roots: %r', roots)

    client = registered_client(
        args, existing_data.account_key, existing_data.account_reg)

    if args.reuse_key and existing_data.key is not None:
        logger.info('Reusing existing certificate private key')
        key = existing_data.key
    else:
        logger.info('Generating new certificate private key')
        key = ComparablePKey(gen_pkey(args.cert_key_size))

    csr = gen_csr(
        key.wrapped, [vhost.name.encode() for vhost in args.vhosts]
    )
    csr = OpenSSL.crypto.dump_certificate_request(
        OpenSSL.crypto.FILETYPE_PEM, csr
    )

    order = client.new_order(csr)

    authorizations = dict(
        [authorization.body.identifier.value, authorization]
        for authorization in order.authorizations
    )
    if any(supported_challb(auth) is None
           for auth in six.itervalues(authorizations)):
        raise Error('CA did not offer http-01-only challenge combo. '
                    'This client is unable to solve any other challenges.')

    for name, auth in six.iteritems(authorizations):
        challb = supported_challb(auth)
        response, validation = challb.response_and_validation(client.net.key)
        save_validation(roots[name], challb, validation)

        client.answer_challenge(challb, response)

    try:
        order = finalize_order(client, order)
        pems = list(split_pems(order.fullchain_pem))

        persist_data(args, existing_data, new_data=IOPlugin.Data(
            account_key=client.net.key,
            account_reg=client.net.account,
            key=key,
            cert=jose.ComparableX509(OpenSSL.crypto.load_certificate(
                OpenSSL.crypto.FILETYPE_PEM, pems[0])),
            chain=[
                jose.ComparableX509(OpenSSL.crypto.load_certificate(
                    OpenSSL.crypto.FILETYPE_PEM, pem))
                for pem in pems[1:]
            ],
        ))
    except Error as error:
        persist_data(args, existing_data, new_data=IOPlugin.Data(
            account_key=client.net.key,
            account_reg=client.net.account,
            key=None,
            cert=None,
            chain=None,
        ))
        raise error
    finally:
        for name, auth in six.iteritems(authorizations):
            challb = supported_challb(auth)
            remove_validation(roots[name], challb)

and

E:1419,29: Non-iterable value order.authorizations is used in an iterating context (not-an-iterable)

caused by this dict comprehension:

authorizations = dict(
        [authorization.body.identifier.value, authorization]
        for authorization in order.authorizations
    )

This is Python 2 specific, using pylint with Python 3 has its own set of warnings (but not errors):

simp_le.py:161:0: R0205: Class 'ComparablePKey' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
simp_le.py:275:0: R0205: Class 'IOPlugin' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
simp_le.py:310:0: W0613: Unused argument 'dummy_kwargs' (unused-argument)
simp_le.py:689:4: R0205: Class 'AssertRaisesContext' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
simp_le.py:751:0: R0205: Class 'PluginIOTestMixin' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
simp_le.py:1220:20: R1718: Consider using a set comprehension (consider-using-set-comprehension)
simp_le.py:1501:4: R1705: Unnecessary "elif" after "return" (no-else-return)

Given that Python 2 will be EOL'd in six months, should we target linting with Python 3 instead ?

buchdag commented 5 years ago

I've been rebasing this PR and making it Python 3 only on another branch, tests are passing ok : https://travis-ci.org/buchdag/simp_le/builds/547327956

BTW Python 3.4 was EOL'd a few months ago.

zenhack commented 5 years ago

I'm fine with dropping python 2 support (and 3.4). Make sure to update the classifiers in setup.py

buchdag commented 5 years ago

Ok two last question and I think I'll have everything covered:

zenhack commented 5 years ago

Let's go with account_reg.

Do the rebase first I guess.

buchdag commented 5 years ago

Okay, PR rebased, tests are passing, doc updated, it should be down to that last change request.

zenhack commented 5 years ago

I guess I don't feel that strongly. Merging. I'll dogfood it later this week and then tag a release.