open-mpic / aws-lambda-python

An implementation of the Open MPIC API using AWS-Lambda serverless fucntions written in Python as well as AWS API Gateway.
MIT License
6 stars 1 forks source link

CAA tests #8

Closed birgelee closed 3 days ago

birgelee commented 2 weeks ago

I will be adding more advanced integration test logic in the future. For now, I wanted to mention CAA test suite. I propose we write integration tests for all of these cases pointing to the CAA test suite domains. I have seen this used in the CAA industry and we certainly need to be compliant with these cases. These test values could also be used as dns mocks in unit tests as well.

Here is the list of test domains. https://caatestsuite.com/

I just wanted to bring this up as a todo for down the road.

birgelee commented 1 week ago

I added all of these as integration tests. The last two are skipped because that behavior is no longer supported. Sadly we failed a couple so we will need to fix those. Below are the current results.

✓ Api should return is valid false for do not issue caa test suite empty basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite deny basic caatestsuite com ✗ Api should return is valid false for do not issue caa test suite uppercase deny basic caatestsuite com ✗ Api should return is valid false for do not issue caa test suite mixedcase deny basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite big basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite critical1 basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite critical2 basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite sub1 deny basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite sub2 sub1 deny basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite star deny basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite star deny wild basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite cname deny basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite cname cname deny basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite sub1 cname deny basic caatestsuite com ✓ Api should return is valid false for do not issue caa test suite deny permit basic caatestsuite com ✗ Api should return is valid false for do not issue caa test suite ipv6only caatestsuite com ✗ Api should return is valid false for do not issue caa test suite expired caatestsuite dnssec com ✗ Api should return is valid false for do not issue caa test suite missing caatestsuite dnssec com ✗ Api should return is valid false for do not issue caa test suite blackhole caatestsuite dnssec com ✗ Api should return is valid false for do not issue caa test suite servfail caatestsuite dnssec com ✗ Api should return is valid false for do not issue caa test suite refused caatestsuite dnssec com ✓ Api should return is valid false for do not issue caa test suite xss caatestsuite com » Api should return is valid false for do not issue caa test suite dname deny basic caatestsuite com » Api should return is valid false for do not issue caa test suite cname deny sub basic caatestsuite com

sciros commented 1 week ago

I can get on these in a new branch that I'm pushing up (ds-caa-logic-fixes). I'm going to refactor the tests a bit so they're parametrized -- it'll make them take up WAY fewer lines of code :) I'll then get to fixing them. A couple are straightforward (the casing).. the others I haven't looked at yet.

birgelee commented 1 week ago

That sounds great. I like the idea of the refactor on the tests. I was thinking that but I wanted to just put the tests in first and there were not too many.

The casing one is pretty obvious. We just need a .lower() on the tag string when we do the parsing.

The others I suspect are being caused by either improper handling of DNSSEC errors, or AWS not validating DNSSEC. We should check the code, but I don't think it will be too hard to fix. We also should make sure that in the IPv6 only nameserver we do not issue.

I recommend on general principle that we do not ever treat a lookup failure as permission to issue and simply have logic that a DNS error code = issuance blocked at that perspective.

sciros commented 1 week ago

I've added the fix for casing, was basically "if exact match for 'issue' or 'issuewild' then cool, else if .lower() is 'issue' or 'issuewild' then not cool). It's in a unit test. (I will add a unit test for each fix where possible rather than just rely on the integration tests.)

The DNSSEC part is something I'm not familiar with and maybe should discuss with you. I don't know if I should be writing DNSSEC validation within the checker code or not... there's not a straightforward way to do that with dnspython, as far as I can tell.

Bear in mind that up to this point I have not actually modified the CAA or DCV checker logic from what it was originally. I covered it in tests so that I wouldn't break it, but I'm not enough of a DNS expert (yet?) to second-guess it as far as completeness.

Agreed with the last point.

Also right now our code blocks issuance on a couple of tags that I suppose are "known" like "iodef" and such if they have a critical flag. If that's incorrect let me know. You can find comments to that effect in the TODO messages in the tests.

birgelee commented 6 days ago

Thanks for working on this. I plan to do a review shortly of the CAA checking code and ensure it is compliant with all of these tests. We have actually been doing a good amount of DNS + CAA work in our group, so I will take a look at the code in light of this.

Also, note that sme of the behavior may be more a function of AWS's DNS resolver instead of something we can actually modify. I will keep an eye out for that.

birgelee commented 6 days ago

So I made some commits that vastly improved error handling catching general DNS errors as well as undefined remote perspective errors, and changed perspective names to always be 'rir.aws-region' in all JSON objects.

I also had to change some code from the last commit about iSsUe tag casing because the proper behavior is not to error on strange cases but to always perform the check for tags case insensitive (RFC 8659: "Matching of tags is case insensitive"). Thus, isSuE is the same as issue and can still be used for issuance determination.

I updated all unit and integration tests to reflect these changes.

With these changes, we pass all the tests except the DNSSEC tests. Turns out AWS's default resolver DOES NOT VALIDATE DNSSEC! If you create a VPC, you can make a custom DNS resolver in route 53 that does validate DNSSEC but this adds considerable complexity. My preference would be to move recursive resolution into the codebase using a library, but I worry there will not be a recursive DNS option written in python. I still plan to follow up on this.

I also should mention we do pass the ipv6 only nameserver test simply because it is being handled as a lookup error. Ideally we would just talk to the nameserver on ipv6 and learn what the issue tags are. AWS is nice enough to also not support IPv6 outbound routing for lambda functions out of the box either. This is another piece of functionality we need to deploy a VPC for: https://aws.amazon.com/about-aws/whats-new/2023/10/aws-lambda-ipv6-outbound-connections-vpc/

Maybe VPC is the way to go, but I am still considering some alternatives.

sciros commented 6 days ago

Awesome, thanks for taking a look at that. Looks like I misread the casing logic related caa-do-not-issue tests, thanks for fixing that.

I did only a little bit of research on DNSSEC resolution in Python, and was not able to find anything great yet. There's DNSviz but that's a whole tool rather than a library. Other than that there's a couple of what look like shallow attempts with dnspython out there but nothing that's been battle-hardened. Dnspython does support querying with DNSSEC to an extent.. will that be enough? Not sure.

The introduction of Route 53 + VPC may be useful for other reasons and may be where we land for the API Gateway, but having to set one up for each perspective seems excessive and that's what we're looking at, right?

birgelee commented 6 days ago

Sounds good. From my research as well I am pretty hesitant to integrate any python DNS recursive resolvers that are shallow wrappers around dnspython. We have a pretty complex recursive engine we wrote in our group that uses dnspython that does a pretty good job, but its important to understand that there are a lot of optimizations and corner cases that are handled in production dns software.

I am working a bit on the integration of VPC + route 53. I am not entirely happy with this direction as it adds significant complexity and only provides slightly more control over the dns lookup process (essentially one flag for do you want DNSSEC or not). My preference would be to run an unbound recursive resolver that we control the full config for, but I do not think this is possible on lambda.

sciros commented 5 days ago

Yeah, this is a disappointing turn. I think you're right about our options.

I'm going to toss a new test into our mpic_coordinator tests to cover the new try/catch logic.

birgelee commented 5 days ago

I was able to get the dnssec vpc open tofu configs to work, so the CAA tests now all pass. This seems like a satisfactory resolution for now and the VPC option should in theory give us IPv6 support as well.

Thanks for adding the tests. In general the try logic I added is: any known error should ideally be caught by the remote perspectives and returned in an error object. If the lambda call times out or somehow returns a bogus response (i.e., an uncaught python error), there is another try catch layer to translate this into a perspective communication error at the coordinator level.

birgelee commented 5 days ago

I pushed a final commit that fixed an open tofu issue. We are now passing all unit and integration tests. I plan to merge this into master and delete this branch tomorrow.

sciros commented 5 days ago

I added logic in the mpic_coordinator try-catch to check check_type and create the Response object accordingly. Otherwise it was going to create CAA check responses even if a DCV check failed. The generic Exception is probably not ideal (IDE warns about it) but I'm not sure what-all exceptions we should check for. Mostly it'll be deserialization or key errors because the Payload won't match the expected check response from the perspective Lambdas.

We're now at 94% coverage so there's that.

All integrations tests pass on my end as well (I had to add some more permissions to my AWS user). The DNSSEC tests take almost 7 seconds each, which isn't great but maybe that's to be expected.

One other thing: I noticed we have a field caa_record_present defined in the class but the tests and code ignore that and set a present field instead. I went ahead and trued up the code to set and look for caa_record_present.

birgelee commented 3 days ago

We now pass the CAA tests and this code is in master. If a new issue with CAA tests arises we can open a new thread.