Output mode for submitting results directly to AWS Security Hub

Hi @toniblyx

I came across https://aws.amazon.com/blogs/security/use-aws-fargate-prowler-send-security-configuration-findings-about-aws-services-security-hub/ which achieves my goal of submitting findings to Security Hub, but I would like to find a simpler method if I can.

Would you accept a pull request that added a new output mode, securityhub, then when chosen, Prowler would submit each check result directly to AWS Security Hub using the AWS CLI? I envisage the output mode generating a JSON string and then calling: aws securityhub batch-import-findings --findings <JSON Finding>.

If this is acceptable, I'll spend the time on a PR and add the optional IAM permission to the README etc.

Appreciate this is somewhat inefficient as the batch API can accept 100 checks at a time, so if you don't think this is a good idea, I'll look to generate a CSV and use a wrapper approach like the Wazuh integration instead.

Many thanks

Hi @marcjay, that's a great idea. It is something I'd like to do as well to simplify the integration, I also played with batch-import-findings and it makes the job pretty easy, initially. There are some considerations to take into account though. Due to the need of sending finding as ASFF (AWS Security Finding Format):

In Prowler It is easy to create ASFF output in json by adapting include/outputs. I would add another output type called json-asff for that and leaf the former json output as it is.
Each finding requires a unique identifier (Id) and it has to be the same identifier if its status changes afterwards otherwise Security Hub won't realize that a particular finding has changed status. That may require persistence somewhere, a DB, that's why in the POC you mention in the blopost uses DynamoDB (see aws securityhub update-findings help).
That POC mentioned also uses a lambda to generate and map findings in CSV output with a valid JSON ASFF formated. It also requires more data that ASFF has as mandatory and Prowler doesn't provide quite yet (like severity for example) but I can work on that with you. See below the minimum set of attributes that are required and my comments on what Prowler has today:
- AwsAccountId: YES, that's available
- CreatedAt: KIND OF, that's available but not at finding level, needs a change.
- Description: YES, that's available
- GeneratorId: this could be prowler-version
- Id: NO, has to be unique and not recycled, I.e: prowler-checkID-accountID-region.
- ProductArn: NO, has to be gotten somehow and it may require rewrite some checks.
- Resources: NO, has to be created somehow.
- SchemaVersion: YES, 2018-10-08.
- Severity: NO, has to be added per check.
- Title: YES.
- Types: NO, has to be defined as namespace/category/classifier
- UpdatedAt: NO, this is what I mentioned before, this is needed at creation and update.
Limits: there is a limit of 100 findings per request, that has to be considered.

Did you have that in mind? If that's all right for you we could move forward with that.

Hi @toniblyx - thank you for the swift and considered reply. Adding in my thoughts

In Prowler It is easy to create ASFF output in json by adapting include/outputs. I would add another output type called json-asff for that and leave the former json output as it is.

That sounds good to me. I can see from #190 that the current approach whereby each check outputs a valid, self-contained JSON object causes issues for some, where they are wanting a single valid JSON array object, containing all the checks. Presumably the existing design would apply here, with the user needing to post-process the JSON ASFF output, in order to either put up to 100 lines of findings into a JSON array and sending to AWS as a batch, or to wrap each object in [] and send each individually?

Architecturally, I don't imagine there's any other way? Holding the check outputs in memory and outputting fully-formed array(s) of findings for submitting to the API in batches once all checks are completed seems like it would require a big design change?

Each finding requires a unique identifier (Id) and it has to be the same identifier if its status changes afterwards otherwise Security Hub won't realize that a particular finding has changed status. That may require persistence somewhere, a DB, that's why in the POC you mention in the blopost uses DynamoDB (see aws securityhub update-findings help).

I also think an Id following the format prowler-checkID-accountID-regionis very sensible. Incidentally I took a look at the POC code and it would appear that status updates are not happening on existing findings, as it generates a random UUID on every request: link. From some testing, I think with this consistent format, persistence won't be necessary.

That POC mentioned also uses a lambda to generate and map findings in CSV output with a valid JSON ASFF formated. It also requires more data that ASFF has as mandatory and Prowler doesn't provide quite yet (like severity for example) but I can work on that with you. See below the minimum set of attributes that are required and my comments on what Prowler has today:

Again I was a little surprised that the Lambda function avoided many of these issues and is rather simplistic it seems: link.

CreatedAt: Fetch current time in include/outputs?
ProductArn: The lambda function approach looks good here, having read the docs: 'arn:aws:securityhub:' + awsRegion + ':' + accountId + ':product/' + accountId + '/default'
Resources: Is the generic approach in the Lambda sufficient for a first stab? link Otherwise it's considerable changes to all checks I would imagine?
Severity: I would argue same approach as Resources?
Types: Same again? link
UpdatedAt: Again current time here should be fine I think? From testing, 'updating' a finding, with a newer but identical CreatedAt and UpdatedAt correctly updates UpdatedAt in Security Hub but doesn't touch the CreatedAt time

Perhaps for Resources, Severity and Types, we use those values as generic defaults, with checks being able to override them, so a gradual approach could be taken to classify existing/new checks, falling back to these sane generic defaults?

Limits: there is a limit of 100 findings per request, that has to be considered.

My initial thought was that if a securityhub output mode was chosen, each call to textPass(), textFail() and textInfo() would pipe an ASFF JSON string directly into a aws securityhub batch-import-findings --findings {} call - i.e. the CLI would be invoked once per check, so no concern around limits. Going back to the architecture question, Is there a way you think we could batch this?

Thanks for all your feedback @marcjay, those examples on the code mentioned in the blog post are pretty limited to make it work, that is why I say that is a POC or an initial test, which is good but I don't see it for a real use case since it does not update exiting findings. I also agree with you on sending findings in securityhub output mode per call, that makes it probably easier to manage in terms of limits. If we go with the import per check we don't have to think about batch, do we? My main concern here is persistence for existing checks and their updates. What do you suggest for that?

Hey folks, weighing in here real quick since I was the one that wrote the blog (with Toni's help) and I was the TPM on the Security Hub team during our major ASFF and Product ARN Changes.

The blog is a limited first pass, this is due to the lack of any sort of resource information being output from Prowler, which Toni has addressed in another branch somewhere. That is why I mapped everything to the AwsAccount Resource.Type.

In a perfect world, if Prowler can output Resource ARN and had a mapping of Resource.Type that matched the ones for ASFF then it would be trivial to pick up the other needed items that are mandatory for a finding and I would also drop the UUID as I'd be able to from an ID of Resource + Check + Compliance status. ID's in Security Hub are stateful and forming ID's in that way can strongly bind them to the resource. I got around this by using DyanmoDB as an intermediary and streaming NEW_IMAGES to Lambda before going to Security Hub, it would keep duplicates from coming in, but it would also prevent a resource moving in and out of compliance from going to Security Hub which is not ideal.

As far as findings go, our BatchImportFindings API has a rate limit of 10TPS with a burst of 30TPS before being throttled. The throttle will reset every second, and if Prowler was calling BIF after every Group or individual Check that would be fine, you are metered for finding ingestion events above 10K per month per region per Account in Security Hub, it doesn't matter if you call BIF with batches of 100 or 1 finding per call.

As far as CreatedAt goes, if you are using the same finding ID then it will ignore that in the backend, which is proper functionality, you still need to include it in the call funnily enough. We were looking at having our back end handle both CreatedAt and UpdatedAt but I'm not sure if that decision was scuttled completely or will be brought about.

If I wrote that blog today, I'd convert CSV to JSON in Fargate and directly call BIF instead of going to DyanmoDB. That said, I still need a Resource.Type and an ARN per Prowler finding to be able to do that. Ideally I'd also get a Severity that matched Security Hub as an output so there are no translations needed at all.

Thanks @toniblyx and @jonrau1 for your input - good to hear some of the context behind the blog post and integration method.

I figured that it was easier to have a conversation with some code to look at, so I've open a draft pull request: #530. There are still some tasks to work on which I've noted on the PR, keen to get feedback and make changes

On the question of addressing the need for a unique Id per finding - to allow for updates over time but also handle the fact that each check might run against multiple resources making check ID non-unique, I used the format prowler-TITLE_ID-accountID-region-UNIQUE_ID, where UNIQUE_ID is the message from prowler, with all non conforming characters (according to the ASFF spec) removed. So, output such as:

1.3 [check13] Ensure credentials unused for 90 days or greater are disabled (Scored) PASS! User "bsmith" found with credentials used in the last 90 days PASS! User "jdoe" found with credentials used in the last 90 days

Translates to separate findings with Ids: prowler-1.3-1234567890-eu-west-1-User_bsmith_found_with_credentials_used_in_the_last_90_days prowler-1.3-1234567890-eu-west-1-User_jdoe_found_with_credentials_used_in_the_last_90_days

I do appreciate this means that if a new version of prowler makes changes to the messaging for a particular check, findings from it won't update existing findings, but I don't see an easy solution, given that some checks iterate over multiple resources across multiple regions so there can be quite a few variables in each message

How findings appear in Security Hub: From prowler output:

$ ./prowler -M securityhub
                          _
  _ __  _ __ _____      _| | ___ _ __
 | '_ \| '__/ _ \ \ /\ / / |/ _ \ '__|
 | |_) | | | (_) \ V  V /| |  __/ |
 | .__/|_|  \___/ \_/\_/ |_|\___|_|v2.2.1
 |_| the handy cloud security tool

 Date: Tue  7 Apr 2020 15:28:02 BST

 Colors code for results:
 INFO (Information), PASS (Recommended value),  FAIL (Fix required),  Not Scored

 This report is being generated using credentials below:

 AWS-CLI Profile: [] AWS API Region: [eu-west-1] AWS Filter Region: [all]

 Caller Identity:
-------------------------------------------------------------------------------------
|                                 GetCallerIdentity                                 |
+--------------+------------------------------------------+-------------------------+
|    Account   |                   Arn                    |         UserId          |
+--------------+------------------------------------------+-------------------------+
|  123456789|  arn:aws:iam::123456789:user/marcjay  |  ABCDEFGH  |
+--------------+------------------------------------------+-------------------------+

 1.0 Identity and Access Management - CIS only - [group1] ***********

 0.1 Generating AWS IAM Credential Report...

 1.1  [check11] Avoid the use of the root account (Scored)
       PASS! Root user in the account wasn't accessed in the last 1 days... Successfully submitted finding

 1.2  [check12] Ensure multi-factor authentication (MFA) is enabled for all IAM users that have a console password (Scored)
       FAIL! User bsmith has Password enabled but MFA disabled... Successfully submitted finding
       FAIL! User jdoe has Password enabled but MFA disabled... Successfully submitted finding
       FAIL! User bwilliams has Password enabled but MFA disabled... Successfully submitted finding

 1.3  [check13] Ensure credentials unused for 90 days or greater are disabled (Scored)
       PASS! User "bsmith" found with credentials used in the last 90 days... Successfully submitted finding
       PASS! User "jdoe" found with credentials used in the last 90 days... Successfully submitted finding
       PASS! User "bwilliams" found with credentials used in the last 90 days... Successfully submitted finding

This is awesome! Thanks for the clarification @jonrau1 and for your code @marcjay, looking at it.

prowler-cloud / prowler

Output mode for submitting results directly to AWS Security Hub #524