linz / geostore

Central storage, management and access for important geospatial datasets
MIT License
33 stars 2 forks source link

Bug: Static log group name #2364

Open Jimlinz opened 1 year ago

Jimlinz commented 1 year ago

Enabling Step Function State Machine logging is one of the recommendations put forth by cdk-nag.

This was put in place in #2300; however, we soon run into the following problem:

"Invalid Logging Configuration: The CloudWatch Logs Resource Policy size was exceeded" problem. The suggested solution is to prefix CloudWatch log group name with /aws/vendedlogs/states/.

The fix for this problem was (thought to have been) resolved in #2311

Since log group names are now static, we run into another problem where cdk will fail to redeploy if a log group with a matching name already exists. Each ci run has a unique seed, which means this isn't a problem (mostly)... until we try to rerun the same job again (see https://github.com/linz/geostore/actions/runs/3539833454/jobs/5951363759)

A partial fix has been put in place to delete log group during stack teardown (see https://github.com/linz/geostore/pull/2361). This ensures the problem is unlikely to repeat in ci, since stacks are always town down and associated resources removed. However, this would still be a problem in production (should we try to redeploy the stack where an existing log group name already exists).

We should probably move away from using static log group name. One way is to explore if there is an alternative way to revert and fix what was introduced in #2311 (e.g. by using wildcard policy perhaps?)

Some investigation is needed.

Jimlinz commented 1 year ago

From aws docs:

Amazon CloudWatch Logs resource policy size restrictions CloudWatch Logs resource policies are limited to 5120 characters. When CloudWatch Logs detects that a policy approaches this size limit, it automatically enables log groups that start with /aws/vendedlogs/. When you create a state machine with logging enabled, Step Functions must update your CloudWatch Logs resource policy with the log group you specify. To avoid reaching the CloudWatch Logs resource policy size limit, prefix your CloudWatch Logs log group names with /aws/vendedlogs/. When you create a log group in the Step Functions console, the log group names are prefixed with /aws/vendedlogs/states. For more information, see Enabling Logging from Certain AWS Services.

It seems like using vendedlogs is still the way to go as there is no way of stopping Step Functions from updating CloudWatch Log resource policy. Question is, how do we stop cdk from throwing error during redeployment (as opposed to cloudformation changeset):

5:03:15 pm | CREATE_FAILED | AWS::Logs::LogGroup | processingstatemachinelogsB337A208 Resource handler returned message: "Resource of type 'AWS::Logs::LogGroup' with identifier '{"/properties/LogGroupName":"/aws/vendedlogs/states/geostore-dataset-import"}' already exists." (RequestToken: ade94f68-e1e7-89c3-334f-875d1a921233, HandlerErrorCode: AlreadyExists)