intuit / Trapheus

This tool automates restoration of RDS database instances from snapshots into any dev, staging or production environments. It supports individual RDS Snapshot as well as cluster snapshot restore operations.
https://intuit.github.io/Trapheus/
MIT License
106 stars, 53 forks

Update failure notifications to have details in a structured format #141

Closed: namitad closed this issue 10 months ago

namitad commented 1 year ago

Currently, the email and Slack notifications sent on pipeline failure contain only the error message as a plain string.

Update both functions to send the failure notification in the format below:

database id: <identifier name>
snapshot id: <snapshot identifier name>
failed step: <task name>
cause of failure: <error message>

WallysFerreira commented 1 year ago

I would like to work on this.

namitad commented 1 year ago

@WallysFerreira I have assigned it to you.

WallysFerreira commented 1 year ago

This is my first contribution and I want to make sure I'm on the right track to implement this.

I need to change the expected_message in the 'test_lambda_handler_Error' test to match the new format. Then, in the lambda_handler, I need to get the database id, snapshot id, task name and error message from the event and add them to the message.

Is this right or did I not understand it correctly?
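
For reference, the plan above could be sketched roughly as follows. This is only an illustration, not the actual Trapheus code: the helper name `build_failure_message` and the event keys (`database_id`, `snapshot_id`, `failed_step`, `error_message`) are hypothetical, and the real keys depend on what the state machine passes to the notification lambdas.

```python
def build_failure_message(event):
    """Build the structured failure text from a notification event.

    All key names read from `event` are assumptions made for this sketch;
    the real lambdas may receive these values under different keys.
    """
    fields = [
        ("database id", event.get("database_id", "unknown")),
        ("snapshot id", event.get("snapshot_id", "unknown")),
        ("failed step", event.get("failed_step", "unknown")),
        ("cause of failure", event.get("error_message", "unknown")),
    ]
    # One "label: value" pair per line, in the order requested in the issue.
    return "\n".join(f"{label}: {value}" for label, value in fields)
```

The same helper could be shared by the email and Slack lambdas so that both send identical text, and the expected_message in the test would then be the four-line string it returns.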

namitad commented 1 year ago

@WallysFerreira yes, that's correct. Please make sure you make this change for both the email and Slack notification lambdas.

stationeros commented 1 year ago

@namitad Can we add the executionId from the context object to the email as well? It would help pinpoint the exact execution ARN that failed. If not in this change, we can probably open a separate issue after this is merged.

namitad commented 1 year ago

Yes @stationeros, that would be a good addition. Since it might involve changes to the overall state machine, I would prefer to have that as an independent issue.
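
A rough sketch of what that follow-up might involve, assuming "context object" here means the AWS Step Functions context object, where `$$.Execution.Id` holds the execution ARN. The state definition, the placeholder resource ARN, the `ExecutionId` key and the helper `append_execution_id` are all hypothetical and are not taken from the Trapheus state machine.

```python
# Hypothetical Task state, written as a Python dict mirroring the
# Amazon States Language JSON. The ".$" suffix tells Step Functions to
# resolve the value as a path; "$$" refers to the context object.
notify_failure_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:hypothetical-notifier",
    "Parameters": {
        "ExecutionId.$": "$$.Execution.Id",  # execution ARN from the context object
        "Input.$": "$",                      # original state input, passed through
    },
    "End": True,
}


def append_execution_id(message, event):
    """Append the execution ARN to a message when the event carries one.

    `ExecutionId` is the assumed key injected by the state above.
    """
    execution_id = event.get("ExecutionId")
    if execution_id:
        return f"{message}\nexecution id: {execution_id}"
    return message
```

Because the ARN has to be injected by the state definition before the lambda can read it, this touches the state machine itself, which is why it fits better as a separate issue.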

github-actions[bot] commented 10 months ago

This pull request has been automatically closed because it has been inactive for more than 14 days. Please reopen if you still intend to submit this pull request.