guardian / mobile-n10n

n10n for nOTIFICATIOn
Apache License 2.0

Use bigger EC2 instance for notification #1212

Closed waisingyiu closed 7 months ago

waisingyiu commented 7 months ago

What does this change?

Editorial reported that they sent a breaking news notification via ed tools, but no notification was received. No record of the notification was found on the Ophan dashboard either.

Investigation showed that the breaking news tool sent an HTTP request to our notification endpoint but received a 503 response.

The CloudWatch metric (ELB 5xx) for the notification API's load balancer showed that the load balancer had served a 5xx response at that time. One of the EC2 instances was also terminated for failing its health check around the same minute.

We noticed that EC2 instances of the notification service failed their health checks from time to time. The application logs of an unhealthy instance did not show any exception or error message, but the OS syslog indicated an out-of-memory error at the OS process level.

I believe the service's JVM ran out of system memory while expanding its heap: we are using a t4g.micro instance, which has only 1G of memory, while the notification service and the AWS Kinesis agent have max heap sizes of 256M and 512M respectively.

This PR changes the CloudFormation stack to use a bigger EC2 instance, t4g.small, which has 2G of memory.
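The memory reasoning above can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: it assumes "1G"/"2G" mean 1024/2048 MiB and "256M"/"512M" mean MiB, and the helper name `headroom_mib` is hypothetical. It only counts configured max heap sizes; JVM non-heap memory (metaspace, thread stacks, code cache) and the OS itself also need to fit into whatever is left over.

```python
# Rough memory budget per instance type (all values in MiB).
# Assumption: instance memory and heap sizes from the PR description,
# interpreted as binary units.
INSTANCE_MEMORY_MIB = {"t4g.micro": 1024, "t4g.small": 2048}
MAX_HEAP_MIB = {"notification service": 256, "AWS Kinesis agent": 512}


def headroom_mib(instance_memory_mib, max_heaps_mib):
    """Memory left once every JVM has grown to its configured max heap.

    This is an upper bound on what remains for JVM non-heap memory,
    other processes, and the OS.
    """
    return instance_memory_mib - sum(max_heaps_mib)


for instance_type, memory in INSTANCE_MEMORY_MIB.items():
    left = headroom_mib(memory, MAX_HEAP_MIB.values())
    print(f"{instance_type}: {left} MiB left after max heaps")
```

On these assumptions the t4g.micro leaves only a quarter of its memory for everything other than the two heaps, which is consistent with the out-of-memory errors seen in the syslog, while the t4g.small leaves well over half.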

How to test

I applied the bigger instance type on CODE and the instance has stayed healthy for more than a day. It may be worth applying it on PROD too, to see whether instances stop becoming unhealthy.

How can we measure success?

No instances become unhealthy.
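One possible way to check this criterion, sketched under assumptions not stated in the PR: suppose the load balancer's `UnHealthyHostCount` metric is pulled from CloudWatch with the `Maximum` statistic, giving datapoints shaped like the `Datapoints` list returned by boto3's `get_metric_statistics`. The function name `unhealthy_periods` and the sample data are hypothetical and made up for illustration.

```python
from datetime import datetime, timezone


def unhealthy_periods(datapoints):
    """Return the timestamps (sorted) at which any host was unhealthy.

    Each datapoint is a dict with a "Timestamp" and a "Maximum" value,
    where "Maximum" is the highest unhealthy host count in that period.
    """
    return sorted(dp["Timestamp"] for dp in datapoints if dp["Maximum"] > 0)


# Made-up sample data, not taken from the real service.
sample = [
    {"Timestamp": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc), "Maximum": 0.0},
    {"Timestamp": datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), "Maximum": 1.0},
    {"Timestamp": datetime(2024, 1, 1, 10, 10, tzinfo=timezone.utc), "Maximum": 0.0},
]

bad = unhealthy_periods(sample)
if bad:
    print("Instances were unhealthy at:", [t.isoformat() for t in bad])
else:
    print("No instances became unhealthy in this window.")
```

An empty result over a window of several days after the PROD deploy would suggest the change worked; a CloudWatch alarm on the same metric would be an alternative to checking by hand.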

github-actions[bot] commented 7 months ago

Deploy build 4293 of mobile-n10n:eventconsumer to CODE

All deployment options

- [Deploy build 4293 of `mobile-n10n:eventconsumer` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Aeventconsumer&build=4293&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4293 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Aeventconsumer&build=4293&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Aeventconsumer&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4296 of mobile-n10n:schedule to CODE

All deployment options

- [Deploy build 4296 of `mobile-n10n:schedule` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Aschedule&build=4296&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4296 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Aschedule&build=4296&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Aschedule&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4296 of mobile-n10n:football to CODE

All deployment options

- [Deploy build 4296 of `mobile-n10n:football` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Afootball&build=4296&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4296 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Afootball&build=4296&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Afootball&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4292 of mobile-n10n:fakebreakingnewslambda to CODE

All deployment options

- [Deploy build 4292 of `mobile-n10n:fakebreakingnewslambda` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Afakebreakingnewslambda&build=4292&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4292 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Afakebreakingnewslambda&build=4292&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Afakebreakingnewslambda&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4294 of mobile-n10n:reportextractor to CODE

All deployment options

- [Deploy build 4294 of `mobile-n10n:reportextractor` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Areportextractor&build=4294&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4294 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Areportextractor&build=4294&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Areportextractor&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4293 of mobile-n10n:report to CODE

All deployment options

- [Deploy build 4293 of `mobile-n10n:report` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Areport&build=4293&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4293 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Areport&build=4293&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Areport&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4301 of mobile-n10n:notification to CODE

All deployment options

- [Deploy build 4301 of `mobile-n10n:notification` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Anotification&build=4301&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4301 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Anotification&build=4301&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Anotification&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4303 of mobile-n10n:slomonitor to CODE

All deployment options

- [Deploy build 4303 of `mobile-n10n:slomonitor` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Aslomonitor&build=4303&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4303 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Aslomonitor&build=4303&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Aslomonitor&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4300 of mobile-n10n:registration to CODE

All deployment options

- [Deploy build 4300 of `mobile-n10n:registration` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Aregistration&build=4300&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4300 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Aregistration&build=4300&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Aregistration&stage=CODE)

From guardian/actions-riff-raff.

github-actions[bot] commented 7 months ago

Deploy build 4327 of mobile-n10n:notificationworkerlambda to CODE

All deployment options

- [Deploy build 4327 of `mobile-n10n:notificationworkerlambda` to CODE](https://riffraff.gutools.co.uk/deployment/deployAgain?project=mobile-n10n%3Anotificationworkerlambda&build=4327&stage=CODE&updateStrategy=MostlyHarmless&action=deploy)
- [Deploy parts of build 4327 to CODE by previewing it first](https://riffraff.gutools.co.uk/preview/yaml?project=mobile-n10n%3Anotificationworkerlambda&build=4327&stage=CODE&updateStrategy=MostlyHarmless)
- [What's on CODE right now?](https://riffraff.gutools.co.uk/deployment/history?projectName=mobile-n10n%3Anotificationworkerlambda&stage=CODE)

From guardian/actions-riff-raff.