guardian / dotcom-rendering

The Guardian web rendering service (aka DCR/DCAR)
https://www.theguardian.com
Apache License 2.0

Remove usage of `Makefile` in production environment #9439

Open mxdvl opened 10 months ago

mxdvl commented 10 months ago

The dotcom-rendering/Makefile is here to provide developer convenience.

In its current state, it is also used to start production scripts, with confusingly similar names:

This means that if any file required by these commands is missing from the riff-raff artifact, the server will fail to start and enter an infinite unhealthy loop.
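One way to catch this failure mode before deployment is to check the unpacked artifact for the files the start command depends on. The sketch below is illustrative only: `findMissingFiles` is a hypothetical helper, and the actual list of required paths is not given in this issue, so callers would need to supply it.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: returns every required path that is absent from the
// artifact directory. The `exists` parameter is injectable so the check can
// be exercised without touching the filesystem.
function findMissingFiles(
    artifactDir: string,
    requiredPaths: string[],
    exists: (path: string) => boolean = existsSync,
): string[] {
    return requiredPaths.filter((relative) => !exists(join(artifactDir, relative)));
}
```

A CI step could call this with the unpacked artifact directory and fail the build when the returned list is non-empty, rather than discovering the gap when the server fails to start.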

On the production servers we do not need the convenience of checking whether Node is available and on the correct version, or any of the other DevX benefits brought about by the Makefile. We should therefore remove all non-development (prod, ci) tasks from the Makefile, use it only for local development, and use an alternative solution for production tasks (e.g. start-prod).

cemms1 commented 10 months ago

@mxdvl to confirm, this issue is to stop using the Makefile in production?

arelra commented 10 months ago

I think `start` can probably go; it seems to have been superseded by `start-prod`.

`start-ci` is used to start the app in production mode locally and on CI without pm2. We run Cypress and Playwright against the production build to get the closest match to prod, as implemented here. Development builds can be quite a bit slower and use more memory than a prod build, and can sometimes (unfortunately) behave a little differently, for example under React's strict mode.

`build` is also useful for building the production bundle locally to run e2e tests against. It's used by CI.

The 'prod' tasks can also be convenient when connecting to instances and stopping and starting the server to diagnose issues.

I don't mind moving the scripts out of the Makefile if we think it will remove foot-guns, but it would be useful to have the scripts in a known location in case we do need to run them locally, on CI or on prod instances.

arelra commented 10 months ago

I wonder if a better approach is to have a GHA check which starts the container (which I believe uses the riff-raff artifact), runs the prod startup script and pings the healthcheck. This way we can guarantee the prod instance should start, and get early feedback if it doesn't.
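The "ping the healthcheck" step could be sketched roughly as below. This is a minimal illustration, not the project's actual check: `waitForHealthy` and the `Probe` type are hypothetical names, and the healthcheck URL, retry count and delay are assumptions that would need to match the real container setup.

```typescript
// A probe resolves to true when the server reports healthy.
type Probe = () => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
    new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retries the probe until it succeeds or the attempts
// run out. A probe that rejects (e.g. connection refused while the server is
// still booting) is treated as "not healthy yet" rather than a hard failure.
async function waitForHealthy(
    probe: Probe,
    attempts: number,
    delayMs: number,
): Promise<boolean> {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        const healthy = await probe().catch(() => false);
        if (healthy) return true;
        if (attempt < attempts) await sleep(delayMs);
    }
    return false;
}

// Example wiring (assumed URL; the real port and path are not stated here):
// const probe: Probe = async () => (await fetch("http://localhost:9000/_healthcheck")).ok;
// if (!(await waitForHealthy(probe, 30, 1000))) process.exit(1);
```

Exiting non-zero when the probe never succeeds would make the GHA job fail, which gives the early feedback described above.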