PDF generator review - Githubissues

aplr commented 5 months ago

This ticket gathers issues from my initial review on the PDF generator.

aplr commented 5 months ago

Dockerfile

[x] Remove debug statements

I guess the help output can be removed

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/Dockerfile#L13

[x] npm package lockfile

I guess you also want to copy over the lockfile, as otherwise npm would just install random (newest) versions, which can lead to inconsistencies and builds failing in docker while working locally.

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/Dockerfile#L20

[x] TS build

I guess you also want to copy over the tsconfig.json, as it contains the configuration for your build. Not copying it could lead to transpiling the TS code w/ a different configuration in docker than in your local dev environment.

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/Dockerfile#L24

[x] Final image version

The version of the final image does not match the build version

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/Dockerfile#L28

aplr commented 5 months ago

Github Workflow

[x] Set custom timeout

As declared in the limitations, "A paused job is still running compute/instance/virtual machine and will continue to incur costs". This could quickly eat up the included minutes for Github Actions, while the Github approval issue is just lying around. I would define a custom timeout of a few minutes in order to prevent this from happening.

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/.github/workflows/main.yml#L57-L67

[x] Duplicate container builds

Why are we doing three container builds and push them to separate container repositories, for dev, staging, and prod? As far as I can tell, all builds actually produce the exact same image.

Can't we just do the image build once, use one repository, and then run lambda update-function-code, for dev and staging with --function-name lambda-feedback-{dev|staging}-pdf-generator and after approval prod with --function-name lambda-feedback-prod-pdf-generator, while just using the same image?

This will both reduce the time we wait for deployments, and also don't eat up the Github Actions time included in the plan

aplr commented 5 months ago

Function handler

[x] Event payload parsing

This seems a bit odd to me. Can't you just schema.safeParse the event.body? This should be a string, from the types at least.

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/index.ts#L26

[x] Filename code style

You could build the date string beforehand and introduce a new variable, and then use it. Only personal preference, but I think it improves readability.

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/index.ts#L38-41

[x] Temp directory creation

When creating the temp file, you could use node's built-in fs.mkdtemp. This uses os-specific temp dirs and prevents clashing of file names.

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/index.ts#L43

[x] Re-run for error output

In case of errors, you're re-running pandoc to get tex output. Can't you just get the tex output from the previous run, or why exactly are you doing it this way?

https://github.com/lambda-feedback/pdf-generator/blob/698930699670ca5140bc45a26e839c95830fc026/index.ts#L68-75

jarkabaker commented 5 months ago

Dockerfile

Remove debug statements

Done

npm package lockfile

Done

TS build

Done

Final image version

Yes. Indeed. Also yum needed to be replaced by dnf because yum is not supported by nodejs v20

jarkabaker commented 5 months ago

Github Workflow

Set custom timeout

Done (set to 10 mins)

Duplicate container builds

Not implemented (after our discussion). Though, I have added Lifecycle Policy against the PDF Generator registry so that max 2 images are stored (and older versions deleted)

jarkabaker commented 5 months ago

Function handler

Event payload parsing

Done

Filename code style

Done (I agree, it is much clearer)

Temp directory creation

Not implemented. We do have to use the "tmp" folder, because it is the only folder the lambda allows us to use for writing. It is very rare for more than 1 person working on the same set and I would imagine this together with the timestamp makes it unlikely to clash (this has not been changed from the original implementation and as far as I know there was no such incident, though we will have to revisit it if that would start to be an issue)

Re-run for error output

This is because pandoc generates the type of the file dependently on the output file suffix, so the first generation creates pdf file. We need to rerun it to get tex file in case of an error to be able to identify which text on which line caused the issue (at least I think this is why we are doing it, again - this part of the code was moved without a change from the backend)

aplr commented 5 months ago

One more thing: don't copy the tsconfig (or any other files) alongside package*.json. Always 1) first copy dependency declaration files, then 2) install dependencies, then 3) copy code and 4) build code.

This way image builds can use caches as long as dependencies don't change.

So in short, move the COPY tsconfig.json down to where you copy the actual code

jarkabaker commented 5 months ago

All done

lambda-feedback / pdf-generator

PDF generator review #17

Dockerfile

Github Workflow

Function handler

Dockerfile

Remove debug statements

npm package lockfile

TS build

Final image version

Github Workflow

Set custom timeout

Duplicate container builds

Function handler

Event payload parsing

Filename code style

Temp directory creation

Re-run for error output