loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0

feat(kubernetes): Send logs to log aggregator #1176

Open corneliusroemer opened 6 months ago

corneliusroemer commented 6 months ago

We don't aggregate our logs at the moment. This makes it difficult to quickly inspect what happened across containers and to do retrospectives, because the logs disappear with the containers.

We could send our logs to a log aggregation service. Several offer free tiers to start with, and it shouldn't be too difficult to set up: some services come with a Helm chart that can be installed in a few minutes.

theosanderson commented 6 months ago

For Loculus (not necessarily Pathoplexus) this is post-MVP, so I'm assigning it as such for now.

corneliusroemer commented 1 month ago

I reclassified this as nice to have because both Anya and I feel that the bad developer experience of viewing logs in Argo CD alone is slowing down Kubernetes bug fixing and development. Having logs easily searchable in a central place would reduce friction a lot.

It shouldn't be hard to set up some log aggregation quickly.

theosanderson commented 1 month ago

OK. I would say that, in terms of priorities, the discussion board, for example, could be more important.

anna-parker commented 2 weeks ago

Adding to this: since we also lose logs when we delete containers, sending logs to an aggregator would let us keep them for review after an issue or incident.

theosanderson commented 2 weeks ago

We now have log aggregation for Pathoplexus: https://loculus.slack.com/archives/C05G172HL6L/p1725300635789449

IMO this shows the pattern for implementing this: at the cluster level. So I am removing the high_priority label, as (A) I don't think this should be implemented in the Loculus codebase itself and (B) IMO it's not super high priority for the dev cluster (but I am still fine with it being done at any time).

corneliusroemer commented 2 weeks ago

It would still be great to have for the Hetzner cluster, and we should set it up as soon as feasible, ideally exposed to the network so that one doesn't need to do port forwarding (which makes it inconvenient to use in practice).

But I agree it's maybe not high priority, and we don't need to add it to the Loculus codebase, as you showed with Pathoplexus. My assumptions were wrong there.