neondatabase / autoscaling

Postgres vertical autoscaling in k8s

Epic: Separately tagged logs for VM processes, dmesg, and runner #578

Open sharnoff opened 1 year ago

sharnoff commented 1 year ago

Motivation

  1. For each VM, there are many programs generating logs that we care about, but because they all get aggregated into a single stream, it can be hard to filter for just the ones you care about (or even to attribute a particular log line to the component that produced it). The full list of components is:
    • neonvm-runner
    • QEMU (?): we don't currently have logs from QEMU itself, but it would be great to have them
    • VM kernel logs (dmesg)
    • vector (running inside the VM to provide metrics)
    • postgres_exporter
    • pgbouncer
    • postgres
    • compute_ctl
    • vm-monitor
    • chrony
  2. Logs from the VM kernel can interrupt other log lines mid-line, which impacts our ability to search existing logs

These combine to significantly impair the UX of our observability for VMs.

DoD

  1. All logs from within the VM are easily attributable to a particular component
  2. Filtering for logs from a particular component is trivial, and works within the bounds of our existing systems (e.g. by using log labeling)
  3. Logs from the VM kernel cannot interfere with other logs
  4. Logs from a VM can still be viewed with kubectl logs during local development

Implementation ideas

TODO (various ideas, need to discuss)

Tasks

- [ ] ...
- [ ] List tasks as they're created for this Epic

Other related tasks, Epics, and links


Omrigan commented 10 months ago

Can we utilize vector, which we already have inside the VM, to push logs directly to loki, which we also already have?

lassizci commented 10 months ago

It's not a good practice from a security perspective to have credentials in the virtual machines. Also, if we think about reconfigurability, it's best to have as few expectations about observability built in as possible, so the pipeline can evolve independently without needing reconfiguration at the compute-instance level.

Omrigan commented 10 months ago

> It's not a good practice from a security perspective to have credentials in the virtual machines.

But these would be write-only credentials. In that case, the worst we could get is a DoS from too many logs, which we can combat on the receiver end.

Another option is to have a separate instance of vector outside the VM in the pod, configured to pull data from the in-VM instance [1].

> Also, if we think about reconfigurability, it's best to have as few expectations about observability built in as possible, so the pipeline can evolve independently without needing reconfiguration at the compute-instance level.

What do you mean? Are you talking about updating credentials?

Or, in general, dependence on the particular observability agent? Such dependence, I believe, we cannot escape.

1: https://vector.dev/docs/reference/configuration/sources/vector/

lassizci commented 10 months ago

> > It's not a good practice from a security perspective to have credentials in the virtual machines.

> But these would be write-only credentials. In that case, the worst we could get is a DoS from too many logs, which we can combat on the receiver end.

If we skip the collector we control, we cannot deal with a DoS at the receiving end. A PostgreSQL escape would potentially give control over labeling, etc.

We also do processing between collecting and sending the logs (relabeling, perhaps deriving metrics from logs, switching between plaintext and JSON, and so on). Also, queueing of log sends should not happen inside the computes, but in a trusted environment.

Let's say our log storage is offline and the compute suspends. That would mean either losing the logs or keeping the compute online for retries.

> Another option is to have a separate instance of vector outside the VM in the pod, configured to pull data from the in-VM instance [1].

I think what makes the most sense is to write logs to a socket provided by the host. Then we can consider the rest of the pipeline an implementation detail.
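For illustration, a minimal sketch of what the in-VM writer could look like under that model (the socket path and JSON framing are hypothetical, and whether the host exposes a unix socket, vsock, or a virtio-serial port is exactly the implementation detail left to the host side):

```go
package main

import (
	"encoding/json"
	"net"
	"time"
)

// logLine is a hypothetical framing: one JSON object per line,
// tagged with the component that produced it.
type logLine struct {
	Component string    `json:"component"`
	Time      time.Time `json:"time"`
	Message   string    `json:"message"`
}

func main() {
	// The host is assumed to expose this socket inside the guest; the
	// actual path and transport are up to the host-side pipeline.
	conn, err := net.Dial("unix", "/run/host-log.sock")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	enc := json.NewEncoder(conn) // Encode writes one object per line
	_ = enc.Encode(logLine{
		Component: "postgres_exporter",
		Time:      time.Now(),
		Message:   "example log line",
	})
}
```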

> > Also, if we think about reconfigurability, it's best to have as few expectations about observability built in as possible, so the pipeline can evolve independently without needing reconfiguration at the compute-instance level.

> What do you mean? Are you talking about updating credentials?

Updating/rotating the credentials is one thing; others include building metrics from the logs, relabeling, adding labels, and changing the log collector to something else.

> Or, in general, dependence on the particular observability agent? Such dependence, I believe, we cannot escape.

We can switch the observability agent rather easily when it runs outside of the virtual machines. That's currently possible, and I don't think it makes much sense to make it harder, nor to waste customers' CPU time and memory running such things.

sharnoff commented 10 months ago

From discussing with @Omrigan earlier: one simplification we can make is to just get logs from the VM to stdout in neonvm-runner (the container running the VM). We already have log collection in k8s, so we can piggyback on that, which makes it easier than trying to push the logs somewhere else.
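A rough sketch of what that could look like, with the framing and names as assumptions rather than current neonvm-runner behavior: the runner re-emits each line it reads from the VM's console stream on its own stdout, wrapped in JSON with a component label, so the existing k8s log collection and kubectl logs both see attributable lines.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"os"
)

type taggedLine struct {
	Component string `json:"component"`
	Message   string `json:"message"`
}

// forwardTagged copies line-oriented output from a VM stream (e.g. the
// serial console) to stdout, wrapping each line in JSON so it is easy
// to filter by component once k8s log collection picks it up.
func forwardTagged(src io.Reader, component string) error {
	enc := json.NewEncoder(os.Stdout)
	sc := bufio.NewScanner(src)
	for sc.Scan() {
		if err := enc.Encode(taggedLine{Component: component, Message: sc.Text()}); err != nil {
			return err
		}
	}
	return sc.Err()
}

func main() {
	// For demonstration, tag whatever arrives on stdin as "dmesg"; in
	// neonvm-runner this would be the VM's console stream instead.
	_ = forwardTagged(os.Stdin, "dmesg")
}
```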

sharnoff commented 10 months ago

Notes from discussion:

Omrigan commented 9 months ago

We have an occurrence of non-postgres log spam (in this case, oom-killer), which won't be fixed by https://github.com/neondatabase/cloud/issues/8602

https://neondb.slack.com/archives/C03F5SM1N02/p1707489906661529

sharnoff commented 6 months ago

Occurrence of log interleaving that could potentially be fixed by this, depending how we implement it: https://neondb.slack.com/archives/C03TN5G758R/p1714057349130309

knz commented 3 weeks ago

xref https://github.com/neondatabase/cloud/issues/18244. We have a customer ask to export the postgres logs to an external service, so they can inspect their own logs themselves (e.g. via Datadog).

We haven't fully specced that out yet, but the assumption so far is that we would reuse the OpenTelemetry collector we already deploy to collect metrics, and route the logs through it.

knz commented 3 weeks ago

Regarding pushing logs to the console / k8s logs: the volume will be too large in some cases, e.g. if the user cares about pg_audit logs. This will become a bottleneck. It also won't solve the labeling problem, which we care about for the product: customers only want their postgres logs, not our own control logs. It's better to export through the network directly (see the point below).

Regarding push/pull and credentials: one option is to have a service running inside the VM that accepts incoming connections, and delivers the logs from the VM through that. Would that solve the problem?

Omrigan commented 1 week ago

A potential way we could implement this:

  1. On startup, neonvm-daemon creates a FIFO for each program we currently have. An example path could be /var/log/neonvm/postgres_exporter.stdout.log.
  2. The init file has a modified command: /neonvm/bin/postgres_exporter ... > /var/log/neonvm/postgres_exporter.stdout.log
  3. neonvm-daemon reads the contents of each pipe, splits them into lines, wraps each line in JSON, and feeds it into virtio-serial.
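A minimal sketch of steps 1-3 in Go (the component name, FIFO handling, and the virtio-serial device path are assumptions for illustration, not a settled layout):

```go
package main

import (
	"bufio"
	"encoding/json"
	"os"
	"syscall"
	"time"
)

// entry is one log line, tagged with the component and stream it came from.
type entry struct {
	Component string    `json:"component"`
	Stream    string    `json:"stream"`
	Time      time.Time `json:"time"`
	Message   string    `json:"message"`
}

func main() {
	const fifoPath = "/var/log/neonvm/postgres_exporter.stdout.log"

	// Step 1: create the FIFO that the init command redirects into.
	if err := syscall.Mkfifo(fifoPath, 0o600); err != nil && !os.IsExist(err) {
		panic(err)
	}

	// Opening the read end blocks until a writer (the program) shows up.
	fifo, err := os.OpenFile(fifoPath, os.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer fifo.Close()

	// Step 3: forward to a virtio-serial port; this device path is
	// hypothetical and depends on how QEMU exposes the port.
	serial, err := os.OpenFile("/dev/virtio-ports/neonvm-logs", os.O_WRONLY, 0)
	if err != nil {
		panic(err)
	}
	defer serial.Close()

	enc := json.NewEncoder(serial)
	sc := bufio.NewScanner(fifo)
	for sc.Scan() {
		// One JSON object per line, so the host side can attribute and
		// relabel without parsing free-form text.
		_ = enc.Encode(entry{
			Component: "postgres_exporter",
			Stream:    "stdout",
			Time:      time.Now(),
			Message:   sc.Text(),
		})
	}
}
```

Since opening the FIFO's read end blocks until the redirected program opens it for writing, the daemon would presumably run one such reader per component in its own goroutine.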

Alternatively, we stop using busybox init to start our programs and instead have neonvm-daemon start everything. It would then have direct access to each program's stdout/stderr, which is again forwarded to virtio-serial.
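A similarly hedged sketch of this alternative (again, the binary path and device name are examples only): because neonvm-daemon starts the child itself, it owns the stdout/stderr pipes directly, so no FIFOs or init changes are needed.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"os"
	"os/exec"
	"sync"
)

var mu sync.Mutex // json.Encoder is not safe for concurrent use

// forward tags each line from r with its component and stream, and
// writes it as JSON through the shared encoder.
func forward(enc *json.Encoder, r io.Reader, component, stream string) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		mu.Lock()
		_ = enc.Encode(map[string]string{
			"component": component,
			"stream":    stream,
			"message":   sc.Text(),
		})
		mu.Unlock()
	}
}

func main() {
	// Hypothetical virtio-serial port for log forwarding.
	serial, err := os.OpenFile("/dev/virtio-ports/neonvm-logs", os.O_WRONLY, 0)
	if err != nil {
		panic(err)
	}
	defer serial.Close()
	enc := json.NewEncoder(serial)

	cmd := exec.Command("/neonvm/bin/postgres_exporter")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	stderr, err := cmd.StderrPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Drain both pipes before waiting on the child.
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); forward(enc, stdout, "postgres_exporter", "stdout") }()
	go func() { defer wg.Done(); forward(enc, stderr, "postgres_exporter", "stderr") }()
	wg.Wait()
	_ = cmd.Wait()
}
```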

This doesn't cover neondatabase/cloud#18244, but if we implement the above, we can then write logic that filters the needed postgres logs and sends them to the customer's endpoint directly (or via a functionless shim), so that the customer's logs are never processed outside of the VM.

knz commented 1 week ago

I have no opinion about how we want to handle dmesg etc., but I have a strong opinion that we should have postgres write its output somewhere other than stderr (directly into syslog) and collect the postgres log separately.

If we want a copy collected via postgres_exporter, this can be done by configuring syslog to fork the data into a second stream besides the network collector.