nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0

Syslog Klib issues #1671

Open jasonrichardsmith opened 2 years ago

jasonrichardsmith commented 2 years ago

Hi, I have been working on shipping logs with syslog -> fluentbit -> cloudwatch. It works, but with a few issues.

1) Hardcoding an IP for Syslog can be a little problematic if you lose an AZ in AWS. If that happened, new app instances would have to be built with a new IP.

The klib supports DNS, but only if the log server is resolvable at the moment of boot. If the name cannot be resolved, it appears the machine continues to boot but will not try to send any logs, even if the logging server eventually becomes resolvable over DNS. The application will continue to run, but no logs will be shipped.

If a logging server is running and logs are being shipped, and that logging server is then replaced, it does not appear that the klib tries to resolve a new logger address. A machine reboot sometimes resolves this issue.

Essentially, log server discovery only appears to work on first instance boot/reboot, and it never auto-recovers when the servers change.

2) It does not appear that /etc/resolv.conf is respected for DNS server IPs; instead, the DNS servers provided via DHCP are used. I cannot get any DNS resolution to work from resolv.conf unless I set up explicit DHCPOptions for my VPC in AWS. (I am using Consul for service discovery.)

3) ident is populated with the pkg name when the image is built from a nanovms package.

francescolavra commented 2 years ago

> The klib supports DNS, but only if the log server is resolvable at the moment of boot. If the name cannot be resolved, it appears the machine continues to boot but will not try to send any logs, even if the logging server eventually becomes resolvable over DNS. The application will continue to run, but no logs will be shipped.

The syslog klib uses an exponential backoff when retrying DNS resolution after a failure, so if the server has been unreachable for a long time, it may take a long time before the klib re-sends a DNS request. Of course, this can be improved by limiting the maximum backoff time.
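
To illustrate why capping the backoff helps, here is a minimal Python sketch. It is not the klib's actual code (the klib is part of the Nanos kernel); the initial delay of 1 second, the doubling factor, and the 30-second cap are assumed values chosen only for illustration.

```python
def backoff_delays(initial=1.0, factor=2.0, max_delay=None):
    """Yield successive retry delays in seconds, growing after each failure.

    All parameter values are illustrative assumptions, not the klib's settings.
    If max_delay is given, no yielded delay exceeds it.
    """
    delay = initial
    while True:
        yield delay if max_delay is None else min(delay, max_delay)
        delay *= factor


def take(gen, n):
    """Return the first n values from a generator."""
    return [next(gen) for _ in range(n)]


# Without a cap, the wait between DNS retries keeps doubling.
uncapped = take(backoff_delays(), 8)
# With a cap, retries settle at a fixed maximum interval.
capped = take(backoff_delays(max_delay=30.0), 8)

print(uncapped)
print(capped)
```

Without a cap, a server that stays unresolvable for a while pushes the next retry further and further out; with a cap, the worst-case wait before the next DNS request is bounded by `max_delay`.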

> ident is populated with the pkg name when the image is built from a nanovms package.

ident is populated with the name of the running program, which in the case of images built from a package usually coincides with the name of the executable file included in that package (e.g. "python_3.8.6/python3"). If you have any suggestions on how this should be changed, we'd be happy to hear them.

jasonrichardsmith commented 2 years ago

The image name would be good. The problem with the executable name is that if I have several Python apps, identifying the actual app requires grepping for patterns in the logs. If the ident had the image name, I could identify both the app and the machine. For example, this is my output stanza for fluentbit:

[OUTPUT]
    Name cloudwatch
    Match   *
    region eu-west-1
    auto_create_group true
    log_group_name fluent-bit-cloudwatch-$(uuid)
    log_stream_name fluent-bit-$(ident)-$(host)

jasonrichardsmith commented 2 years ago

Also, is DNS resolution tried again if the IP address of the loggers changes?

francescolavra commented 2 years ago

No, at the moment the syslog klib does not re-send DNS requests after successfully resolving a server name, so it does not act upon an IP address change until after a reboot.
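
As a rough illustration of what acting upon an address change could look like, the following Python sketch caches a resolved address and looks the name up again once a refresh interval has elapsed. This is a hypothetical design, not how the klib works: the `RefreshingResolver` class, the 60-second interval, and the injected `lookup` and `clock` parameters are all assumptions made for the example.

```python
import socket
import time


class RefreshingResolver:
    """Cache a hostname's address and look it up again after refresh_seconds.

    Hypothetical helper for illustration only; names and defaults are assumed.
    """

    def __init__(self, hostname, refresh_seconds=60.0, lookup=None,
                 clock=time.monotonic):
        self.hostname = hostname
        self.refresh_seconds = refresh_seconds
        self._lookup = lookup or self._system_lookup
        self._clock = clock
        self._address = None
        self._resolved_at = None

    @staticmethod
    def _system_lookup(hostname):
        # First address from the system resolver, for UDP syslog on port 514.
        infos = socket.getaddrinfo(hostname, 514, type=socket.SOCK_DGRAM)
        return infos[0][4][0]

    def address(self):
        """Return the cached address, re-resolving it first if it is stale."""
        now = self._clock()
        stale = (self._resolved_at is None
                 or now - self._resolved_at >= self.refresh_seconds)
        if stale:
            try:
                self._address = self._lookup(self.hostname)
                self._resolved_at = now
            except OSError:
                # Lookup failed: keep the last known address (None if there
                # never was one) and try again on the next call.
                pass
        return self._address
```

A sender would call `address()` before each batch of messages, so a replaced log server is picked up after at most one refresh interval instead of requiring a reboot.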

francescolavra commented 2 years ago

https://github.com/nanovms/nanos/pull/1674. A change to Ops to auto-populate the environment variable containing the image name (which is used by the syslog klib to identify the application) is at https://github.com/nanovms/ops/pull/1261; alternatively, you can just specify manually any image name by adding "Env": {"IMAGE_NAME": "my_name"} to config.json.
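
For anyone managing several apps, a small script can add the variable to each config. The Python sketch below merges `IMAGE_NAME` into the `Env` map of a config dict and prints the JSON that would go into config.json. The helper name `with_image_name`, the `APP_MODE` variable, and the `billing-api` image name are made up for the example; only the `"Env": {"IMAGE_NAME": ...}` shape comes from the comment above.

```python
import json


def with_image_name(config, image_name):
    """Return a copy of a config dict with Env.IMAGE_NAME set.

    Existing Env entries are preserved; the input dict is not modified.
    """
    updated = dict(config)
    env = dict(updated.get("Env", {}))
    env["IMAGE_NAME"] = image_name
    updated["Env"] = env
    return updated


# Hypothetical starting config with one unrelated environment variable.
config = with_image_name({"Env": {"APP_MODE": "prod"}}, "billing-api")
rendered = json.dumps(config, indent=2)
print(rendered)
```

With a distinct `IMAGE_NAME` per app, the `$(ident)` part of a log stream name should tell apart applications that share the same interpreter binary.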

eyberg commented 2 years ago

https://github.com/nanovms/nanos/pull/1779