open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.98k stars 1.33k forks source link

Import 'time/tzdata' in service or collector #9991

Open djaglowski opened 1 month ago

djaglowski commented 1 month ago

Is your feature request related to a problem? Please describe. Components sometimes (often indirectly) make use of Go's time.LoadLocation. This depends on an IANA Time Zone Database, which as the documentation for LoadLocation describes, may be provided in various ways.

Importantly, there is at least one mechanism by which one component can affect the behavior of another. See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32506.

Describe the solution you'd like Go 1.15 introduced the time/tzdata package, which can basically serve as a default when finding a database using all other mechanisms is unsuccessful. I think we should import this package at a high level in the collector so that all components which rely on time.LoadLocation have the same default behavior.

Describe alternatives you've considered Import time/tzdata in contrib's timeutils package, which solves the immediate issue but leaves us open to future recurrences of the same problem depending on component implementations.

TylerHelmuth commented 1 month ago

This also came up in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32479

evan-bradley commented 1 month ago

I would be inclined to load this in something like the timeutils package that we publish from core and require to be used in all components that want to use time.LoadLocation similar to some of our other component requirements. This would allow distributions that don't include components that do timezone parsing to forgo the ~400KB overhead of including the database.

If we don't want that level of granularity, I would suggest putting this inside the builder and offering a setting to disable it with a nice warning about knowing what you're doing by toggling the setting.

dyladan commented 1 month ago

As I see it, there are several ways to solve this:

  1. Require time/tzdata somewhere central. Bloat every collector by 400k and never see this problem again.
  2. Add time zone database to published docker images. Right now the distributions are put in a scratch docker image with no TZ database and no access to the host db. time.LoadLocation would load from the docker image's DB. Binaries used directly would load from the host. This is the same way certs are handled (https://github.com/open-telemetry/opentelemetry-collector-releases/blob/main/distributions/otelcol-contrib/Dockerfile#L9). Ideally this would only be done on distributions that actually need timezone info. Log a warning when the tz database is not found (binary run on a host with no db).
  3. Document that each component using the tz database should require time/tzdata. This is similar to (1) but means distributions not using those components would avoid the 400kb binary size hit. It introduces the operational overhead of ensuring those components actually do import the database.
  4. Document that anyone using timezones in their configuration should ensure the host has a timezone db, and should mount the db into docker images if it is required. This requires the most manual effort from the user, but provides maximum flexibility in the case the timezone database is updated.

Personally, I think (2) is the best option here. The timezone database is frequently updated and it should be kept separate from the binary for that reason. In cases where a system database exists (most of the time outside docker), compiling a 400kb database into the binary is wasted space that will never be used.

dyladan commented 1 month ago

I would be inclined to load this in something like the timeutils package that we publish from core and require to be used in all components that want to use time.LoadLocation similar to some of our other component requirements.

This is similar to a combination (3) but I think introduces an unnecessary level of indirection. Components can just as easily import tzdata as they can use your timeutils package. The only way to enforce it would be to ban use of the time module which I think is likely not acceptable.

I also thought of a 5th option (sort of a combo of 2 and 3) which is to document this issue and leave it to component owners to make their own decision if they want their component to be the one that bloats the binary size or if they want to depend on the host. This would also likely require the published docker images to have the tz database though.

edit: I still like option (2) the best

dyladan commented 1 month ago

The conversation linked in the OP states that this is an issue on windows. I added this comment there, but I think it is also relevant to this conversation so I'll paste it here:

should work ok on linux, maybe on macos, but windows would be problematic for sure.

If this is an issue for windows only it can also be done as a build flag in the release pipeline. Windows builds could include it by adding the -tags=timetzdata flag.


This could be combined with solution (2) above to solve it everywhere with minimal impact to binary sizes while still ensuring tz data is available for our users.

djaglowski commented 1 month ago
  1. Require time/tzdata somewhere central. Bloat every collector by 400k and never see this problem again.

Fair point on the binary size. I agree distros should have a choice.

  1. Add time zone database to published docker images. Right now the distributions are put in a scratch docker image with no TZ database and no access to the host db. time.LoadLocation would load from the docker image's DB.

I don't think is does anything for to the original issue I mentioned, where a custom build of the collector is running into strange behaviors because one component is loading the database on its own. I think any solution or recommendation we come up with should be geared towards finding the database consistently regardless of which components are included.

I would be inclined to load this in something like the timeutils package that we publish from core and require to be used in all components that want to use time.LoadLocation similar to some of our other component requirements.

This is similar to a combination (3) but I think introduces an unnecessary level of indirection. Components can just as easily import tzdata as they can use your timeutils package. The only way to enforce it would be to ban use of the time module which I think is likely not acceptable.

I should clarify that my suggestion to import time/tzdata in timeutils was essentially just a way to implement (3) for all the known components in contrib which are currently not doing so. You're right it doesn't guarantee anything but if we agree on (3) then adding it here makes more sense than adding it separately in several components.

I also thought of a 5th option (sort of a combo of 2 and 3) which is to document this issue and leave it to component owners to make their own decision if they want their component to be the one that bloats the binary size or if they want to depend on the host.

The problem here again is that it's not just a matter of binary size, but that if a user adds/removes a component to their builder manifest, it should not change the behavior of another component.

  1. Document that anyone using timezones in their configuration should ensure the host has a timezone db, and should mount the db into docker images if it is required. This requires the most manual effort from the user, but provides maximum flexibility in the case the timezone database is updated.

This is pretty user unfriendly but I think it's compatible with @evan-bradley's suggestion to provide an option in the builder manifest. Essentially, we would make a best effort to give users full control, while also giving them an easy way to build in a default if they want it. Additionally, I think any approach we choose should be prescriptive to component authors that they SHOULD NOT import time/tzdata, even if we can't enforce it.

dyladan commented 1 month ago

I don't think is does anything for to the original issue I mentioned, where a custom build of the collector is running into strange behaviors because one component is loading the database on its own. I think any solution or recommendation we come up with should be geared towards finding the database consistently regardless of which components are included.

Sorry if it wasn't obvious but I would say in the case of (2) we would prescribe that components don't import the DB. I think it's a documentation issue. I see it as very similar to the certs. We don't compile those in for very similar reasons, even though it would be theoretically possible.


I think if it was my choice this is what I would do:

  1. Disallow importing time/tzdata possibly enforced with a linter or similar.
  2. Add an option to the builder manifest that imports time/tzdata. This would allow the user to compile into the binary if they wanted.
  3. In OCB after the code is generated, scan for uses of time.LoadLocation. If a use is found, add a piece of code to the startup which tries to load a location and logs a warning if it fails. Warning links to a documentation page explaining the issue and the user's options.
  4. In any published docker images which use one of these components, add the timezone database.
  5. Use the -tags=timetzdata for windows builds in goreleaser config in releases repo.

This would accomplish:

djaglowski commented 1 month ago

I think if it was my choice this is what I would do:

  1. Disallow importing time/tzdata possibly enforced with a linter or similar.
  2. Add an option to the builder manifest that imports time/tzdata. This would allow the user to compile into the binary if they wanted.
  3. In OCB after the code is generated, scan for uses of time.LoadLocation. If a use is found, add a piece of code to the startup which tries to load a location and logs a warning if it fails. Warning links to a documentation page explaining the issue and the user's options.
  4. In any published docker images which use one of these components, add the timezone database.
  5. Use the -tags=timetzdata for windows builds in goreleaser config in releases repo.

This all seems reasonable to me except (3). Adding an entire layer to the build process which attempts to analyze code and generate additional code based on findings sounds fragile and burdensome. Easy to find documentation with expected error messages and remediation options sounds reasonable though.

michalpristas commented 1 month ago

i think i'm aligned with Dan here. the question i have, do we want users to opt-in or rather opt-out. Opt-in leaves out garbage you don't need, until maybe you change pipeline config and suddenly you start needing it and you will need to redeploy whole stack. If it's thousands of agents you're managing this can be not that great default behavior.

Opt-out. This makes sure that we're functioning well on all systems by default but we give users and option to say, I really don't need this and I want to spare some bytes.

From usability perspective opt-out makes more sense to me.

dyladan commented 1 month ago

I'm fine with either opt-in or out. If the tz data is included in the build but present on the host, it will just be ignored. A bit wasteful, but I think an ok tradeoff for being sure the default config works in as many places as possible. It is just one more collector "tuning" step that a user may do if they want.

evan-bradley commented 1 month ago

Just as another point to consider, the time/tzdata docs suggest including it in an application's main package instead of in a library.

evan-bradley commented 1 month ago

It sounds like we're all in agreement on putting it in the builder to make it easily accessible to users. I'm going to suggest including it as an opt-out flag in the builder since it's only a .5% increase in binary size, looking at a recent core binary.

I'd have consensus before possibly adding ~400 KB to most Collector binaries. cc @open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers

:+1: Include an enabled by default option in the builder to pass the -tags=timetzdata flag to go build. :eyes: Include a disabled by default option in the builder pass the -tags=timetzdata flag to go build. :rocket: Don't add an option to the builder. Document this situation, or include time/tzdata elsewhere (please leave a comment).

jpkrohling commented 1 month ago

I also prefer to have the database on the container image.

I don't think is does anything for to the original issue I mentioned, where a custom build of the collector is running into strange behaviors because one component is loading the database on its own

Our container images should be seen as a recipe for what to include in distributions, and downstream/custom distributions should mirror that, similar to what we do with the CAs used with TLS.

mx-psi commented 1 month ago

Is there an easy way to include it on the official image? And to keep it updated over time? (Maybe an alpine package that we can copy files from?)

TylerHelmuth commented 1 month ago

I don't think only including it in the image is sufficient as users can use the collector we release in other formats. If we don't put it in ocb, it needs to be in all the artifacts we release, not only the docker image.

mx-psi commented 1 month ago

I don't think only including it in the image is sufficient as users can use the collector we release in other formats. If we don't put it in ocb, it needs to be in all the artifacts we release, not only the docker image.

I think this was mentioned above: on Unix systems at least tzdata is provided by the operating system, and Go will pick it up automatically, so it's not a problem. On Windows we don't (yet?) release a package per-se, that would be the only place where this would be a problem.


I see tzdata as an external dependency, similar to the CA certificate bundle. It has a lot of similarities (updated relatively infrequently, commonly used, usually provided by the operating system...). We have other components that need external dependencies for which our stance is document (e.g. open-telemetry/opentelemetry-collector-releases/issues/462, Docker components where you need to mount the Docker socket, JMX receiver...).

Virtually every single component uses CA certificate bundle, so we add it to the container by default, and depend on the operating system for it in other cases. If we see this as something as common/essential as the certificate bundle, then we should do something similar: add it to the Docker image. I don't see how the situation is different from this prior case, and why we should do something different.

jpkrohling commented 1 month ago

I think this was mentioned above: on Unix systems at least tzdata is provided by the operating system, and Go will pick it up automatically,

That's my thinking as well, and why I think including the database on the container images should be sufficient.

I don't see how the situation is different from this prior case, and why we should do something different.

+1

michalpristas commented 1 month ago

just a small note, from my experience, claims like On Windows we don't (yet?) ... just delay the time when we have one and create a technical debt.

from end user perspective (and this is a question as it's not clear to me), when building a collector of my own or using binary that was released how do I know there's something wrong before it's misbehaving on test in better case or prod (because i haven't caught any error in test).

mx-psi commented 1 month ago

just a small note, from my experience, claims like On Windows we don't (yet?) ... just delay the time when we have one and create a technical debt.

If you are interested on an upstream Windows MSI package, we have open-telemetry/opentelemetry-collector-releases/issues/157 to track this. I should have linked to this issue explicitly on my message, sorry. Once we come to a conclusion on tzdata, if the outcome is to have tzdata included on the packages, I can add a note to the collector-releases issue so that we ensure to provide tzdata/solve this problem in some other way on Windows. Not sure if this fully addresses your concern, let me know if not.

how do I know there's something wrong before it's misbehaving on test in better case or prod (because i haven't caught any error in test).

Ideally every component:

  1. documents any special requirements like this on their README
  2. does its best to error out (or warn) as quickly as possible if said requirements are not met
  3. does validation that you can run with the validate subcommand

Sometimes it's hard to detect if the requirements are not met (e.g. impossible to determine if it's a temporary or permanent error), and sometimes validation is hard to do without actually running the component, but I think this is a summary of what we have historically strived for.

dyladan commented 1 month ago

just a small note, from my experience, claims like On Windows we don't (yet?) ... just delay the time when we have one and create a technical debt.

It is a simple matter to build it into windows binaries using goreleaser with a flag though without affecting the linux/mac builds.


I'm still of the opinion that it belongs as a host package for all reasons already mentioned. tzdata package is available in alpine registry https://pkgs.alpinelinux.org/package/edge/main/x86_64/tzdata and can easily be copied over in the same way that certs can for docker images:

FROM alpine:3.19 as certs
RUN apk --update add ca-certificates

FROM alpine:3.19 as zones
RUN apk --update add tzdata

FROM scratch

COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=zones /usr/share/zoneinfo /usr/share/zoneinfo
evan-bradley commented 1 month ago

I would be okay with keeping the database inside container images and not baking it into the binary by default, but if we take that route, could we offer tooling to handle this, through the builder or elsewhere?

Our container images should be seen as a recipe for what to include in distributions, and downstream/custom distributions should mirror that, similar to what we do with the CAs used with TLS.

I agree with what @jpkrohling said here, but I think the process of going from an OCB-generated binary to an image isn't straightforward to someone who just wants simple defaults. Our goreleaser config for our Kubernetes distribution, for example, includes a lot of flags, architectures, etc. that may not be useful to a standard user. I would say the same for our Docker image: it's not immediately obvious how much of this is a situation where upstream needs to cover edge cases or whether it's good for me as an end-user to also build into my distro.

Overall, I'd like to make it obvious when to include these files in custom distributions. Could OCB output optional Docker and Goreleaser files? Or do we expect users will want to deeply customize these and documentation on our build configuration files will suffice?

jpkrohling commented 1 month ago

It could output a Dockerfile, but not sure about goreleaser templates. While also possible, I'm not sure we'd get much value from it.

evan-bradley commented 1 month ago

Discussion from the 2024-04-24 SIG meeting: