open-telemetry / opentelemetry-network

eBPF Collector
https://opentelemetry.io
Apache License 2.0
296 stars 46 forks source link

Enable domain names over 80 characters long #256

Open kubakukis14 opened 8 months ago

kubakukis14 commented 8 months ago

Is your feature request related to a problem? Please describe.

Hello there! Is there any reason that host workload names seem to be truncated to 80 characters? I've been getting truncated host names on some metrics (from the left), which is pretty frustrating.

example: long-example-name-k8s-collector-network-k8s-reducer.example-na-01.svc.cluster.local becomes: g-example-name-k8s-collector-network-k8s-reducer.example-na-01.svc.cluster.local

Describe the solution you'd like

Getting full length host names on metrics if possible.

Describe alternatives you've considered

No response

Additional context

No response

yonch commented 8 months ago

Thank you for reporting this!

To avoid the overhead of new/free, the collectors and reducer pre-allocate all memory. Therefore we had to choose the length of allocated strings. The domain name selection seems to have been too low. Do you know what length you'd need?

Enlarging URLs might require changing lengths in the collector and reducer, perhaps in multiple locations (ingress and aggregation cores), so relatively easy but not trivial.

Happy to review a PR if you have bandwidth to change this. Otherwise this can be a first PR.

kubakukis14 commented 7 months ago

Thanks for responding and explaining the problem! I'd like to help, so you can expect a PR from me in the near future if possible. I've got a few questions.

Is there a way to get the build running on an arm64 machine? I've run into some issues when trying to build the build environment container, and the image I pulled from quay.io/splunko11ytest/network-explorer-debug/build-env had some issue when running make commands.

As for the length, I haven't personally ran into domain names over 100 characters long, though wouldn't it be for the best to stick with the standard 256 bytes?

yonch commented 7 months ago

Would love to review a patch to fix this for you (and everyone else encountering this).

I haven't built the build-env on arm64. It would be great to have it supported. I think we might want to track it as a separate Issue, some folks wanted to pick up a helpful non deep-dive issue. Also I'm happy to brainstorm if you encounter specific issue to work through them.

The "role" dimensions (Domains / Deployment / CronJob names etc.) are statically allocated to avoid allocation/deallocation overhead (also to alleviate thread ownership questions on copies) This has the disadvantage of occupying the maximum length even when not used. This motivated us to "skimp" on length to reduce wasted memory. They should definitely be more than 80. We could go with 128 or 256 maybe (I have no strong opinion).

kubakukis14 commented 7 months ago

Alright, a pull request should be up in a few minutes, we can move the conversation there.

Here are some of my notes about setting up the build env though: About building the build-env on arm, I've encountered issues with the -m64 gcc flag, which the C compiler on my mac doesn't support. There may be some workaround for this, but I thankfully had a Fedora system on hand and built it there. Inside the env I had a problem building stuff with ./build.sh --clean. It turns out some task failed (:io.opentelemetry.render:generateXtext) because some files (org/eclipse/core/runtime/OperationCanceledException) were built on a higher version of Java (17) than the version inside the container (12) so it wouldn't work together (maybe because the built-env itself was built on java 17?). Updating Java then broke gradle (that comes in a zip file inside the container). Downloading a newer Gradle version (7.3.3.), along with replacing deprecated keywords in gradle.build files ('compile' -> 'implementation') resolved the problem for me.

yonch commented 7 months ago

Strange the build env is not working correctly, that's the reason we used containers for the build environment. Is this something that could be fixed with a change in opentelemetry-network-build-tools?

kubakukis14 commented 7 months ago

Strange the build env is not working correctly, that's the reason we used containers for the build environment. Is this something that could be fixed with a change in opentelemetry-network-build-tools?

I think a change to the build tool repo could definitely help. Actually, in the build pipeline you launched on the branch for the PR I made (thank you for that btw 😃) you can see the exact error I got when building the reducer. It is related to 'Task :io.opentelemetry.render:generateXtextLanguage', I assume some external java library that the reducer needs was updated and newly built on Java 17, while the Java in the container is Java 11.

Bumping up the Java version (and by extension Gradle) helped in my case, but I'm not sure if won't break something else.

kubakukis14 commented 6 months ago

Could I get an update on this? https://github.com/open-telemetry/opentelemetry-network/pull/260#issuecomment-2079633656

yonch commented 6 months ago

I think a change to the build tool repo could definitely help. Actually, in the build pipeline you launched on the branch for the PR I made (thank you for that btw 😃) you can see the exact error I got when building the reducer. It is related to 'Task :io.opentelemetry.render:generateXtextLanguage', I assume some external java library that the reducer needs was updated and newly built on Java 17, while the Java in the container is Java 11.

Bumping up the Java version (and by extension Gradle) helped in my case, but I'm not sure if won't break something else.

@kubakukis14 @jakub-racek-swi if you have cycles, would you be open to opening an issue or PR on the build-tools repo?

btw I think this slack message is referring to the same issue?

yonch commented 6 months ago

btw I think this slack message is referring to the same issue?

for reference, also this thread on #262

jakub-racek-swi commented 4 months ago

@kubakukis14 @jakub-racek-swi if you have cycles, would you be open to opening an issue or PR on the build-tools repo?

@yonch Sorry about the delay, is this issue still persisting? I could make a PR with the changes that helped me resolve it, if it is still relevant.

And yes, the issue mentioned in slack is the same one.

ganeshardlkar commented 4 months ago

Hey @jakub-racek-swi I just checked your forked repo. The changes are not reflected for the problem above.

yonch commented 2 months ago

@jakub-racek-swi can you make a PR on the build-tools repo? Now that we merged the DNS patch, we're blocked on the build repo to release. I'm asking since you probably have a fixed build repo somewhere hopefully..