Open kubakukis14 opened 8 months ago
Thank you for reporting this!
To avoid the overhead of new/free, the collectors and reducer pre-allocate all memory. Therefore we had to choose the length of allocated strings. The domain name selection seems to have been too low. Do you know what length you'd need?
Enlarging URLs might require changing lengths in the collector and reducer, perhaps in multiple locations (ingress and aggregation cores), so relatively easy but not trivial.
Happy to review a PR if you have bandwidth to change this. Otherwise this can be a first PR.
Thanks for responding and explaining the problem! I'd like to help, so you can expect a PR from me in the near future if possible. I've got a few questions.
Is there a way to get the build running on an arm64 machine? I've run into some issues when trying to build the build environment container, and the image I pulled from quay.io/splunko11ytest/network-explorer-debug/build-env
had some issue when running make
commands.
As for the length, I haven't personally ran into domain names over 100 characters long, though wouldn't it be for the best to stick with the standard 256 bytes?
Would love to review a patch to fix this for you (and everyone else encountering this).
I haven't built the build-env on arm64. It would be great to have it supported. I think we might want to track it as a separate Issue, some folks wanted to pick up a helpful non deep-dive issue. Also I'm happy to brainstorm if you encounter specific issue to work through them.
The "role" dimensions (Domains / Deployment / CronJob names etc.) are statically allocated to avoid allocation/deallocation overhead (also to alleviate thread ownership questions on copies) This has the disadvantage of occupying the maximum length even when not used. This motivated us to "skimp" on length to reduce wasted memory. They should definitely be more than 80. We could go with 128 or 256 maybe (I have no strong opinion).
Alright, a pull request should be up in a few minutes, we can move the conversation there.
Here are some of my notes about setting up the build env though:
About building the build-env on arm, I've encountered issues with the -m64
gcc flag, which the C compiler on my mac doesn't support. There may be some workaround for this, but I thankfully had a Fedora system on hand and built it there. Inside the env I had a problem building stuff with ./build.sh --clean
. It turns out some task failed (:io.opentelemetry.render:generateXtext) because some files (org/eclipse/core/runtime/OperationCanceledException) were built on a higher version of Java (17) than the version inside the container (12) so it wouldn't work together (maybe because the built-env itself was built on java 17?). Updating Java then broke gradle (that comes in a zip file inside the container). Downloading a newer Gradle version (7.3.3.), along with replacing deprecated keywords in gradle.build
files ('compile' -> 'implementation') resolved the problem for me.
Strange the build env is not working correctly, that's the reason we used containers for the build environment. Is this something that could be fixed with a change in opentelemetry-network-build-tools?
Strange the build env is not working correctly, that's the reason we used containers for the build environment. Is this something that could be fixed with a change in opentelemetry-network-build-tools?
I think a change to the build tool repo could definitely help. Actually, in the build pipeline you launched on the branch for the PR I made (thank you for that btw 😃) you can see the exact error I got when building the reducer. It is related to 'Task :io.opentelemetry.render:generateXtextLanguage', I assume some external java library that the reducer needs was updated and newly built on Java 17, while the Java in the container is Java 11.
Bumping up the Java version (and by extension Gradle) helped in my case, but I'm not sure if won't break something else.
Could I get an update on this? https://github.com/open-telemetry/opentelemetry-network/pull/260#issuecomment-2079633656
I think a change to the build tool repo could definitely help. Actually, in the build pipeline you launched on the branch for the PR I made (thank you for that btw 😃) you can see the exact error I got when building the reducer. It is related to 'Task :io.opentelemetry.render:generateXtextLanguage', I assume some external java library that the reducer needs was updated and newly built on Java 17, while the Java in the container is Java 11.
Bumping up the Java version (and by extension Gradle) helped in my case, but I'm not sure if won't break something else.
@kubakukis14 @jakub-racek-swi if you have cycles, would you be open to opening an issue or PR on the build-tools repo?
btw I think this slack message is referring to the same issue?
btw I think this slack message is referring to the same issue?
for reference, also this thread on #262
@kubakukis14 @jakub-racek-swi if you have cycles, would you be open to opening an issue or PR on the build-tools repo?
@yonch Sorry about the delay, is this issue still persisting? I could make a PR with the changes that helped me resolve it, if it is still relevant.
And yes, the issue mentioned in slack is the same one.
Hey @jakub-racek-swi I just checked your forked repo. The changes are not reflected for the problem above.
@jakub-racek-swi can you make a PR on the build-tools repo? Now that we merged the DNS patch, we're blocked on the build repo to release. I'm asking since you probably have a fixed build repo somewhere hopefully..
Is your feature request related to a problem? Please describe.
Hello there! Is there any reason that host workload names seem to be truncated to 80 characters? I've been getting truncated host names on some metrics (from the left), which is pretty frustrating.
example:
long-example-name-k8s-collector-network-k8s-reducer.example-na-01.svc.cluster.local
becomes:g-example-name-k8s-collector-network-k8s-reducer.example-na-01.svc.cluster.local
Describe the solution you'd like
Getting full length host names on metrics if possible.
Describe alternatives you've considered
No response
Additional context
No response