simonarys opened this issue 11 months ago (status: open).
We hit a similar issue on a new Intel platform; after we switched to the libbpf-based Kepler image, the issue went away.
Please give it a try. Note that the latest Kepler image is not yet built with libbpf by default.
@simonarys please check whether the libbpf image fixes this issue. For DRAM power, the hwmon driver Kepler currently uses does not support DRAM power reporting (https://docs.kernel.org/hwmon/xgene-hwmon.html). We would need to support a much newer hwmon driver (https://docs.kernel.org/hwmon/smpro-hwmon.html) to get DRAM power, but I don't have an Ampere setup right now.
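You can check which power channels a given hwmon driver actually exposes on the node, for example whether any DRAM-related channel appears at all. The sketch below lists the standard hwmon sysfs attributes. It assumes only the generic hwmon sysfs ABI: /sys/class/hwmon/hwmon*/name, powerN_input in microwatts, and an optional powerN_label. It is not Kepler code. The label strings each driver reports, and whether smpro-hwmon labels a DRAM channel recognizably, should be checked against the driver documentation linked above.

```python
from pathlib import Path


def read_hwmon_power(root="/sys/class/hwmon"):
    """Return {(chip_name, sensor_label): watts} for every hwmon powerN_input file.

    Assumes the generic hwmon sysfs ABI: powerN_input is in microwatts and
    powerN_label (if present) is a human-readable channel name.
    """
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.is_file() else chip.name
        for input_file in sorted(chip.glob("power*_input")):
            sensor = input_file.name[: -len("_input")]  # e.g. "power1"
            label_file = chip / f"{sensor}_label"
            label = label_file.read_text().strip() if label_file.is_file() else sensor
            try:
                microwatts = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # skip channels the driver cannot read right now
            readings[(chip_name, label)] = microwatts / 1_000_000
    return readings


if __name__ == "__main__":
    # Print one line per power channel found on this node.
    for (chip, label), watts in sorted(read_hwmon_power().items()):
        print(f"{chip:>16} {label:<20} {watts:10.3f} W")
```

Running this on the Ampere node with each driver loaded shows directly which channels Kepler could read.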
@simonarys BTW, if you build the libbpf image for arm64: the latest Kepler build and base images from @vimalk78 are based on UBI 9 and support multiple architectures, which should make building the ARM image much easier.
@rootfs Thank you for your response. Unfortunately, we weren’t able to build Kepler using the base image from @vimalk78 on either x86 or ARM.
We built Dockerfile.base successfully on x86; for ARM, we only had to replace the line:
RUN yum install -y cpuid
with this line from your Dockerfile.bcc.base.arm64:
RUN yum install -y python3 python3-pip && yum clean all -y && \
pip3 install --no-cache-dir archspec
This was necessary because cpuid is not available on ARM.
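The two base images described above differ only in this package step. One way to keep a single build script is therefore to choose the step from the machine architecture. The sketch below is a hypothetical helper, not part of the Kepler repository. The two command strings are copied from the lines quoted above. The architecture names are those Python's platform.machine() reports on x86 and ARM Linux (x86_64, aarch64), plus the Docker-style aliases amd64 and arm64.

```python
import platform

# Package steps copied from the base Dockerfiles quoted in the thread.
X86_STEP = "yum install -y cpuid"
ARM_STEP = (
    "yum install -y python3 python3-pip && yum clean all -y && "
    "pip3 install --no-cache-dir archspec"
)

# Hypothetical mapping from Linux machine names / Docker arch names to the step.
STEPS = {
    "x86_64": X86_STEP,
    "amd64": X86_STEP,
    "aarch64": ARM_STEP,
    "arm64": ARM_STEP,
}


def base_install_step(machine=None):
    """Return the package-install command for the given (or current) architecture."""
    arch = (machine or platform.machine()).lower()
    try:
        return STEPS[arch]
    except KeyError:
        raise ValueError(f"no base-image package step defined for {arch!r}") from None


if __name__ == "__main__":
    print("RUN " + base_install_step())
```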
Next, we built Dockerfile.libbpf.builder, which installs make, git, gcc, rpm-build, systemd and Go.
Finally, we tried to build the Dockerfile in the build/ folder. However, the build fails on this command:
RUN make build SOURCE_GIT_TAG=$SOURCE_GIT_TAG BIN_TIMESTAMP=$BIN_TIMESTAMP
With the following error message:
[Makefile:191: _build_local] Error 2
We also tried building it from your image quay.io/sustainable_computing_io/kepler_builder:ubi-9-libbpf-1.2.0, but we got exactly the same error. Note that Go wasn’t installed in this image, so we had to install it.
We also found that the build succeeds with another of your images, quay.io/sustainable_computing_io/kepler_builder:ubi-9-libbpf-1.2.0-go1.18. Do you know which steps we should take to go from the base image to this builder image, so that we can build Kepler locally from scratch?
$ podman run -it --rm quay.io/sustainable_computing_io/kepler_builder:ubi-9-libbpf-1.2.0 sh
sh-5.1# go version
go version go1.20.10 linux/amd64
I can see Go in the builder image.
I have been able to build an aarch64 image for Kepler, but without CPUID, and I have not tested it.
Indeed, you're right. Go is installed and the error is the following:
go: cannot find GOROOT directory: /usr/local/go
Re-installing Go into /usr/local/go therefore fixed the error for us; sorry for the confusion.
Since Go is already installed, we instead changed the GOROOT path from /usr/local/go to /lib/golang on line 11:
ENV GOPATH=/opt/app-root GO111MODULE=off GOROOT=/lib/golang
The path is now found when building the Dockerfile. However, we are now facing a new issue:
41.67 github.com/sustainable-computing-io/kepler/pkg/manager
42.22 command-line-arguments
44.56 # command-line-arguments
44.56 /lib/golang/pkg/tool/linux_amd64/link: running clang failed: exit status 1
44.56 clang-16: error: no such file or directory: '/usr/lib/x86_64-linux-gnu/libbpf.a'
44.56 clang-16: error: no such file or directory: '/usr/lib/x86_64-linux-gnu/libbpf.a'
44.56 clang-16: error: no such file or directory: '/usr/lib/x86_64-linux-gnu/libbpf.a'
44.56 clang-16: error: no such file or directory: '/usr/lib/x86_64-linux-gnu/libbpf.a'
44.56 clang-16: error: no such file or directory: '/usr/lib/x86_64-linux-gnu/libbpf.a'
44.56 clang-16: error: no such file or directory: '/usr/lib/x86_64-linux-gnu/libbpf.a'
44.56
44.77 make: *** [Makefile:191: _build_local] Error 1
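The linker looks for libbpf.a under /usr/lib/x86_64-linux-gnu, the Debian/Ubuntu multiarch library directory. RHEL-family images such as UBI 9 conventionally place 64-bit libraries under /usr/lib64 instead. A hard-coded x86_64 triplet would also not match an arm64 build. Probing the candidate directories is a quick way to see where, or whether, the static library exists in the build image. The sketch below is a hypothetical diagnostic, and its candidate list is an assumption. It does not know which Makefile variable, if any, controls the path Kepler links against.

```python
import platform
from pathlib import Path

# Debian/Ubuntu multiarch triplets for the two architectures in this thread.
TRIPLETS = {"x86_64": "x86_64-linux-gnu", "aarch64": "aarch64-linux-gnu"}


def libbpf_candidates(machine):
    """Candidate absolute paths for libbpf.a: Debian layout first, then RHEL/UBI."""
    dirs = []
    triplet = TRIPLETS.get(machine)
    if triplet:
        dirs.append(f"/usr/lib/{triplet}")  # Debian/Ubuntu multiarch layout
    dirs += ["/usr/lib64", "/usr/lib"]      # RHEL-family layout, then generic
    return [f"{d}/libbpf.a" for d in dirs]


def find_libbpf(machine=None, sysroot="/"):
    """Return the first candidate that exists under sysroot, or None."""
    machine = machine or platform.machine()
    for path in libbpf_candidates(machine):
        if (Path(sysroot) / path.lstrip("/")).is_file():
            return path
    return None


if __name__ == "__main__":
    print(find_libbpf() or "libbpf.a not found in any candidate directory")
```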
GOROOT is already defined in the image:
sh-5.1# go env | grep ROOT
GOROOT="/usr/lib/golang"
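The two paths in this exchange are consistent. On RHEL-family systems, including UBI 9, /lib is a symlink to /usr/lib, so GOROOT=/lib/golang and the /usr/lib/golang reported by go env refer to the same directory. By contrast, /usr/local/go does not exist in this image, which is what the "cannot find GOROOT directory" error reported. The minimal sketch below picks the first usable location instead of hard-coding one. It assumes a GOROOT is valid when it contains bin/go, and both the helper and its candidate list are hypothetical.

```python
import os
from pathlib import Path

# Locations mentioned in the thread; the search order is an assumption.
CANDIDATES = ("/usr/local/go", "/usr/lib/golang", "/lib/golang")


def is_goroot(path):
    """Treat a directory as a usable GOROOT if it contains bin/go."""
    return (Path(path) / "bin" / "go").is_file()


def pick_goroot(env=None, candidates=CANDIDATES):
    """Prefer an explicitly set, valid GOROOT; otherwise use the first valid candidate."""
    env = os.environ if env is None else env
    declared = env.get("GOROOT")
    if declared and is_goroot(declared):
        return declared
    return next((c for c in candidates if is_goroot(c)), None)


if __name__ == "__main__":
    print(pick_goroot() or "no usable GOROOT found")
```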
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@vimalk78, has this issue been fixed?
What happened?
We are interested in running Kepler on an ARM Ampere Altra Max bare-metal machine. We managed to build the Kepler image from the Dockerfiles in the build/ folder on the main GitHub branch (hash: 88c82f384f10ba4deb39675b2c88450bc28ee7b8). We then ran the image on a Kubernetes cluster, on both an x86 machine and the ARM machine. However, on the ARM machine the Grafana dashboard shows anomalies: energy consumption metrics are unexpectedly low, while the "system" namespace shows unrealistically high power consumption (more than 1 million W). Moreover, the DRAM energy measurements are always 0. See the pictures below.
We would appreciate any insights or guidance on potential ARM-specific optimizations or configurations that might be necessary to ensure accurate energy consumption measurements.
To aid in troubleshooting, we attached logs and configuration details. Please let us know if further information is needed.
What did you expect to happen?
We expected results similar to those obtained when running Kepler on an x86 Intel machine, since we followed the same steps on both architectures to build and deploy Kepler. On the x86 Intel machine, we obtained plausible results, reasonably close to the power readings from our PDU outlet.
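The x86 results were judged against the PDU outlet readings, and the same comparison can be applied to the ARM node. It would flag readings like the more than 1 million W seen for the "system" namespace. The sketch below is a hypothetical sanity check, not a Kepler feature. It compares a node-level total from Kepler with a PDU reading taken over the same interval. It assumes the total is the sum of the per-namespace values shown in the dashboard, and its 50% tolerance is an arbitrary assumption.

```python
def relative_error(kepler_watts, pdu_watts):
    """Relative deviation of Kepler's node estimate from the PDU reading."""
    if pdu_watts <= 0:
        raise ValueError("PDU reading must be positive")
    return abs(kepler_watts - pdu_watts) / pdu_watts


def check_node(per_namespace_watts, pdu_watts, tolerance=0.5):
    """Sum per-namespace estimates and report whether the total is plausible.

    tolerance is an arbitrary assumption: the PDU measures the whole machine
    at the wall, so some gap to Kepler's estimate is expected.
    """
    total = sum(per_namespace_watts.values())
    err = relative_error(total, pdu_watts)
    return total, err, err <= tolerance


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(check_node({"system": 120.0, "default": 30.0}, pdu_watts=200.0))
```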
How can we reproduce it (as minimally and precisely as possible)?
We had to change a few lines in the Dockerfiles to target ARM instead of x86, because only Dockerfile.bcc.base has an ARM version available in the GitHub repository.
We built the following images, in this order, using the Dockerfiles from the /build folder: 1) bcc.base, 2) bcc.builder, 3) kernel-source-images, 4) bcc.kepler, 5) manifest.
For bcc.base, we built the arm64 variant of the Dockerfile (Dockerfile.bcc.base.arm64) that already exists in the GitHub repository.
For bcc.builder, we changed the FROM line to use the bcc.base image we had just built, and replaced amd64 with arm64 on line 10:
For kernel-source-image, we replaced the whole file with the following and did not use the build-kernel-source-images.sh script:
For bcc.kepler, we changed the FROM instructions on lines 1 and 25 to use our previously built images (builder, then base), and moved the file to the root of the repository before building it with Docker.
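The FROM edits described above point each Dockerfile at the locally built images instead of the published ones. Scripting them makes them repeatable across rebuilds. The sketch below is a hypothetical helper that rewrites the image of the FROM instruction at a given 1-based line number. It keeps any --platform flag and AS alias. The image names in the test are placeholders.

```python
import re

# FROM [--platform=...] <image> [AS <name>]
FROM_RE = re.compile(
    r"^(?P<prefix>\s*FROM\s+(?:--platform=\S+\s+)?)(?P<image>\S+)(?P<suffix>.*)$",
    re.IGNORECASE,
)


def replace_from(dockerfile_text, line_no, new_image):
    """Replace the image on the FROM instruction at 1-based line_no."""
    lines = dockerfile_text.splitlines(keepends=True)
    idx = line_no - 1
    if not 0 <= idx < len(lines):
        raise IndexError(f"line {line_no} out of range")
    body = lines[idx].rstrip("\n")
    newline = lines[idx][len(body):]  # preserve the original line ending
    match = FROM_RE.match(body)
    if not match:
        raise ValueError(f"line {line_no} is not a FROM instruction: {body!r}")
    lines[idx] = match["prefix"] + new_image + match["suffix"] + newline
    return "".join(lines)
```

Failing loudly on a non-FROM line guards against silently editing the wrong line if the upstream Dockerfile changes.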
For the manifest, we first generated it using:
Then, we replaced the image reference on line 152 of _output/generated_manifest/deployment.yaml with the Kepler image we had built in the previous step and pushed to Docker Hub. Lastly, we deployed the manifest to our empty Kubernetes (kind) cluster.
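Editing line 152 by hand breaks if the generated manifest changes, so the image reference can instead be replaced by pattern. The sketch below is a hypothetical helper that uses only the standard library. It rewrites every image: value whose repository path ends in /kepler, with an optional tag, and keeps indentation. The match rule and the replacement image name are assumptions to adapt to the actual deployment.yaml.

```python
import re
import sys
from pathlib import Path

# Matches e.g. "    - image: quay.io/sustainable_computing_io/kepler:latest"
IMAGE_RE = re.compile(
    r"^(?P<key>[ \t]*(?:-[ \t]+)?image:[ \t]*)"
    r"(?P<quote>['\"]?)(?P<image>\S*?/kepler(?::[\w.-]+)?)(?P=quote)[ \t]*$",
    re.MULTILINE,
)


def set_kepler_image(manifest_text, new_image):
    """Replace every Kepler image reference; return (new_text, replacements)."""
    return IMAGE_RE.subn(
        lambda m: f"{m['key']}{m['quote']}{new_image}{m['quote']}", manifest_text
    )


if __name__ == "__main__":
    # Usage: python set_image.py <deployment.yaml> <registry/user/kepler:tag>
    manifest, image = Path(sys.argv[1]), sys.argv[2]
    text, count = set_kepler_image(manifest.read_text(), image)
    manifest.write_text(text)
    print(f"replaced {count} image reference(s) in {manifest}")
```

Printing the replacement count makes it easy to notice when the pattern matched nothing, or matched more than the one expected container.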
Anything else we need to know?
We are using a kind cluster.
Kepler pod logs:
Kepler image tag
Kubernetes version
Cloud provider or bare metal
OS version
Install tools
Kepler deployment config
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)