ribalba closed this 11 months ago
Gave it a read and I really dig the style.
Here are my edits:
docker context use rootless
Now I would also like you to add a small paragraph to the blog article clarifying that we are publishing this mainly because it should be open source and usable by everybody, not something we keep to ourselves as an asset. Another main reason is that we want to make it available, for instance, for use in something like a measurement cluster for the Blue Angel for Software, which aims to be open.
@ribalba Quick update on Ubuntu 22.04 from the install I am doing right now.
This time the timers are actually all there, so 22.10 apparently ships a different set.
I also spotted the problem we had when installing the GMT through this script: the apt remove command fails if the packages do not exist. I opted for dpkg --remove, which ignores packages that are not installed.
Classic problem when duplicating code. The actual code I used last is in the PR https://github.com/green-coding-berlin/green-metrics-tool/pull/210/files
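For illustration, a minimal sketch of that approach in Python, assuming the package list comes from the install script. The function name remove_packages and the injectable run parameter are hypothetical; the behavioural difference it relies on (apt remove aborting when it cannot locate a package, dpkg --remove only warning about packages that are not installed) is the one described in the comment above.

```python
import subprocess


def remove_packages(packages, run=subprocess.run):
    """Remove packages with dpkg, tolerating names that are not installed.

    `apt remove` aborts with an error if it cannot locate one of the named
    packages; `dpkg --remove` warns about packages that are not installed
    and carries on with the others.
    """
    if not packages:
        return 0  # nothing to do
    cmd = ["dpkg", "--remove", *packages]
    # check=False: inspect the return code ourselves instead of raising
    return run(cmd, check=False).returncode
```

Passing a fake run function keeps the sketch testable without touching the system's package database.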
I decided to keep only the blog article as the single point of truth and remove the code from the GitHub repository, as we will not maintain it there and it will change over time anyway. Once we install more machines, and more frequently, we should think about a repo to hold all the scripts.
Can this be closed? Is this referenced from anywhere?
Blog article is here: https://www.green-coding.berlin/blog/nop-linux/
At Green Coding Berlin (GCB), one of our goals is to enable reproducible runs on our cluster. An important step towards accurate measurements was the creation of NOP Linux, our custom Linux distro that disables as many background processes as possible to avoid interruptions during measurements. Another crucial step was ensuring the reliable operation of the PowerSpy2, so that we can measure the machine's entire power consumption.
We wanted to create a cluster that allows users to select the server on which they'd like to run a benchmark. Initially, we aimed for full automation and looked at MAAS, the excellent tool from Canonical. As we use Ubuntu as our reference system, this seemed the logical choice. Although the tool was impressive, it required a daemon running on each machine, which caused multiple interruptions during our measurements.

This led us to reevaluate our tooling, and we decided to try a simpler approach using PXE. While there is a great description [1] and the general flow worked very well, we invested a significant amount of time and effort in configuring the machines correctly. Getting the entire installation flow working with reboots, different configurations such as the PowerSpy, and the multitude of servers we wanted to use created considerable overhead. Additionally, our machines are distributed across various data centers, so we needed a complex networking layer for DHCP discovery to work. While this solution was scalable, it required substantial maintenance. Moreover, our tool develops quite rapidly, so we would have had to keep updating the installation process. As a small company, this was not feasible for us.

Consequently, we decided to sacrifice scalability in favor of simplicity, which meant we could dismantle the complex test setup of various servers we had built in the meantime. The main lesson for the future: start with the simplest solution that solves the problem, and continually reevaluate your assumptions and needs.
We are aware that there are many configuration management systems that don't require an agent running on the configured machine and that would automate some of the tasks we now do manually. But we decided to keep things very simple for now rather than invest more time into another solution.
The system we are using now
As previously mentioned, the current system will not scale to thousands of machines, but it will suffice for a considerable time in our situation. All scripts and source files can be found in the tools/cluster directory of the Green Metrics Tool (GMT). These will be updated as the tool develops. The files shown in this article might already be outdated when you read it, as we will not update the article! :)

We opted for quite a simple solution. You will need a server that exposes the database externally; all results are written to this server. Every client machine then runs a client.py script that periodically queries the server for jobs and, if one is available, executes the measurement undisturbed. After a job is finished, the client performs some cleanup tasks and checks for updates to both the GMT and the operating system. It then tries to get the next job until none are left, at which point the client sleeps for 5 minutes and retries. On every wake-up the client sends a message to the server that it is up and functional, so we can check on the server side that all clients are up and working.

To create your own GCB cluster, you can follow these steps:
1) Install Ubuntu
Preferably 22.04 LTS; newer versions should also work, but older versions are discouraged.
Use this cloud config file to install your client machine:
You can follow the Ubuntu description of how to create a custom ISO, specifically the section "Using another volume to provide the autoinstall config". We then put the ISO on a USB stick and boot the machine by hand. As we have physical access, this is fine for now. In this example the password is ubuntu; obviously, change this in your case. Once the installation has finished, you can pull the USB stick and reboot.
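A minimal sketch, assuming the separate-volume approach from that Ubuntu guide: it writes a user-data file with an autoinstall section plus the empty meta-data file the NoCloud seed expects. The hostname gcb-client-1 and the function names are hypothetical; the gc username matches the user used later in this article. The password must be passed as a crypt-style hash (for example from openssl passwd -6), never in plain text.

```python
from pathlib import Path


def render_user_data(hostname: str, username: str, password_hash: str) -> str:
    """Return a minimal autoinstall user-data document as YAML text.

    password_hash must already be a crypt(3)-style hash, e.g. the output
    of `openssl passwd -6`, not the plain-text password.
    """
    return (
        "#cloud-config\n"
        "autoinstall:\n"
        "  version: 1\n"
        "  identity:\n"
        f"    hostname: {hostname}\n"
        f"    username: {username}\n"
        f'    password: "{password_hash}"\n'
        "  ssh:\n"
        "    install-server: true\n"  # so we can ssh in after the install
        "    allow-pw: true\n"
    )


def write_seed_dir(directory: str, hostname: str, username: str,
                   password_hash: str) -> Path:
    """Write user-data and an empty meta-data file into `directory`."""
    seed = Path(directory)
    seed.mkdir(parents=True, exist_ok=True)
    (seed / "user-data").write_text(
        render_user_data(hostname, username, password_hash))
    (seed / "meta-data").write_text("")  # may be empty, but must exist
    return seed
```

After write_seed_dir("seed", "gcb-client-1", "gc", hashed_password), a tool such as cloud-localds seed.iso seed/user-data seed/meta-data can turn the two files into the extra volume.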
2) Install NOP Linux
You can now ssh into the machine and start configuring it. This is mainly done by copy-pasting scripts manually; as we don't install machines that often, this is totally fine for now. Please note that all these commands need to be run as root, so ssh into the machine as the gc user and then become root using sudo su.
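The general pattern behind this step, sketched in Python under clear assumptions: the unit names below are common Ubuntu background timers chosen purely as examples, not the NOP Linux list, and the helper names are hypothetical. Because the set of timers differs between releases (22.04 vs. 22.10), the sketch records units that systemctl cannot disable instead of aborting.

```python
import os
import subprocess

# Illustrative examples of common Ubuntu background timers. This is NOT
# the NOP Linux list; the real set differs and varies between releases.
EXAMPLE_UNITS = [
    "apt-daily.timer",
    "apt-daily-upgrade.timer",
    "motd-news.timer",
    "man-db.timer",
]


def require_root() -> None:
    """Abort unless running as root (e.g. after `sudo su`)."""
    if os.geteuid() != 0:
        raise SystemExit("please run this as root")


def disable_units(units, run=subprocess.run):
    """Stop and disable each unit; map unit name -> True on success.

    A unit that does not exist on this release makes systemctl exit
    non-zero; we record that and move on rather than failing.
    """
    results = {}
    for unit in units:
        proc = run(["systemctl", "disable", "--now", unit],
                   capture_output=True, text=True)
        results[unit] = proc.returncode == 0
    return results
```

As root, require_root() followed by print(disable_units(EXAMPLE_UNITS)) shows which units were disabled and which were not present.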
Now you should have a machine that runs only a minimal number of services and hence should not generate a significant number of interrupts that disturb measurements.
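One rough way to check this claim on a given machine, offered as a sketch rather than part of the described setup: sum the per-CPU counters in /proc/interrupts twice and compute the rate. The function names are hypothetical, and absolute numbers depend heavily on the hardware, so comparing a machine before and after this step is more telling than any single value.

```python
import time


def total_interrupts(text: str) -> int:
    """Sum every per-CPU counter in a /proc/interrupts snapshot."""
    lines = text.strip().splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        # Counters come first; chip names and descriptions follow them.
        for token in rest.split()[:n_cpus]:
            if not token.isdigit():
                break
            total += int(token)
    return total


def interrupts_per_second(interval: float = 10.0) -> float:
    """Sample /proc/interrupts twice and return the average rate."""
    with open("/proc/interrupts") as f:
        before = total_interrupts(f.read())
    time.sleep(interval)
    with open("/proc/interrupts") as f:
        after = total_interrupts(f.read())
    return (after - before) / interval
```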
3) Install the Green Metrics Tool
Now we need the tooling installed on the client to start the measurements.
This might also change, so please refer to the GMT documentation.
If you want to use the PowerSpy2 device, please follow the installation instructions at https://docs.green-coding.berlin/docs/measuring/metric-providers/psu-energy-ac-powerspy2/
4) Configure the GMT
Now that you have installed the GMT, you need to configure it to run in client mode. You can run the install script with the following parameters to give you the first version of the config file. You will need to change the API and metrics endpoints to the URL of your server.
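A small sanity check can catch config.yml values left over from the default install once the edits in this step are done. This is a hypothetical helper, not part of the GMT: it assumes the config is already loaded into a dict and uses the key names given in this article (postgresql.host and .password, machine_id, smtp, and admin with email and no_emails); adjust the paths if your GMT version nests them differently.

```python
def check_client_config(cfg: dict, server_host: str, machine_id: int,
                        sends_email: bool = False) -> list:
    """Return human-readable problems found in a loaded config.yml dict."""
    problems = []
    pg = cfg.get("postgresql") or {}
    host = pg.get("host")
    if host == "green-coding-postgres-container":
        problems.append("postgresql.host still points at the local container")
    elif host != server_host:
        problems.append(f"postgresql.host is {host!r}, expected {server_host!r}")
    if not pg.get("password"):
        problems.append("postgresql.password is empty")
    # Key name as given in the article; adjust if your config nests it.
    if cfg.get("machine_id") != machine_id:
        problems.append(
            f"machine_id is {cfg.get('machine_id')!r}, expected {machine_id}")
    if sends_email:
        admin = cfg.get("admin") or {}
        if not cfg.get("smtp"):
            problems.append("smtp section is missing")
        if not admin.get("email") or admin.get("no_emails") is not False:
            problems.append("admin.email must be set and admin.no_emails must be False")
    return problems
```

With PyYAML installed, cfg = yaml.safe_load(open("config.yml")) produces the dict; an empty result list means none of these checks fired.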
Now please also edit the following points in the config.yml:
- the postgresql section, so that the host points to your server. You will need to replace the green-coding-postgres-container value; this should be the same URL you specified when running the install script. Also check that the password is correct.
- machine_id: set it to the number you gave the client when adding it to the machines table on the server.

In this setup only one machine, the server, is configured to send emails. You can add email-sending capabilities to any client by adding the smtp data to the configuration. Don't forget to also set the admin values (email and no_emails=False) at the end of the file. You will also need to set up a cron job for this; please see the GMT documentation for details.

5) Add the cleanup script to the sudoers file
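A minimal sketch of what such a sudoers entry could look like, assuming the client runs as the gc user and needs to run the cleanup script as root without a password prompt. The script path and the drop-in file name /etc/sudoers.d/green-coding-cleanup are hypothetical; the helper validates the file with visudo before moving it into place, so a typo cannot lock you out of sudo.

```python
import os
import subprocess
from pathlib import Path


def render_sudoers_rule(user: str, script: str) -> str:
    """One sudoers line letting `user` run `script` as root without a password."""
    if not os.path.isabs(script):
        raise ValueError("sudoers needs an absolute path to the command")
    return f"{user} ALL=(ALL) NOPASSWD: {script}\n"


def install_rule(rule: str,
                 target: str = "/etc/sudoers.d/green-coding-cleanup") -> None:
    """Validate the rule with visudo, then put it in place (run as root)."""
    # sudo skips files in sudoers.d whose names contain '.', so the
    # temporary file is ignored until it is renamed.
    tmp = Path(target + ".tmp")
    tmp.write_text(rule)
    os.chmod(tmp, 0o440)
    try:
        subprocess.run(["visudo", "-c", "-f", str(tmp)], check=True)
    except subprocess.CalledProcessError:
        tmp.unlink()  # don't leave a broken file behind
        raise
    os.replace(tmp, target)
```

For example, install_rule(render_sudoers_rule("gc", "/home/gc/green-metrics-tool/tools/cluster/cleanup.sh")), where the script path is a placeholder for wherever your cleanup script lives.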
6) Start the client service
To make sure that the client is always running you can create a service that will start at boot and keep running.
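A sketch of what such a unit could contain, generated from Python so the paths stay in one place. The working directory and the ExecStart command (including the location of client.py and the Python interpreter) are hypothetical placeholders; if your GMT install uses a virtual environment, point ExecStart at that interpreter instead. Restart=always keeps the client running, and WantedBy=multi-user.target makes it start at boot once enabled.

```python
def render_unit(user: str, workdir: str, exec_start: str) -> str:
    """Return a systemd unit that starts the client at boot and restarts it."""
    return "\n".join([
        "[Unit]",
        "Description=Green Metrics Tool cluster client",
        "Wants=network-online.target",
        "After=network-online.target",  # the client needs to reach the server
        "",
        "[Service]",
        "Type=simple",
        f"User={user}",
        f"WorkingDirectory={workdir}",
        f"ExecStart={exec_start}",
        "Restart=always",  # keep running, even after crashes
        "RestartSec=30",
        "",
        "[Install]",
        "WantedBy=multi-user.target",  # start at boot
        "",
    ])
```

Once the output is written to /etc/systemd/system/green-coding-client-service.service, systemctl daemon-reload followed by systemctl enable --now green-coding-client-service loads, enables and starts it.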
Create a file at /etc/systemd/system/green-coding-client-service.service containing the service definition.
You should now see the client reporting its status on the server.
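The status you see on the server comes from the heartbeat described in "The system we are using now": the client reports on every wake-up, works through all queued jobs, then sleeps for 5 minutes. Below is a minimal sketch of that loop plus a server-side check for clients that have gone quiet. All callables and function names are hypothetical stand-ins, not the actual client.py API, and the 3× tolerance is an arbitrary choice.

```python
import time
from datetime import datetime, timedelta

SLEEP_SECONDS = 5 * 60  # back-off when no jobs are queued


def client_loop(get_job, run_job, cleanup, check_updates, send_heartbeat,
                sleep=time.sleep, max_wakeups=None):
    """Poll for jobs until the queue is empty, then sleep and retry.

    max_wakeups exists only so the loop can be tested; None means forever.
    """
    wakeups = 0
    while max_wakeups is None or wakeups < max_wakeups:
        wakeups += 1
        send_heartbeat()  # "I'm alive" message on every wake-up
        job = get_job()
        while job is not None:
            run_job(job)      # measurement runs undisturbed
            cleanup()
            check_updates()   # GMT and OS updates between jobs
            job = get_job()
        sleep(SLEEP_SECONDS)
    return wakeups


def stale_clients(last_seen: dict[str, datetime], now: datetime,
                  tolerance: timedelta = timedelta(seconds=3 * SLEEP_SECONDS)):
    """Server side: names of clients not heard from within `tolerance`.

    Heartbeats are only sent on wake-up, not during a job, so the tolerance
    must exceed the longest expected measurement or busy clients get flagged.
    """
    return sorted(name for name, seen in last_seen.items()
                  if now - seen > tolerance)
```

Injecting sleep and the job functions keeps the loop testable; in production they would wrap the HTTP calls and measurement runs.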
[1] Setup PXE Boot Server using cloud-init for Ubuntu 20.04