green-coding-solutions / green-metrics-tool

Measure energy and carbon consumption of software
https://metrics.green-coding.berlin
GNU Affero General Public License v3.0

Power per container #795

Closed ArneTR closed 1 month ago

ArneTR commented 1 month ago

This PR brings a power estimation feature per container to the GMT.

The approach is similar in idea to how Scaphandre does it:

@mrchrisadams - Is that what you were looking for?

@davidkopp - Also would love your feedback on this as we have talked about this before.

A bit more context: Initially, the philosophy of the GMT was not to include these values, as they provide limited insights. We have written a more detailed piece on this here:

As some of you know, we have been funded by Mozilla to build a kernel Energy Plugin - https://github.com/orgs/green-kernel/discussions/1 - Say hi in the thread if you like!

Since the plugin is coming soon, we wanted to have some comparable functionality in GMT already to see how heavily the values will deviate. The given caveats still apply. We will also add some documentation on this soon.

Love your feedback on this!

Demo measurement

mrchrisadams commented 1 month ago

Wow @ArneTR that was fast - we only discussed it a few days ago! I'll look into this in detail tomorrow

ArneTR commented 1 month ago

Just to give a peek at how jittery these values are:

Although there is some network I/O making everything a bit unpredictable in the first place, the total machine energy deviates by less than 1%.

Screenshot 2024-05-31 at 12 02 49 PM

The per-container values deviate by up to 30% though!

Screenshot 2024-05-31 at 12 02 57 PM

The reason is that the machine used is representative of a user system and thus has DVFS, TurboBoost, and HyperThreading turned on. This leads to CPU utilization becoming quite flaky.

See also our case study here: https://www.green-coding.io/case-studies/cpu-utilization-usefulness/

github-actions[bot] commented 1 month ago
Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 22.4451 | 1515.15 | 3.45924 | 446 |
| Measurement #1 | 22.6363 | 1515.15 | 3.45924 | 438 |

📈 Energy graph: [ASCII plot not recoverable: Watts over time, y-axis ranging from 1.77 to 8.18]

🌳 CO2 Data:
City: Boydton, Lat: 36.677696, Lon: -78.37471
Carbon Intensity for this location: 362 gCO₂eq/kWh
SCI: 0.548484 gCO₂eq / pipeline run emitted
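
The figures in the Eco-CI output can be cross-checked with basic unit conversions. The following is only a sketch: it assumes that average power is total energy divided by the duration of Measurement #1, and that operational emissions are energy multiplied by the grid carbon intensity. The variable names are made up for illustration. The reported SCI value is larger than the operational part computed here; an SCI score also covers components such as embodied emissions, which this sketch does not model.

```python
# Sketch only: cross-checks the Eco-CI numbers quoted in this thread.
# Variable names are illustrative and not taken from Eco-CI itself.
JOULES_PER_KWH = 3_600_000

energy_j = 1515.15        # Total Energy [Joules] from the Eco-CI output
duration_s = 438          # Duration [Seconds] of Measurement #1
carbon_intensity = 362    # gCO2eq/kWh reported for the location

# Average power is energy divided by time.
avg_power_w = energy_j / duration_s

# Operational emissions: convert Joules to kWh, then apply the grid intensity.
energy_kwh = energy_j / JOULES_PER_KWH
operational_g = energy_kwh * carbon_intensity

print(f"avg. power: {avg_power_w:.3f} W")
print(f"operational emissions: {operational_g:.4f} gCO2eq")
```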

ArneTR commented 1 month ago

Another update on this: I tuned our servers and installed a new machine that is more suitable for this type of benchmarking, and I am now seeing values which I deem "acceptable".

I got the StdDev for repeated measurements down to ~5%
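
For readers who want to reproduce this kind of check on their own runs, a relative standard deviation can be computed roughly as follows. This is a sketch under assumptions: it uses the sample standard deviation (whether the GMT comparison view uses the sample or the population variant is not stated here), and the input values are placeholders made up for illustration, not measurements from this thread.

```python
from statistics import mean, stdev


def relative_stddev_percent(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


# Placeholder energy values in Joules (made up, NOT taken from the linked runs).
runs_j = [100.0, 104.0, 96.0, 102.0, 98.0]

rel = relative_stddev_percent(runs_j)
print(f"relative StdDev: {rel:.2f} %")
```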

Screenshot 2024-06-02 at 11 55 43 AM

https://metrics.green-coding.io/compare.html?ids=330dd6da-08f1-4b35-85cd-c5de09137d06,174dbca2-747a-4055-a59a-c173e4957dc2,7e93e046-3c60-4db2-8931-d516f1d09db7,cc963667-46f9-41ed-9cea-82edb82e9fea,6ffeb80a-5691-48f6-b84a-27de80f49c09,d52223fa-c092-4761-8183-95b90a6f98c0,cbe7bef6-eb58-44b7-bde6-8f419f81136b,a3805d33-5e39-4a47-9a23-56c008fcdf17,4d5f1d27-d861-414d-b8d8-b849eaed0862

This was achieved by:

The machine is very representative of a classic shared server machine in the cloud now.

To validate, I also chose a load test with JMeter, which shows similar values:

Screenshot 2024-06-02 at 11 52 20 AM

https://metrics.green-coding.io/compare.html?ids=2d7f0b5c-5b05-4288-b3d1-ab373e27affa,5740e368-74f4-4371-b13a-5d469e84637c,84ec9ad7-e9c4-49e6-a35d-fbbf9249d228,efa5d3de-3bf3-47d5-a177-e4eecf84e19d,a2ccc511-6350-4f6c-8618-1f7bcf2f3bbb,044b1005-0f20-4f7f-b786-5749e9c074e6,6a0cf413-69c3-4ac5-b41d-11b352c411e1,86dc553e-32d4-4010-ad34-832171330a48

IMHO this PR is now ready to merge. @mrchrisadams let me know if you have any remarks, then I will also bring this functionality to the GitHub Codespace.

mrchrisadams commented 1 month ago

Excellent news - thanks @ArneTR !

Can I check which machine this is running on now?

I'm assuming it's this one, and that would be helpful for fielding any questions in the workshop on Friday.

CO2 Benchmarking (DVFS OFF, TB OFF, HT OFF) - TX1330 M2
Use Case: For benchmarking of a software where configuration is tuned for reproducibility
Vendor: Fujitsu TX1330 M2
OS: Ubuntu 24.04 (NOP Linux)
Type: Single-Tenant Server
CPU: Intel(R) Xeon(R) CPU E3-1240L v5 @ 2.10GHz
Cores: 4
Threads: 4
Hyper-Threading: Off
Turbo Boost: Off
DVFS: Off (Fixed to 2.1 GHz)
C-States: C0 only
Memory: 8 GB
Sample measurement with machine specs
Metrics Provider for Machine Power: MCP39F511N

mrchrisadams commented 1 month ago

Otherwise, I can see this PR being useful already, and would find it very helpful to have available. I'd be very happy for it to be merged in.

ArneTR commented 1 month ago

Correct, the "CO2 Benchmarking" machine is the one to choose. The feature is also activated on the "CO2 profiling" machine, but the values will not be as reliable there (just in case you run into congestion because too many tests were handed in :) )

I have further reduced the boot up time of our machines to < 5 Minutes. Once they are up they will pick up tests instantly.

davidkopp commented 1 month ago

Fantastic! Thanks Arne for implementing this 😀

Regarding the implementation, the following thoughts came to my mind:

One question regarding the frontend: Is it planned to add charts to the stats page of a single run that displays the results for container power and container energy?

btw: I also like the new naming and the description of the machines of the measurement cluster. Now it is easier to decide which of the machines is suitable for the measurement I want to run.

ArneTR commented 1 month ago

Fantastic! Thanks Arne for implementing this 😀

Regarding the implementation, the following thoughts came to my mind:

  • Container power is only calculated if machine power is available (either mcp or xgboost is required, right?). So RAPL alone is not sufficient, which is a difference to Scaphandre.

Correct!

  • Scaphandre calculates the power per process for each jiffy (small time period) and sums it up in the end. Your implementation calculates the power per container in the end using the total power consumption and the average CPU utilization over the whole time period. I would assume this makes a difference, e.g. because of energy proportionality.

In theory, if you could actually measure it, then yes. But since neither Scaphandre nor we account for energy proportionality, "integrating" or using the average gets you the same value. If you chained Cloud Energy with every CPU% value, you would account for energy proportionality, but you would have an estimation instead of a linearly attributed measurement.
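
To illustrate the point about "integrating" versus averaging, here is a minimal sketch. It is not GMT or Scaphandre code: the function names, the sample values, and the constant `WATTS_PER_PERCENT` are all hypothetical. It assumes a purely linear model in which machine power is strictly proportional to total CPU utilization, and it attributes energy to a container by its share of CPU utilization.

```python
# Hypothetical sketch, NOT the GMT or Scaphandre implementation.

def attribute_per_interval(power_w, container_util, total_util, dt_s):
    """Sum the container's share of energy interval by interval."""
    return sum(
        p * dt_s * (c / t)
        for p, c, t in zip(power_w, container_util, total_util)
        if t > 0
    )


def attribute_by_average(power_w, container_util, total_util, dt_s):
    """Split the total energy once, using utilization averaged over the run."""
    total_energy_j = sum(power_w) * dt_s
    n = len(power_w)
    avg_container = sum(container_util) / n
    avg_total = sum(total_util) / n
    return total_energy_j * (avg_container / avg_total)


# Assumption: power is strictly linear in total CPU utilization (made-up factor).
WATTS_PER_PERCENT = 0.5
total_util = [10.0, 40.0, 80.0, 20.0]     # machine CPU utilization [%] per interval
container_util = [5.0, 30.0, 20.0, 10.0]  # one container's CPU utilization [%]
power_w = [WATTS_PER_PERCENT * u for u in total_util]

per_interval = attribute_per_interval(power_w, container_util, total_util, dt_s=1.0)
by_average = attribute_by_average(power_w, container_util, total_util, dt_s=1.0)

print(per_interval, by_average)
```

Under this linear assumption both functions return the same energy for the container. If the power values instead came from a non-linear power curve or included an idle offset, the two results could diverge, which is the energy-proportionality effect raised in the question.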

  • As you have already mentioned at other places, the usage of CPU instructions (like Kepler does it) instead of time / utilization would be preferable. Is the reason, that you don't use CPU instructions because of the complexity and the effort that is needed to implement it?

Yep, that is quite a bit more complex, but doable. We will try to use the Green Kernel Plugin that we are creating to obtain this value. Happy if you say hello in the repository and also raise questions / wishes there if you have any! https://github.com/green-kernel

One question regarding the frontend: Is it planned to add charts to the stats page of a single run that displays the results for container power and container energy?

Not planned at the moment. The reason is that the value is not the best metric to work with in the first place, and for smaller chunks of time it becomes even more bogus. It can give a nice orientation as an average value, but for small time chunks I foresee it not being really usable, as kernel time tracking is also not guaranteed to have such a high resolution. Having said that: I am open to bringing it in after a proper analysis, but this will take some time.

btw: I also like the new naming and the description of the machines of the measurement cluster. Now it is easier to decide which of the machines is suitable for the measurement I want to run.

ty!