Inconsistent score for the browser device depending on the presence of containers

julienrf commented 10 months ago

I am analyzing the footprint of a WordPress stack. I get significantly different scores for the browser device depending on whether containers are included in the analysis (i.e., with or without the --containers option).

I used the Docker Compose setup described here.

Full stack estimation (browser + Apache + DB):

greenframe analyze http://localhost:8080 --containers="wordpress-benchmarks-wordpress-1" --databaseContainers="wordpress-benchmarks-db-1"
[…]
The estimated footprint is 10.177 mg eq. co2 ± 1.3% (23.025 mWh).

Removing the Apache server (estimating browser + DB):

greenframe analyze http://localhost:8080 --databaseContainers="wordpress-benchmarks-db-1"
[…]
The estimated footprint is 5.953 mg eq. co2 ± 1.1% (13.468 mWh).

Removing the DB (estimating browser + Apache):

greenframe analyze http://localhost:8080 --containers="wordpress-benchmarks-wordpress-1"
[…]
The estimated footprint is 8.828 mg eq. co2 ± 1.9% (19.973 mWh).

From the executions above, we can deduce the following breakdown:

Container	Energy consumption (mWh)	Explanation
Apache	9.56	23.025 - 13.468
DB	3.052	23.025 - 19.973
Browser	10.413	23.025 - 9.56 - 3.052

Note that we deduced the energy consumption of the browser by subtracting the Apache and DB consumption from the total. Now, if we run the analysis again but for the browser only (without any --containers options), we would expect to get something close to 10.413 mWh. However, this is not what we get:

greenframe analyze http://localhost:8080
[…]
The estimated footprint is 8.343 mg eq. co2 ± 1.6% (18.875 mWh).

We get 18.9 mWh instead of 10.4 mWh.
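The subtraction above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the totals copied from the four runs; the function and variable names are illustrative and not part of greenframe-cli. It assumes the per-container measurements are additive, which is exactly the assumption being questioned here. Without intermediate rounding the derived browser share is about 10.42 mWh (the table's 10.413 uses the rounded Apache value).

```python
# Totals in mWh, copied from the four runs reported above.
full_stack = 23.025      # browser + Apache + DB
browser_db = 13.468      # browser + DB (Apache removed)
browser_apache = 19.973  # browser + Apache (DB removed)
browser_only = 18.875    # browser alone, measured directly


def derive_breakdown(full, without_apache, without_db):
    """Derive per-component energy by subtraction.

    Assumes the measurements are additive across containers.
    """
    apache = full - without_apache
    db = full - without_db
    browser = full - apache - db
    return {"apache": apache, "db": db, "browser": browser}


breakdown = derive_breakdown(full_stack, browser_db, browser_apache)

# Gap between the directly measured browser run and the derived browser share.
gap = browser_only - breakdown["browser"]

for name, mwh in breakdown.items():
    print(f"{name}: {mwh:.3f} mWh")
print(f"measured browser-only run: {browser_only:.3f} mWh (gap: {gap:.3f} mWh)")
```

If the measurements were additive, the gap would be close to zero.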

I also customized the output of greenframe-cli to show a per-container breakdown (so that I don’t need to do the subtraction manually), and here is what I got.

Full-stack analysis (browser + Apache + DB):

bin/run analyze http://localhost:8080 --containers="wordpress-benchmarks-wordpress-1" --databaseContainers="wordpress-benchmarks-db-1"
[…]
The estimated footprint is 8.615 mg eq. co2 ± 0.7% (19.49 mWh).
  For container greenframe-runner (DEVICE), the footprint is 5.693 mg eq. co2 (12.88 mWh) ;
  For container wordpress-benchmarks-wordpress-1 (SERVER), the footprint is 2.78 mg eq. co2 (6.291 mWh) ;
  For container wordpress-benchmarks-db-1 (DATABASE), the footprint is 0.141 mg eq. co2 (0.319 mWh) ;

Browser only:

bin/run analyze http://localhost:8080
[…]
The estimated footprint is 8.38 mg eq. co2 ± 1.6% (18.959 mWh).
  For container greenframe-runner (DEVICE), the footprint is 8.38 mg eq. co2 (18.959 mWh) ;

I find a similar difference in the estimated energy consumption of the browser (12.88 mWh vs 18.96 mWh) depending on the options passed to greenframe-cli.
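To compare the two runs without reading the numbers by hand, the per-container lines can be parsed. The sketch below assumes the line format of my customized output shown above (it is not the stock greenframe-cli output), and the names `LINE_RE` and `parse_breakdown` are made up for this example.

```python
import re

# Matches the per-container lines of the customized output quoted above.
LINE_RE = re.compile(
    r"For container (?P<name>\S+) \((?P<role>\w+)\), "
    r"the footprint is (?P<co2>[\d.]+) mg eq\. co2 \((?P<mwh>[\d.]+) mWh\)"
)


def parse_breakdown(output: str) -> dict[str, float]:
    """Return {role: energy in mWh} for each per-container line in the output."""
    return {m["role"]: float(m["mwh"]) for m in LINE_RE.finditer(output)}


full_stack_output = """\
The estimated footprint is 8.615 mg eq. co2 ± 0.7% (19.49 mWh).
  For container greenframe-runner (DEVICE), the footprint is 5.693 mg eq. co2 (12.88 mWh) ;
  For container wordpress-benchmarks-wordpress-1 (SERVER), the footprint is 2.78 mg eq. co2 (6.291 mWh) ;
  For container wordpress-benchmarks-db-1 (DATABASE), the footprint is 0.141 mg eq. co2 (0.319 mWh) ;
"""

browser_only_output = """\
The estimated footprint is 8.38 mg eq. co2 ± 1.6% (18.959 mWh).
  For container greenframe-runner (DEVICE), the footprint is 8.38 mg eq. co2 (18.959 mWh) ;
"""

full = parse_breakdown(full_stack_output)
alone = parse_breakdown(browser_only_output)

# How much larger the DEVICE figure is when no other container is tracked.
ratio = alone["DEVICE"] / full["DEVICE"]
print(f"DEVICE: {full['DEVICE']} mWh (full stack) vs {alone['DEVICE']} mWh (alone), ratio {ratio:.2f}")
```

In the full-stack run, the three per-container values add up to the reported 19.49 mWh total, so the breakdown itself is internally consistent; only the DEVICE figure changes between the two invocations.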

I don’t think those differences are caused by the variability we should expect when running these benchmarks on a laptop, where the OS runs many background tasks at the same time: when I re-run greenframe-cli, I consistently get the same differences.

My goal is to be able to compare different stacks, which means that I will have to compare the output of greenframe-cli run with different --containers options (e.g. to compare a static site to a Wordpress stack), but I am not sure this way of using greenframe-cli is reliable. Maybe it has been designed to compare the results of identical invocations only?

I performed another test to compare a static site served by Nginx with a dynamically generated site served by Apache + PHP, and I got very confusing results: the energy consumption of the browser only (greenframe analyze http://localhost) was higher than the energy consumption of the browser + Apache (greenframe analyze http://localhost --containers=php).

fzaninotto commented 10 months ago

This seems to be a reproducibility issue. If you run the same test several times, does it output consistent results?

My interpretation is that given the low emissions you're getting, your scenario is very short and therefore very subject to variations in the host. You will get better results with a longer scenario, and a higher number of samples.

See https://docs.greenframe.io/commands#improving-the-analysis-precision for more details.

julienrf commented 10 months ago

Thank you for your response.

> If you run the same test several times, does it output consistent results?

Yes, it does. I also tried increasing the number of samples, but the results are always very similar (less than 10% variation).
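As a rough sanity check, the gap between two measurements can be compared with that run-to-run variation. The helper below is a hypothetical sketch (`exceeds_noise` is not part of greenframe-cli), and the 10% threshold is the upper bound I observed, not a statistically derived figure.

```python
def exceeds_noise(a_mwh: float, b_mwh: float, max_variation: float = 0.10) -> bool:
    """Return True if the relative gap between two measurements is larger
    than the assumed run-to-run variation."""
    relative_gap = abs(a_mwh - b_mwh) / min(a_mwh, b_mwh)
    return relative_gap > max_variation


# Browser (DEVICE) figure in the full-stack run vs the browser-only run.
print(exceeds_noise(12.88, 18.959))

# Two browser-only runs reported earlier in this issue.
print(exceeds_noise(18.875, 18.959))
```

The first pair differs by far more than 10%, while the two browser-only runs agree to within a fraction of a percent, which is why I don't think host noise explains the difference.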

fzaninotto commented 10 months ago

One explanation I can think of is that your server uses multithreading to optimize the workload when there are many containers. Therefore, the payload of a single container takes less time when run in conjunction with other payloads. This is concerning, as Docker promises process isolation, and we use docker stats to track individual processes.

If that's the case, this means you can't add or subtract energy consumption between greenframe analyses with different containers.

marmelab / greenframe-cli

Inconsistent score for the browser device depending on the presence of containers #69