nextflow-io / nf-co2footprint

[WIP] A Nextflow plugin to estimate the CO2 footprint of pipeline runs.
https://nextflow-io.github.io/nf-co2footprint/
Apache License 2.0

Fix uc calculation and add warning #84

Closed skrakau closed 8 months ago

skrakau commented 8 months ago

I got an error for a case where the reported cpu_usage was 0.0.

While looking into this, I realised that I had not used the actual number of requested CPUs, nc, to compute uc, but a rounded value derived from the Nextflow cpu_usage metric. Not sure why exactly I did that.

This caused the error when cpu_usage was 0.0, but it is also wrong in general: the final formula uses nc, so the results would be erroneous whenever nc and cpus_ceil differ.

I have changed it to use only nc. Additionally, I added a warning for the case where cpu_usage is 0.0, since this might also indicate that something is wrong.
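A minimal sketch of the change described above, written in Python for illustration rather than taken from the plugin. It assumes that cpu_usage is a percentage in which 100.0 means one fully used core, that uc is cpu_usage divided by 100 times the number of cores, and that the old rounded value was a ceiling of cpu_usage / 100. The function names are hypothetical.

```python
import math
import warnings


def core_usage_factor(cpu_usage: float, nc: int) -> float:
    """Usage factor uc based on the requested number of CPUs nc.

    Assumption: cpu_usage is a percentage where 100.0 equals one fully used core.
    """
    if nc <= 0:
        raise ValueError("nc must be a positive number of CPUs")
    if cpu_usage == 0.0:
        # A usage of exactly 0.0 is suspicious, so warn instead of failing.
        warnings.warn("cpu_usage is 0.0; the reported metric may be wrong")
    return cpu_usage / (100.0 * nc)


def core_usage_factor_rounded(cpu_usage: float) -> float:
    """Assumed shape of the old calculation, using a rounded CPU count."""
    cpus_ceil = math.ceil(cpu_usage / 100.0)
    # When cpu_usage is 0.0, cpus_ceil is 0 and the division below fails.
    return cpu_usage / (100.0 * cpus_ceil)
```

With cpu_usage = 150.0 and nc = 4, the first function returns 0.375, while the rounded variant returns 0.75 because it derives a CPU count of 2. This is the kind of mismatch with nc mentioned above.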

@mirpedrol could you double check?

mirpedrol commented 8 months ago

I think it makes sense to use the number of CPUs to calculate uc. nc and the value that we were calculating (cpu_ceil) should be the same most of the time. But I have now checked how to find the CPU usage: according to the Green Algorithms FAQ (How do I find the usage factor of my processors?), there are commands such as seff for SLURM, and I found this formula: efficiency = cpu_time / (run_time x number_of_cpus). So I am wondering if we should use this instead?

skrakau commented 8 months ago

Thanks!

> But now I checked how to find cpu usage, according to the Green algorithms FAQ (How do I find the usage factor of my processors?) there are some commands such as seff for SLURM, and I found this formula efficiency=cpu_time / (run_time x number_of_cpus), so I am wondering if we should use this instead?

The Green Algorithms docs are meant for users who have to work out and provide those values manually. In our case we luckily have those metrics from Nextflow. And if I see it correctly, the seff CPU efficiency should be the same as what we compute based on the Nextflow cpu_usage.

skrakau commented 8 months ago

Where did you actually find this formula? I couldn't find proper documentation for it.

mirpedrol commented 8 months ago

Here is how the seff command calculates it.

skrakau commented 8 months ago

> Here is how the seff command calculates it.

Thanks! I think in the end it's the same as using the cpu_usage from Nextflow, which also includes the walltime and the CPU time in its calculation.
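The equivalence claimed here can be checked with a small sketch, assuming that cpu_usage is reported as 100 x cpu_time / walltime (so a value above 100 means more than one busy core) and that uc is cpu_usage / (100 x nc). Both are assumptions for illustration, and the function names are hypothetical.

```python
def seff_efficiency(cpu_time_s: float, run_time_s: float, number_of_cpus: int) -> float:
    """Formula quoted in the thread: cpu_time / (run_time x number_of_cpus)."""
    return cpu_time_s / (run_time_s * number_of_cpus)


def cpu_usage_percent(cpu_time_s: float, run_time_s: float) -> float:
    """Assumed definition of cpu_usage: CPU time over walltime, as a percentage."""
    return 100.0 * cpu_time_s / run_time_s


def usage_factor(cpu_usage: float, nc: int) -> float:
    """Usage factor uc derived from cpu_usage and the requested CPUs nc."""
    return cpu_usage / (100.0 * nc)


# Example: 5400 s of CPU time over a 3600 s run on 2 requested CPUs.
efficiency = seff_efficiency(5400.0, 3600.0, 2)
uc = usage_factor(cpu_usage_percent(5400.0, 3600.0), 2)
```

Under these assumptions both values come out as 0.75, since the factor of 100 cancels and walltime and CPU time enter both expressions in the same way.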