joular / powerjoular

PowerJoular allows monitoring power consumption of multiple platforms and processes.
https://www.noureddine.org/research/joular/powerjoular
GNU General Public License v3.0

Powerjoular on RAPL supported Intel processor #29

Closed Apoorvanp closed 9 months ago

Apoorvanp commented 1 year ago

Hello. We ran PowerJoular to measure the power consumption of our Python script, identified by its PID, using the following command:

profiler_cmd = f'powerjoular -tp {self.target.pid} -f {context.run_dir / "powerjoular.csv"}'
self.profiler = subprocess.Popen(shlex.split(profiler_cmd))

The test results in powerjoular.csv-{self.target.pid}.csv contain some +Inf values for CPU Utilization and NaN values for CPU Power, as seen in the screenshot (they appear to follow some pattern): [image: power-joular-resultsDataIssue (2)]

Are we doing something wrong in the way we use it? Could you please help us understand the reason for this?

Thank you!

adelnoureddine commented 1 year ago

CPU Power is calculated from CPU utilization, so if the latter is +Inf (infinite), then CPU power is not a number (NaN). Could you please check, for your PID, whether the Linux kernel is providing correct data? In particular, look at the file /proc/PID/stat, and at /proc/stat for general CPU data.
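To illustrate how a non-finite utilization can end up as a NaN power value, here is a minimal sketch of the floating-point arithmetic involved. It is not PowerJoular's actual code: the helper names and the scaling formula in `scaled_power` are assumptions made for illustration only. Python raises an exception on division by zero, so `ieee_div` emulates the IEEE 754 behaviour found in most compiled languages.

```python
import math


def ieee_div(num: float, den: float) -> float:
    """Divide with IEEE 754 semantics: x/0 gives +/-inf and 0/0 gives NaN."""
    if den == 0.0:
        if num == 0.0 or math.isnan(num):
            return math.nan
        return math.copysign(math.inf, num)
    return num / den


def utilization(proc_ticks_delta: float, total_ticks_delta: float) -> float:
    """Share of CPU time used by the process between two samples.

    If the system-wide counter did not advance between the two samples,
    the denominator is zero and the result is +inf (or NaN for 0/0).
    """
    return ieee_div(proc_ticks_delta, total_ticks_delta)


def scaled_power(process_util: float, cpu_util: float, cpu_power_watts: float) -> float:
    """Hypothetical scaling of total CPU power down to one process.

    inf/inf and inf*0 are both NaN, so a non-finite utilization
    can surface as NaN in the derived power column.
    """
    return ieee_div(process_util * cpu_power_watts, cpu_util)
```

With these helpers, `utilization(5, 0)` is `inf`, and `scaled_power(math.inf, math.inf, 10.0)` is NaN, which matches the kind of pairing (+Inf utilization, NaN power) reported above.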

Apoorvanp commented 1 year ago

We read /proc/PID/stat while our python3 process was running, and this is the result:

[image: screenshot of the /proc/PID/stat output]

This is the result from /proc/stat at the same time:

[image: screenshot of the /proc/stat output]

I am not sure which column is being utilised to calculate the CPU Utilization and Power in PowerJoular. Also, we see some zero values but no infinite ones. What do you think might be the reason for this?
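As a rough guide to the columns in question, the sketch below extracts the CPU time counters documented in the proc(5) manual page: fields 14 (utime) and 15 (stime) of /proc/PID/stat, and the aggregate "cpu" line of /proc/stat. Whether PowerJoular uses exactly these fields is an assumption here, and the function names are hypothetical.

```python
def process_cpu_ticks(pid_stat_text: str) -> int:
    """Return utime + stime (fields 14 and 15) from /proc/PID/stat content."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split only what follows the last ')'.
    rest = pid_stat_text[pid_stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n sits at index n - 3.
    return int(rest[11]) + int(rest[12])


def total_cpu_ticks(proc_stat_text: str) -> int:
    """Sum user, nice, system, idle, iowait, irq, softirq and steal
    from the aggregate 'cpu' line of /proc/stat content."""
    fields = proc_stat_text.splitlines()[0].split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line first")
    # Guest time is already counted inside user time, so stop after steal.
    return sum(int(value) for value in fields[1:9])


def read_text(path: str) -> str:
    """Read a /proc file as text (Linux only)."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return handle.read()
```

Taking two snapshots a second apart, for example `process_cpu_ticks(read_text(f"/proc/{pid}/stat"))` and `total_cpu_ticks(read_text("/proc/stat"))`, and comparing the differences gives a quick check of whether the kernel counters advance as expected.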

adelnoureddine commented 1 year ago

The data seems fine. I'd like to replicate the error on my computer to debug the issue. Could you send your script so I can test it on my PC? If it's confidential, you can send it to my university email, listed on my contact page: https://www.noureddine.org/contact

Apoorvanp commented 1 year ago

I've sent you an email :) thanks for helping out!

iivanoo commented 11 months ago

Hi @adelnoureddine, did you have time to look at this?

adelnoureddine commented 11 months ago

Hi @iivanoo, I replied to the students by email and we had a few exchanges. I tried the code they sent me (from your lab on git), and it executed well on my machine. So I couldn't reproduce the results they had (everything worked fine).

iivanoo commented 11 months ago

Ok, thanks Adel! Just for completeness, is it normal that sometimes some of the power readings are +Inf or NaN? All the other values that were collected seem correct.
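Since the remaining samples look valid, one option during analysis is to set aside the rows with non-finite values and count how many were dropped. The sketch below assumes the CSV has a header row with columns named "CPU Utilization" and "CPU Power" (the names used in this thread; the actual file may differ), and `finite_rows` is a hypothetical helper.

```python
import csv
import io
import math


def finite_rows(csv_text, columns=("CPU Utilization", "CPU Power")):
    """Split CSV rows into those whose listed columns are all finite.

    Returns (kept_rows, dropped_count). float() accepts spellings such as
    '+Inf' and 'NaN', so those cells parse and are then rejected by isfinite.
    """
    kept, dropped = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = [float(row[name]) for name in columns]
        if all(math.isfinite(value) for value in values):
            kept.append(row)
        else:
            dropped += 1
    return kept, dropped
```

Reporting the dropped count alongside the results keeps the filtering transparent, and a high count would suggest a measurement problem rather than an occasional glitch.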

adelnoureddine commented 11 months ago

Hi Ivano. On rare occasions, we might get negative or infinite values because one energy measurement is collected from RAPL but the other is not (the first but not the second, or vice versa). When we then calculate the difference to get the power consumed over the last second, we can get such data. This happens more often in the first or last seconds of a measurement. I've had 2 reports from users getting negative or infinite values, but so far I couldn't replicate any.
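The difference-based calculation described here can be sketched as follows. This is an illustration rather than PowerJoular's implementation: it assumes energy readings in microjoules (the unit used by the Linux powercap `energy_uj` files), a missing reading represented as `None`, and a hypothetical function name. Invalid samples are reported as `None` instead of being turned into a negative or non-finite number.

```python
from typing import Optional


def power_watts(
    energy_prev_uj: Optional[int],
    energy_curr_uj: Optional[int],
    interval_s: float = 1.0,
) -> Optional[float]:
    """Average power over one interval from two cumulative energy readings.

    Returns None when the sample cannot be trusted: a reading is missing,
    the interval is not positive, or the counter appears to go backwards.
    """
    if energy_prev_uj is None or energy_curr_uj is None:
        return None
    if interval_s <= 0:
        return None
    delta_uj = energy_curr_uj - energy_prev_uj
    if delta_uj < 0:
        return None
    # Convert microjoules to joules, then divide by seconds to get watts.
    return delta_uj / 1_000_000 / interval_s
```

For example, readings of 1,000,000 and 6,000,000 microjoules taken one second apart give 5.0 W, while a missing first or second reading yields `None` rather than a misleading value.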

iivanoo commented 11 months ago

Ok, thanks Adel! If you want, we can help by replicating a mini-experiment on our machines and looking at the system logs together. Let me know via email.

adelnoureddine commented 9 months ago

I guess things went well over email. I'm closing this issue unless there are additional needs.