ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0

Remove annoying warning messages in PowerMonitor #72

Closed Sunt-ing closed 1 month ago

Sunt-ing commented 1 month ago

Describe the bug

Not a real bug, but an improvement from my perspective.

When using PowerMonitor on a CloudLab c240g5 node with a P100 GPU, the warnings are really annoying and interfere with reading the CLI output.

/opt/miniconda/envs/llama/lib/python3.10/site-packages/zeus/monitor/power.py:189: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

To Reproduce

  1. Minimal training script
import time
from zeus.monitor.power import PowerMonitor

if __name__ == "__main__":
    gpu_indices = [0]
    monitor = PowerMonitor(gpu_indices)
    st = time.time()
    time.sleep(5)
    end = time.time()

    while st < end:
        power = monitor.get_power(st)
        st += 0.1
        print(f"{st}, {power[0]}")
  2. The command you ran
python tmp.py

The output looks like this:

$ python tmp.py
[2024-05-08 02:05:49,463] [zeus.device.gpu](gpu.py:917) PyNVML is available and initialized.
[2024-05-08 02:05:49,469] [PowerMonitor](power.py:151) Monitoring power usage of GPUs [0]
[2024-05-08 02:05:49,470] [zeus.monitor.power](power.py:51) Detected Tesla P100-PCIE-12GB, inferring NVML power counter update period.
[2024-05-08 02:05:49,885] [zeus.monitor.power](power.py:56) Counter update period for Tesla P100-PCIE-12GB is 0.10 s
[2024-05-08 02:05:51,372] [zeus.device.gpu](gpu.py:917) PyNVML is available and initialized.
/opt/miniconda/envs/llama/lib/python3.10/site-packages/zeus/monitor/power.py:189: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  self.power_df = pd.concat([self.power_df, additional_df], axis=0)
1715151949.990017, 23.986
1715151950.0900168, 23.986
1715151950.1900167, 23.986
1715151950.2900167, 23.986
/opt/miniconda/envs/llama/lib/python3.10/site-packages/zeus/monitor/power.py:189: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  self.power_df = pd.concat([self.power_df, additional_df], axis=0)
1715151950.3900166, 23.986
/opt/miniconda/envs/llama/lib/python3.10/site-packages/zeus/monitor/power.py:189: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  self.power_df = pd.concat([self.power_df, additional_df], axis=0)
1715151950.4900165, 23.986
1715151950.5900164, 23.986
1715151950.6900163, 23.986
1715151950.7900162, 23.986

My workaround is something like this:


        self.power_df = self.power_df.dropna(how='all', axis=1)
        additional_df = additional_df.dropna(how='all', axis=1)
        self.power_df = pd.concat([self.power_df, additional_df], axis=0)

But you may propose better solutions.
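As one possible alternative, the warning text itself suggests excluding empty entries before the concat operation. The sketch below does that with a hypothetical helper, `concat_non_empty`, which is not part of Zeus; it assumes the warning is triggered by an empty frame (for example, an initial `power_df` with columns but no rows), which this thread does not confirm.

```python
import pandas as pd


def concat_non_empty(frames):
    """Concatenate only frames that have at least one row.

    Hypothetical helper for illustration; not part of Zeus.
    """
    non_empty = [df for df in frames if not df.empty]
    if not non_empty:
        # Nothing to concatenate: keep the columns of the first frame, if any.
        return frames[0].iloc[0:0].copy() if frames else pd.DataFrame()
    return pd.concat(non_empty, axis=0)


# Assumed scenario: the accumulated frame starts out empty.
power_df = pd.DataFrame(columns=["time", "power0"])
additional_df = pd.DataFrame({"time": [1.0, 1.1], "power0": [23.986, 23.986]})

power_df = concat_non_empty([power_df, additional_df])
```

Unlike dropping all-NA columns, this leaves the columns of non-empty frames untouched, so a column that happens to be all-NA in one batch is not silently removed.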

I also suggest adding an example of PowerMonitor since it has some docs already and can be useful to developers.

jaywonchung commented 1 month ago

The workaround looks good as it directly addresses what the warning is saying, but I kinda feel weird because the CSV file (in theory) should not contain any NA values. I also believe empty dataframes are avoided by catching pd.errors.EmptyDataError. Could you print out the two dataframes inside _update_df?
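For reference, a minimal sketch of the `pd.errors.EmptyDataError` pattern mentioned here. The helper name `read_power_csv` and the CSV layout (a header row with `time` and `power0`) are assumptions for illustration, not Zeus's actual code. It also shows that a source containing only a header row does not raise; it yields an empty frame instead.

```python
import io

import pandas as pd


def read_power_csv(source):
    """Return a DataFrame, or None when the source has no content at all.

    Hypothetical helper for illustration; not part of Zeus.
    """
    try:
        return pd.read_csv(source)
    except pd.errors.EmptyDataError:
        # Raised when there is nothing to parse, not even a header.
        return None


empty = read_power_csv(io.StringIO(""))
header_only = read_power_csv(io.StringIO("time,power0\n"))
filled = read_power_csv(io.StringIO("time,power0\n1.0,23.986\n"))
```

Under these assumptions, `empty` is `None`, while `header_only` is a zero-row frame that would still be passed on to a later concat.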

Sunt-ing commented 1 month ago

Sure!

power_df

              time  power0
0     1.715177e+09  67.876
1     1.715177e+09  68.867
2     1.715177e+09  68.149
3     1.715177e+09  67.677
4     1.715177e+09  68.832
...            ...     ...
1047  1.715177e+09  24.929
1048  1.715177e+09  24.945
1049  1.715177e+09  24.929
1050  1.715177e+09  24.945
1051  1.715177e+09  24.945

[1052 rows x 2 columns]

additional_df

             time  power0
0    1.715177e+09  24.929
1    1.715177e+09  24.945
2    1.715177e+09  24.945
3    1.715177e+09  24.945
4    1.715177e+09  24.945
..            ...     ...
995  1.715177e+09  24.945
996  1.715177e+09  24.945
997  1.715177e+09  24.945
998  1.715177e+09  24.929
999  1.715177e+09  24.945

[1000 rows x 2 columns]

jaywonchung commented 1 month ago

Fixed. Let me know if this still happens!