jdh4 / job_defense_shield

GNU General Public License v2.0

Key errors with `--utilization-overview` #1

Closed: klieret closed this 1 month ago

klieret commented 5 months ago

Running without a cache:

python job_defense_shield.py --utilization-overview
Configuration file: /home/kl5675/Documents/24/git_sync/job_defense_shield/config.yaml

Calling sacct (which may require several seconds) ... done.
Number of rows (before): 309029
Number of rows  (after): 286320
Traceback (most recent call last):
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cli'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pandas/_libs/index.pyx", line 774, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'cli'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/job_defense_shield.py", line 531, in <module>
    util = UtilizationOverview(df,
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/alert/utilization_overview.py", line 10, in __init__
    super().__init__(df, days_between_emails, violation, vpath, subject, kwargs)
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/base.py", line 29, in __init__
    self._filter_and_add_new_fields()
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/alert/utilization_overview.py", line 61, in _filter_and_add_new_fields
    cli = self.special.at[("della", "cli"), "gpu-hours"]
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexing.py", line 2431, in __getitem__
    return super().__getitem__(key)
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexing.py", line 2382, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/frame.py", line 3929, in _get_value
    loc = engine.get_loc(index)
  File "pandas/_libs/index.pyx", line 777, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: ('della', 'cli')

Before removing the cache, I was also getting:

Configuration file: /home/kl5675/Documents/24/git_sync/job_defense_shield/config.yaml

Using cache file.

Number of rows (before): 10354
Number of rows  (after): 9041
Traceback (most recent call last):
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'gpu'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pandas/_libs/index.pyx", line 774, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'gpu'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/job_defense_shield.py", line 531, in <module>
    util = UtilizationOverview(df,
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/alert/utilization_overview.py", line 10, in __init__
    super().__init__(df, days_between_emails, violation, vpath, subject, kwargs)
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/base.py", line 29, in __init__
    self._filter_and_add_new_fields()
  File "/home/kl5675/Documents/24/git_sync/job_defense_shield/alert/utilization_overview.py", line 51, in _filter_and_add_new_fields
    gpu = self.special.at[("della", "gpu"), "gpu-hours"]
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexing.py", line 2431, in __getitem__
    return super().__getitem__(key)
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/indexing.py", line 2382, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "/scratch/gpfs/kl5675/micromamba/envs/gnn/lib/python3.10/site-packages/pandas/core/frame.py", line 3929, in _get_value
    loc = engine.get_loc(index)
  File "pandas/_libs/index.pyx", line 777, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: ('della', 'gpu')
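Both tracebacks come from the same pattern: `DataFrame.at` with a `(cluster, partition)` tuple on a MultiIndex raises `KeyError` when that row does not exist. Here `cli` is not a partition at all, and `gpu` presumably had no rows in the much smaller cached data. A minimal reproduction, using a hypothetical stand-in for `self.special` whose `gpu-hours` strings follow the `"hours (pct%)"` shape implied by the `split("(")` call in `utilization_overview.py` (the data are made up):

```python
import pandas as pd

# Hypothetical stand-in for self.special: rows keyed by (cluster, partition).
special = pd.DataFrame(
    {"gpu-hours": ["1200 (40%)", "1800 (60%)"]},
    index=pd.MultiIndex.from_tuples(
        [("della", "pli"), ("della", "pli-c")], names=["cluster", "partition"]
    ),
)


def lookup(df, cluster, partition):
    """Mimic the failing line, but report the KeyError instead of crashing."""
    try:
        return df.at[(cluster, partition), "gpu-hours"]
    except KeyError as err:
        return f"KeyError: {err}"


print(lookup(special, "della", "pli"))  # 1200 (40%)
print(lookup(special, "della", "cli"))  # no such row, so .at raises KeyError
print(("della", "cli") in special.index)  # False
```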

klieret commented 5 months ago

I checked, and there simply doesn't seem to be a cli partition on della. Was that meant to be pli?

cli = self.special.at[("della", "cli"), "gpu-hours"]
cli = int(cli.split("(")[0].strip())
cli = round(100 * cli / 32 / period_hours)
self.special.at[("della", "cli"), "Usage(%)"] = cli
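A guarded version of the snippet above, as a sketch: it checks membership in the index before calling `.at`, so a missing partition yields `None` instead of a `KeyError`. The function name `partition_usage_percent` is hypothetical, `num_gpus` assumes the hard-coded `32` is the partition's GPU count, and the sample data are made up.

```python
import pandas as pd


def partition_usage_percent(special, cluster, partition, num_gpus, period_hours):
    """Percent of available GPU-hours used on one partition.

    Returns None when (cluster, partition) has no row, instead of letting
    DataFrame.at raise KeyError.
    """
    key = (cluster, partition)
    if key not in special.index:
        return None
    # Values look like "1200 (40%)"; keep only the leading GPU-hours number.
    gpu_hours = int(str(special.at[key, "gpu-hours"]).split("(")[0].strip())
    # Same arithmetic as the original: hours / GPUs / hours in period, as a percent.
    return round(100 * gpu_hours / num_gpus / period_hours)


special = pd.DataFrame(
    {"gpu-hours": ["1200 (40%)"]},
    index=pd.MultiIndex.from_tuples([("della", "pli")], names=["cluster", "partition"]),
)
period_hours = 14 * 24  # e.g. --days=14
print(partition_usage_percent(special, "della", "pli", 32, period_hours))  # 11
print(partition_usage_percent(special, "della", "cli", 32, period_hours))  # None
```

With 1200 GPU-hours on 32 GPUs over 14 days (336 hours), the result is round(100 * 1200 / 32 / 336) = 11.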

jdh4 commented 5 months ago

This has been fixed by removing the Usage(%) column; computing it properly would require the partition-specific inputs to come from the config file. One can run, for example:

python job_defense_shield.py --utilization-overview --days=14 -M della -r pli,pli-c
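Following the suggestion that Usage(%) would need to come from the config file, one possible sketch: read per-partition GPU counts from a config section and compute usage only for partitions that are both configured and present in the data. The section name `gpus_per_partition`, its nested cluster → partition layout, and the counts are hypothetical and not part of job_defense_shield's actual config.yaml; the dict stands in for what a YAML loader would return.

```python
import pandas as pd

# Hypothetical parsed config: cluster -> partition -> number of GPUs.
# Names and counts are illustrative only.
CONFIG = {"gpus_per_partition": {"della": {"pli": 32, "pli-c": 16}}}


def add_usage_column(special, config, period_hours):
    """Return a copy of `special` with a Usage(%) column for configured partitions.

    Configured partitions with no row in `special` are skipped, and rows that
    are not configured get a missing value, so no KeyError is raised.
    """
    usage = {}
    for cluster, partitions in config.get("gpus_per_partition", {}).items():
        for partition, num_gpus in partitions.items():
            key = (cluster, partition)
            if key not in special.index:
                continue  # no jobs on this partition in the window
            # "1200 (80%)" -> 1200
            hours = int(str(special.at[key, "gpu-hours"]).split("(")[0].strip())
            usage[key] = round(100 * hours / num_gpus / period_hours)
    out = special.copy()
    # Iterating a MultiIndex yields (cluster, partition) tuples.
    out["Usage(%)"] = [usage.get(key) for key in out.index]
    return out


special = pd.DataFrame(
    {"gpu-hours": ["1200 (80%)", "300 (20%)"]},
    index=pd.MultiIndex.from_tuples(
        [("della", "pli"), ("della", "gpu")], names=["cluster", "partition"]
    ),
)
print(add_usage_column(special, CONFIG, period_hours=14 * 24))
```

Keeping the GPU counts in config means a renamed or retired partition (such as the nonexistent `cli`) only leaves a missing value instead of crashing the overview.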