hep-gc / cloudscheduler

triggers seem to not work #306

Closed rseuster closed 3 years ago

rseuster commented 4 years ago

3 VMs on hephy won't go away:

Server: prod, Active User: seuster, Active Group: atlas-cern, User's Groups: ['atlas-cern', 'atlas-uvic', 'belle', 'belle-validation', 'cov-gpu', 'desy-belle', 'testing']

VMs: (1/4)
+------------+-------+------------------------------------------------+--------------------------------------+-------------------+--------------+----------------------------------------+---------+--------+
+ Group      | Cloud | Hostname                                       | VMID                                 | IPs               | Floating IPs | Authorization URL                      | Project | Status +
+------------+-------+------------------------------------------------+--------------------------------------+-------------------+--------------+----------------------------------------+---------+--------+
| atlas-cern | hephy | atlas-cern--hephy--3456932588--121096163595088 | 09afaed4-0e96-4972-9f02-69da4204e190 | ['192.168.1.188'] | []           | https://balrog-ctl.uibk.ac.at:5000/v3/ | atlas   | ACTIVE |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--139140120420416 | d8eabad7-2f86-42af-af05-f889d34fc4ae | ['192.168.1.155'] | []           | https://balrog-ctl.uibk.ac.at:5000/v3/ | atlas   | ACTIVE |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--101341710032495 | e2119fb3-c28a-4746-be53-ac461c1aeffe | ['192.168.1.198'] | []           | https://balrog-ctl.uibk.ac.at:5000/v3/ | atlas   | ACTIVE |
+------------+-------+------------------------------------------------+--------------------------------------+-------------------+--------------+----------------------------------------+---------+--------+

VMs: (2/4)
+------------+-------+------------------------------------------------+--------------------------------------+------+--------------+---------------------+---------------+-------------+---------------+---------------+
+            |       |                                                |                                      |      |              |                     |                          HTCondor                           +
+   Group    | Cloud |                    Hostname                    |              Flavor ID               | Task | Power Status |     Start Time      | STARTD Errors   STARTD Time   Primary Slots   Dynamic Slots +
+------------+-------+------------------------------------------------+--------------------------------------+------+--------------+---------------------+---------------+-------------+---------------+---------------+
| atlas-cern | hephy | atlas-cern--hephy--3456932588--121096163595088 | 461aa9c8-5de2-48f8-a6ff-939596b22fa7 | None | 1            | 2020-06-07 08:42:46 | None          | None        | 1             | 1             |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--139140120420416 | 461aa9c8-5de2-48f8-a6ff-939596b22fa7 | None | 1            | 2020-06-07 11:37:44 | None          | None        | 1             | 1             |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--101341710032495 | 461aa9c8-5de2-48f8-a6ff-939596b22fa7 | None | 1            | 2020-06-07 06:40:54 | None          | None        | 1             | 1             |
+------------+-------+------------------------------------------------+--------------------------------------+------+--------------+---------------------+---------------+-------------+---------------+---------------+

VMs: (3/4)
+------------+-------+------------------------------------------------+---------------------+-------------+-------------+---------------------+---------------+--------------+-------------+-------------+-------------+-------------+
+            |       |                                                |      HTCondor       |             |             |                     |               |              |             |             |             |             +
+   Group    | Cloud |                    Hostname                    |   Slots Timestamp   |   Retire    |  Terminate  |    Last Updated     |    Flavor     | Condor Slots |   Foreign   |    cores    | Disk (GBs)  |  Ram (MBs)  +
+------------+-------+------------------------------------------------+---------------------+-------------+-------------+---------------------+---------------+--------------+-------------+-------------+-------------+-------------+
| atlas-cern | hephy | atlas-cern--hephy--3456932588--121096163595088 | 2020-06-12 15:21:19 | 10          | 0           | 2020-06-15 15:05:39 | fastio.xlarge | None         | 0           | 8           | 40          | 15360       |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--139140120420416 | 2020-06-12 15:39:04 | 10          | 0           | 2020-06-15 15:05:39 | fastio.xlarge | None         | 0           | 8           | 40          | 15360       |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--101341710032495 | 2020-06-12 16:27:56 | 10          | 0           | 2020-06-15 15:05:39 | fastio.xlarge | None         | 0           | 8           | 40          | 15360       |
+------------+-------+------------------------------------------------+---------------------+-------------+-------------+---------------------+---------------+--------------+-------------+-------------+-------------+-------------+

VMs: (4/4)
+------------+-------+------------------------------------------------+-------------+---------------+-------------+----------------+
+ Group      | Cloud | Hostname                                       | Swap (GBs)  | Poller Status | State Age   | Manual_Control +
+------------+-------+------------------------------------------------+-------------+---------------+-------------+----------------+
| atlas-cern | hephy | atlas-cern--hephy--3456932588--121096163595088 | 0           | retiring      | 288001      | 0              |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--139140120420416 | 0           | retiring      | 286936      | 0              |
| atlas-cern | hephy | atlas-cern--hephy--3456932588--101341710032495 | 0           | retiring      | 284004      | 0              |
+------------+-------+------------------------------------------------+-------------+---------------+-------------+----------------+
Rows: 3

These are still registered in condor:

seuster@csv2a :~$ condor_status -m | grep -e atlas-cern--hephy--3456932588--121096163595088 -e  atlas-cern--hephy--3456932588--139140120420416 -e atlas-cern--hephy--3456932588--101341710032495
atlas-cern--hephy--3456932588--101341710032495        8.6.13.453497     8    14.7 GB    8+16:33:41
atlas-cern--hephy--3456932588--121096163595088        8.6.13.453497     8    14.7 GB    8+14:36:12
atlas-cern--hephy--3456932588--139140120420416        8.6.13.453497     8    14.7 GB    8+11:38:12

but they don't have a job running:

seuster@csv2a :~$ condor_status | grep -e atlas-cern--hephy--3456932588--121096163595088 -e  atlas-cern--hephy--3456932588--139140120420416 -e atlas-cern--hephy--3456932588--101341710032495
seuster@csv2a :~$ 
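
The same cross-check can be sketched in Python. This is only an illustration of the two `grep` calls above, not part of cloudscheduler: the function names are hypothetical, and the command outputs are pasted in as strings rather than fetched from HTCondor. A host counts as "registered" if it appears in the `condor_status -m` output and as "having slots" if it appears in the plain `condor_status` output.

```python
from typing import Iterable, List


def appears_in(host: str, output: str) -> bool:
    """Substring match per line, mirroring what `grep -e host` does."""
    return any(host in line for line in output.splitlines())


def registered_without_slots(
    hosts: Iterable[str], master_output: str, slot_output: str
) -> List[str]:
    """Hosts listed by `condor_status -m` that have no line in `condor_status`."""
    return sorted(
        h
        for h in hosts
        if appears_in(h, master_output) and not appears_in(h, slot_output)
    )


# Hostnames of the three VMs from the listing above.
HOSTS = [
    "atlas-cern--hephy--3456932588--121096163595088",
    "atlas-cern--hephy--3456932588--139140120420416",
    "atlas-cern--hephy--3456932588--101341710032495",
]

# Output of `condor_status -m | grep ...` as shown in this issue.
MASTER_OUTPUT = """\
atlas-cern--hephy--3456932588--101341710032495        8.6.13.453497     8    14.7 GB    8+16:33:41
atlas-cern--hephy--3456932588--121096163595088        8.6.13.453497     8    14.7 GB    8+14:36:12
atlas-cern--hephy--3456932588--139140120420416        8.6.13.453497     8    14.7 GB    8+11:38:12
"""

# Output of `condor_status | grep ...` as shown in this issue: nothing matched.
SLOT_OUTPUT = ""

stuck = registered_without_slots(HOSTS, MASTER_OUTPUT, SLOT_OUTPUT)
for host in stuck:
    print(host)
```

With the outputs from this issue, all three hosts are reported, i.e. each one is known to the collector but has no slot entries.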

It seems these VMs were retired on June 12, 3 days ago.
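
A quick way to check that date from the listing, as a sketch: I am assuming that "State Age" is in seconds and is measured back from the "Last Updated" timestamp, which the output itself does not state. The helper name is hypothetical.

```python
from datetime import datetime, timedelta

# "Last Updated" value shared by all three VMs in the (3/4) table.
LAST_UPDATED = datetime(2020, 6, 15, 15, 5, 39)

# "State Age" values from the (4/4) table, assumed to be seconds.
STATE_AGE_SECONDS = {
    "atlas-cern--hephy--3456932588--121096163595088": 288001,
    "atlas-cern--hephy--3456932588--139140120420416": 286936,
    "atlas-cern--hephy--3456932588--101341710032495": 284004,
}


def state_entered_at(last_updated: datetime, state_age_seconds: int) -> datetime:
    """Time at which the VM entered its current (retiring) state."""
    return last_updated - timedelta(seconds=state_age_seconds)


retired_at = {
    host: state_entered_at(LAST_UPDATED, age)
    for host, age in STATE_AGE_SECONDS.items()
}

for host, when in retired_at.items():
    days = STATE_AGE_SECONDS[host] / 86400  # seconds per day
    print(f"{host}  {when.isoformat(sep=' ')}  ({days:.2f} days in state)")
```

Under that assumption, all three VMs entered the retiring state on 2020-06-12, a little over three days before the listing was taken.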

colsond commented 3 years ago

This will be addressed in the next cloudscheduler version, which contains redundant slot count accounting.