During the workshop we had 2 instances on the same app service as our other servers.
I moved the workshop server to its own app service so we could scale it independently. I could not find an explanation for the high CPU load, as I no longer observe it. It might be that a lot of nodes were polling the central server for tasks (although these requests are very lightweight)?
Of course we have ~30 participants, which means about 30 nodes + 30 open UIs... That would still be a challenge, but we can scale to 30 instances. We should also consider the costs at this point (30 * 450 / 31 ≈ 435 euros per 24h); if we are conservative we can probably use it for about 18h, which is about ~326 euros.
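For reference, the arithmetic behind those figures (assuming ~450 euros per instance per month and a 31-day month, as implied by the numbers above):

# Back-of-the-envelope cost estimate (assumption: ~450 euros per instance per month)
instances = 30
monthly_cost_per_instance = 450  # euros, assumed
days_in_month = 31

cost_per_24h = instances * monthly_cost_per_instance / days_in_month
print(f"Cost per 24h: {cost_per_24h:.2f} euros")  # ~435

conservative_hours = 18
cost_conservative = cost_per_24h * conservative_hours / 24
print(f"Cost for {conservative_hours}h: {cost_conservative:.2f} euros")  # ~326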
[client.task.delete(task.get("id")) for task in client.task.list(per_page=999)["data"]]
to do so. The only information we have about the nodes shutting down is the following logs... It seems they receive a normal kill signal:
2024-09-20 20:57:00 - node - INFO - Node is interrupted, shutting down...
2024-09-20 20:57:00 - socket - INFO - Disconnected from the server
2024-09-20 20:57:01 - socket - INFO - Oak_Date left room collaboration_104
2024-09-20 20:57:01 - network_man.. - DEBUG - Disconnecting vantage6-Oak_Apple-user from network 'vantage6-Oak_Apple-user-net'
I also see that the nodes free up memory at this time, but they are definitely not out of memory.
The machine has no swap memory at all, so that might be the cause.
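One way to confirm or rule out an out-of-memory kill would be to inspect the exit state of the stopped node containers. A minimal sketch using the Docker SDK for Python, assuming the daemon is reachable from the host; the container name below is just an example, not necessarily the real one:

import docker

# Connect to the local Docker daemon (default socket)
docker_client = docker.from_env()

# Example container name -- replace with the actual node container name
container = docker_client.containers.get("vantage6-Oak_Apple-user")

state = container.attrs["State"]
print("OOMKilled: ", state["OOMKilled"])    # True if the kernel OOM killer stopped it
print("ExitCode:  ", state["ExitCode"])     # 137 usually means SIGKILL, 143 SIGTERM
print("FinishedAt:", state["FinishedAt"])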
Ok, I think I found the cause: at the time the nodes shut down, Azure applies updates to our VM. Not sure why it kills our containers.
It seems like we are not able to control this process, as this is an 'Azure Managed - Safe Deployment'.
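For what it's worth, upcoming platform maintenance can at least be observed from inside the VM via the Instance Metadata Service 'scheduled events' endpoint, so we could see it coming. A rough sketch (the api-version is an assumption on my part, and the very first call can be slow because it activates the endpoint):

import requests

# Azure Instance Metadata Service: scheduled events (only reachable from inside the VM)
IMDS_URL = "http://169.254.169.254/metadata/scheduledevents"

response = requests.get(
    IMDS_URL,
    headers={"Metadata": "true"},
    params={"api-version": "2020-07-01"},  # assumed api-version
    timeout=120,  # first call may take a while to activate the endpoint
)
response.raise_for_status()

events = response.json().get("Events", [])
if not events:
    print("No scheduled maintenance events")
for event in events:
    # EventType is e.g. Reboot, Redeploy or Freeze; Resources lists affected instances
    print(event.get("EventType"), event.get("Resources"), event.get("NotBefore"))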
I suggest we take the risk. Worst case we would need to reboot the nodes.
We are going with this plan.
Description
Maybe these are related