OOM looks like a preemption

mozilla / translations

The code, training pipeline, and models that power Firefox Translations

https://mozilla.github.io/translations/

Mozilla Public License 2.0

OOM looks like a preemption #562

Open eu9ene opened 6 months ago

eu9ene commented 6 months ago

Sometimes a task runs out of memory (OOM), and it's hard to tell from the logs that this is what happened: the failure looks like a preemption of a spot instance. We should be able to easily identify that a task was terminated because the machine ran out of memory.
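As a rough illustration of the kind of signal that could be surfaced in the logs, here is a minimal sketch of a wrapper that reports how a command ended. It assumes a Linux worker where the kernel OOM killer ends the process with SIGKILL while the machine itself stays up. The names `describe_exit` and `run_and_report` are hypothetical and not part of this repo, taskgraph, or Taskcluster. SIGKILL can come from other sources too, so this is only a hint to check the kernel log, not proof of an OOM, and it does not help when the whole worker goes down.

```python
import signal
import subprocess
import sys
from typing import Sequence


def describe_exit(returncode: int) -> str:
    """Map a process return code to a human-readable hint (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    # subprocess reports death-by-signal as a negative signal number;
    # a shell wrapper reports it as 128 + signal number (137 for SIGKILL).
    if returncode in (-signal.SIGKILL, 128 + signal.SIGKILL):
        return "killed by SIGKILL (possible OOM kill, check the kernel log)"
    if returncode < 0:
        return f"terminated by signal {-returncode}"
    return f"failed with exit code {returncode}"


def run_and_report(cmd: Sequence[str]) -> int:
    """Run a command and print a one-line termination hint to stderr."""
    proc = subprocess.run(list(cmd))
    print(f"[wrapper] {describe_exit(proc.returncode)}", file=sys.stderr)
    return proc.returncode
```

For example, `run_and_report(["python", "train.py"])` (a made-up command) would print the hint after the training process exits, so a SIGKILL shows up explicitly in the task log instead of only as an abrupt end of output.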

eu9ene commented 5 months ago

Landing #561 and setting up dashboards for CPU machines can help with that.

bhearsum commented 3 months ago

I don't think there's anything we can do to make this better in this repo or in taskgraph. This is a worker issue that's been filed as https://github.com/taskcluster/taskcluster/issues/6894