mozilla / firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models
https://mozilla.github.io/firefox-translations-training/
Mozilla Public License 2.0
145 stars 31 forks

Add a memory logger #821

gregtatum closed 3 weeks ago

gregtatum commented 3 weeks ago

I found this really useful for tracking down OOM issues in the merge mono task, since it measures memory usage while the task is running.

Example live log.

This PR blocks submitting the HPLT importer, which also uses the memory logger.

eu9ene commented 3 weeks ago

Nice idea! I agree it can be more convenient to look at memory allocation in the logs, but we also have the GCP dashboard where you can filter things by the machine ID. For example: translations-1-b-linux-large-gcp-300gb-gd4dp9-ht0mwubpmall1yq for your task.

[taskcluster 2024-08-30 00:14:09.606Z] Hostname: translations-1-b-linux-large-gcp-300gb-gd4dp9-ht0mwubpmall1yq

gregtatum commented 3 weeks ago

> Nice idea! I agree it can be more convenient to look at memory allocation in the logs.

I also like it because I can see it locally while I'm developing a script, which helps me check that my mental model of what's being loaded in memory is right. It was quite handy for getting some of my scripts working on production levels of data.
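
As a rough illustration of the idea discussed here (not the code in this PR), a periodic memory logger could be sketched as follows. The names `MemoryLogger` and `format_bytes` are hypothetical, and the sketch assumes the standard library's `tracemalloc`, which only counts memory allocated by Python itself, so its numbers will be lower than the machine-level figures on the GCP dashboard.

```python
import logging
import threading
import tracemalloc
from typing import Optional

logger = logging.getLogger("memory")


def format_bytes(num_bytes: int) -> str:
    """Format a byte count for log output, e.g. 1536 -> '1.5 KiB'."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} TiB"


class MemoryLogger:
    """Hypothetical helper: logs Python heap usage from a background thread."""

    def __init__(self, interval_seconds: float = 10.0) -> None:
        self.interval_seconds = interval_seconds
        self.samples = 0  # number of log lines emitted so far
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._started_tracing = False

    def _log_once(self) -> None:
        # Assumption: tracemalloc sees Python allocations only, not the whole process.
        current, peak = tracemalloc.get_traced_memory()
        self.samples += 1
        logger.info(
            "[memory] current: %s, peak: %s",
            format_bytes(current),
            format_bytes(peak),
        )

    def _run(self) -> None:
        # Event.wait returns True once stop is requested, which ends the loop.
        while not self._stop.wait(self.interval_seconds):
            self._log_once()

    def __enter__(self) -> "MemoryLogger":
        # Only start (and later stop) tracing if nobody else has enabled it.
        self._started_tracing = not tracemalloc.is_tracing()
        if self._started_tracing:
            tracemalloc.start()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        self._log_once()  # always emit one final sample on exit
        if self._started_tracing:
            tracemalloc.stop()
```

Wrapping the memory-heavy part of a script in `with MemoryLogger(interval_seconds=5):` after calling `logging.basicConfig(level=logging.INFO)` would print a sample every five seconds plus one on exit. Running the logger on a daemon thread keeps it from blocking the task or holding the process open if the task crashes.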