mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0
154 stars 33 forks source link

Add an HPLT data importer #837

Closed gregtatum closed 1 month ago

gregtatum commented 1 month ago

This is now ready for review. Resolves #537.

gregtatum commented 1 month ago

Yeah for the big CI run, I was hesitant to run it, as it pulls in a very large file for download. I guess I could severely restrict how much data to collect in it, to minimize hitting their download servers.

eu9ene commented 1 month ago

I mostly looked at the Taskcluster parts, and that looks largely fine. The only comment I have is that if I've traversed the graph correctly, there's no hlpt tasks in the CI run, which means there's no coverage for the hlpt code paths in the downloader. I see some unit tests for those, so it's probably no big deal, but I wanted to call it out explicitly.

This should probably get @eu9ene's review as well, I assume.

Yes, I'll review today.

eu9ene commented 1 month ago

@gregtatum I did not review this again and did not approve it, so it was not ready to merge. When I start reviewing a PR, it shouldn't be merged until it's approved. Otherwise, I cannot verify the fixes.

gregtatum commented 1 month ago

We discussed this on our 1:1, and there was misunderstanding on the status of the PR. I merged it because it had an approval from Ben and you had taken a look at it, which I took to mean as "ready to merge." Your intention was to wait for an approval. We agreed not to back it out, and I would handle any feedback in a follow-up.

We agreed that in the future to mark PRs as "request changes" to avoid ambiguous statuses when it's not ready to merge.