mozilla / firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models
https://mozilla.github.io/firefox-translations-training/
Mozilla Public License 2.0

collect_mono failed with Read error (39) : premature end #680

Closed. eu9ene closed 2 months ago

eu9ene commented 3 months ago

https://firefox-ci-tc.services.mozilla.com/tasks/IujzQFOHSyarfuGZTqYOHg/runs/0/logs/public/logs/live.log

[task 2024-06-17T20:26:28.119Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 17 MB...    
[task 2024-06-17T20:26:28.285Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 71 MB...    
[task 2024-06-17T20:26:28.452Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 125 MB...    
[task 2024-06-17T20:26:28.619Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 176 MB...    
[task 2024-06-17T20:26:28.786Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 225 MB...    
[task 2024-06-17T20:26:28.953Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 276 MB...    
[task 2024-06-17T20:26:28.996Z] Decompress:  7/20 files. Current: ...file.16.out.zst : 326 MB...    
[task 2024-06-17T20:26:29.119Z]                                                                                
[task 2024-06-17T20:26:29.119Z] 
[task 2024-06-17T20:26:29.286Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 34 MB...    
[task 2024-06-17T20:26:29.453Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 90 MB...    
[task 2024-06-17T20:26:29.620Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 144 MB...    
[task 2024-06-17T20:26:29.787Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 200 MB...    
[task 2024-06-17T20:26:29.954Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 250 MB...    
[task 2024-06-17T20:26:30.065Z] Decompress:  8/20 files. Current: ...file.17.out.zst : 299 MB...
ches/file.17.out.zst : Read error (39) : premature end

eu9ene commented 3 months ago

The same failure also shows up in shortlist for en-uk: https://firefox-ci-tc.services.mozilla.com/tasks/cY2ErCZGR9q__-D5orkncA/runs/0/logs/public/logs/live.log

bhearsum commented 2 months ago

Something definitely went wrong with the download of file.17.out.zst here. The log reports the following for it and for one other file:

https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/EhFkOBKCTOi5Vt5wMLJyOA/artifacts/public/build/file.4.out.zst resolved to 136533813 bytes with sha256 93e25e98fb56696dd562c99bfb63ceac87cd27e1a9b324b6f1089989fae24c05 in 7.914s
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UBpNNGbKRdK19sBoRLRarA/artifacts/public/build/file.17.out.zst resolved to 133492367 bytes with sha256 c353ff408dbcde93825824c2c935bcc0d0f0d67cf4b97a8589e66a7366b17ffc in 7.970s

Yet when I download both files myself, file.4.out.zst matches the logged hash and size, but file.17.out.zst differs in both:

~/tmp/2024-07-09 ❯ sha256sum *   
4999a845ad51b64a7041cf2152c4a926c3ba6275bb05cd7cb82644f2e410137d  file.17.out.zst
93e25e98fb56696dd562c99bfb63ceac87cd27e1a9b324b6f1089989fae24c05  file.4.out.zst
~/tmp/2024-07-09 ❯ ls -l
total 266780
-rw-rw-r-- 1 bhearsum bhearsum 136633331 Jun 17 11:16 file.17.out.zst
-rw-rw-r-- 1 bhearsum bhearsum 136533813 Jun 17 11:21 file.4.out.zst
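
For anyone who wants to script the same comparison, here is a minimal sketch in Python. It is not taskgraph's download code; `check_download` and its parameters are hypothetical names chosen for illustration. It checks a local file against an expected byte count and sha256, such as the values in the "resolved to ... bytes with sha256 ..." log lines.

```python
import hashlib
import os


def check_download(path, expected_size, expected_sha256, chunk_size=1024 * 1024):
    """Return a list of mismatch descriptions; an empty list means the file matches.

    Hypothetical helper: the expected values are assumed to come from a trusted
    source, for example a second download or the artifact metadata.
    """
    problems = []

    # A truncated or padded download shows up first as a size difference.
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(
            f"size mismatch: expected {expected_size}, got {actual_size}"
        )

    # Hash in chunks so large .zst files are never loaded into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    actual_sha256 = digest.hexdigest()
    if actual_sha256 != expected_sha256.lower():
        problems.append(
            f"sha256 mismatch: expected {expected_sha256}, got {actual_sha256}"
        )

    return problems


# Example call, using the values the task log reported for file.17.out.zst:
# check_download(
#     "file.17.out.zst",
#     133492367,
#     "c353ff408dbcde93825824c2c935bcc0d0f0d67cf4b97a8589e66a7366b17ffc",
# )
```

Run against my local copy of file.17.out.zst with the logged values, a check like this would report both a size and a sha256 mismatch, consistent with the `sha256sum` and `ls -l` output above.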

Taskgraph is responsible for these downloads. I've filed https://github.com/taskcluster/taskgraph/issues/538 for this.

bhearsum commented 2 months ago

Upstream issue is fixed; I'll keep this open until we pick up a taskgraph version with the fix.