oxidecomputer / buildomat

a software build labour-saving device
Mozilla Public License 2.0

upload error: UNIQUE constraint failed: job_output.job, job_output.path #50

Open jclulow opened 5 months ago

We had a job that should have succeeded fail with what looks like a spurious error:

169 2024-03-02T01:12:08.179Z    uploading: /tmp/debug/mpstat.txt (287893 bytes)
170 2024-03-02T01:12:08.210Z    upload warning: file "/tmp/debug/mpstat.txt" changed size mid upload: 287893 -> 290801
171 2024-03-02T01:12:11.291Z    upload error: UNIQUE constraint failed: job_output.job, job_output.path (after 50ms)

https://buildomat.eng.oxide.computer/wg/0/details/01HQYA1M8P0PTRSDTX7D29TQG7/HMHmaiTYYh2ZNSaXhABC0NuQTxIOlgzMpSb2gAL7QY92DMdg/01HQYA20KPJWHJS6FEX70D82K5#S171

I suspect this is the result of a file upload that was interrupted after (partially) succeeding while the buildomat API server was being redeployed:

$ LANG=C egrep '^\[|01HQYBFSD4QS3E6ZGD6BD02BFN' /var/tmp/20240302T01.log | looker
01:12:08.212Z INFO buildomat (files): starting work on job 01HQYA2ARPXZ0BKACQTXJ40TJJ commit 01HQYBFSD4QS3E6ZGD6BD02BFN
    chunks = 2
    expected_size = 290801
    file-commit = 0
    queue_depth = 0
01:12:08.216Z INFO buildomat (files): finished work on job 01HQYA2ARPXZ0BKACQTXJ40TJJ commit 01HQYBFSD4QS3E6ZGD6BD02BFN
    duration_msec = 3
    file-commit = 0
[ Mar  2 01:12:09 Stopping because service restarting. ]
[ Mar  2 01:12:09 Executing stop method (:kill). ]
[ Mar  2 01:12:09 Executing start method ("/opt/buildomat/lib/buildomat-server -b 0.0.0.0:9979 -f config.toml &"). ]
[ Mar  2 01:12:09 Method "start" exited with status 0. ]
01:12:10.278Z INFO buildomat (files): starting work on job 01HQYA2ARPXZ0BKACQTXJ40TJJ commit 01HQYBFSD4QS3E6ZGD6BD02BFN
    chunks = 2
    expected_size = 290801
    file-commit = 0
    queue_depth = 0
01:12:10.328Z ERRO buildomat (files): job 01HQYA2ARPXZ0BKACQTXJ40TJJ commit 01HQYBFSD4QS3E6ZGD6BD02BFN failed: database: UNIQUE constraint failed: job_output.job, job_output.path
    duration_msec = 50
    file-commit = 0
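
The log above shows the same commit ID being processed twice: once just before the restart (finishing in 3 ms) and once just after it, where the second attempt hits the constraint. A minimal sketch of that suspected failure mode follows, using Python's built-in `sqlite3` module rather than the buildomat server code. Only the table name `job_output` and the unique `(job, path)` pair come from the error message. The `size` column, the function names, and the idea of treating an identical replay as success are assumptions for illustration, not the project's actual schema or fix.

```python
import sqlite3


def open_db():
    # Hypothetical schema: only the table name and the UNIQUE (job, path)
    # constraint are taken from the error message; "size" is assumed.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job_output ("
        " job TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " UNIQUE (job, path))"
    )
    return db


def commit_naive(db, job, path, size):
    # A plain INSERT: replaying a commit that already landed violates
    # the unique constraint and raises sqlite3.IntegrityError.
    with db:
        db.execute(
            "INSERT INTO job_output (job, path, size) VALUES (?, ?, ?)",
            (job, path, size),
        )


def commit_idempotent(db, job, path, size):
    # Treat a replay of an identical commit as success, but still reject
    # a conflicting record for the same (job, path).
    with db:
        row = db.execute(
            "SELECT size FROM job_output WHERE job = ? AND path = ?",
            (job, path),
        ).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO job_output (job, path, size) VALUES (?, ?, ?)",
                (job, path, size),
            )
            return "inserted"
        if row[0] == size:
            return "already committed"
        raise ValueError(f"conflicting output for {job} {path}")


if __name__ == "__main__":
    db = open_db()
    job, path = "01HQYA2ARPXZ0BKACQTXJ40TJJ", "/tmp/debug/mpstat.txt"

    commit_naive(db, job, path, 290801)      # first attempt, before restart
    try:
        commit_naive(db, job, path, 290801)  # retry after restart
    except sqlite3.IntegrityError as e:
        print("retry failed:", e)

    print(commit_idempotent(db, job, path, 290801))
```

Whether an idempotent commit is the right fix depends on details not visible in the logs, such as whether the server can tell that the retried commit carries the same chunks as the one that already landed.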