Closed: eu9ene closed this issue 3 months ago.
Failing task run: https://firefox-ci-tc.services.mozilla.com/tasks/BaqT4V6ORTeQR66ycxy7KA/runs/1
Based on the GCP dashboards, it doesn't look like an OOM. I also switched to chunking the corpus into 100M-sentence parts to reduce memory usage, and it still fails.
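For reference, chunking a parallel corpus into fixed-size parts can be sketched as below. This is a minimal sketch under stated assumptions, not the pipeline's actual code: it assumes plain-text source and target files with one sentence per line, and the helper name `chunk_parallel_corpus` and the `corpus.src.aa`-style output names are hypothetical. The two-letter `aa`, `ab` suffixes only mirror coreutils `split` and the part names visible in the log (`aln.fwd.aa`, `Processing part ab`).

```python
import itertools
import os
import string
from typing import Iterator, List, Optional, TextIO, Tuple


def two_letter_suffixes() -> Iterator[str]:
    """Yield aa, ab, ..., zz (the same style as coreutils `split` part names)."""
    for first, second in itertools.product(string.ascii_lowercase, repeat=2):
        yield first + second


def chunk_parallel_corpus(src_path: str, trg_path: str, out_dir: str,
                          chunk_size: int) -> List[Tuple[str, str]]:
    """Stream aligned source/target files into parts of at most chunk_size pairs.

    Both files are read in lockstep, so line i of a source part still pairs
    with line i of the matching target part. Only the current sentence pair
    is held in memory. Returns the (source_part, target_part) paths in order.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    suffixes = two_letter_suffixes()
    parts: List[Tuple[str, str]] = []
    fs: Optional[TextIO] = None
    ft: Optional[TextIO] = None
    written = 0
    try:
        with open(src_path, encoding="utf-8") as src, \
                open(trg_path, encoding="utf-8") as trg:
            for s, t in itertools.zip_longest(src, trg):
                if s is None or t is None:
                    raise ValueError("source and target line counts differ")
                if fs is None or written == chunk_size:
                    # Finish the current part (if any) and open the next pair of files.
                    if fs is not None:
                        fs.close()
                        ft.close()
                    suffix = next(suffixes, None)
                    if suffix is None:
                        raise ValueError("more than 676 parts; increase chunk_size")
                    src_part = os.path.join(out_dir, f"corpus.src.{suffix}")
                    trg_part = os.path.join(out_dir, f"corpus.trg.{suffix}")
                    parts.append((src_part, trg_part))
                    fs = open(src_part, "w", encoding="utf-8")
                    ft = open(trg_part, "w", encoding="utf-8")
                    written = 0
                fs.write(s)
                ft.write(t)
                written += 1
    finally:
        # Close whichever part files are still open, including on error.
        for handle in (fs, ft):
            if handle is not None:
                handle.close()
    return parts
```

For example, `chunk_parallel_corpus("corpus.src", "corpus.trg", "parts", 100_000_000)` would produce `corpus.src.aa`/`corpus.trg.aa`, `corpus.src.ab`/`corpus.trg.ab`, and so on, each of which could then be aligned separately. Writing line by line instead of buffering a whole part keeps the splitting step itself cheap even when one part holds 100M sentences.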
So far we haven't seen this problem with other languages, including uk-en, so it might be a language-pair-specific issue in eflomal.
```
2024-07-01 02:16:19.086 Jul 1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq fstrim[7148]: /boot/efi: 98.3 MiB (103061504 bytes) trimmed on /dev/sda15
2024-07-01 02:16:19.086 Jul 1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq fstrim[7148]: /: 515 GiB (553010765824 bytes) trimmed on /dev/sda1
2024-07-01 02:16:19.086 Jul 1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: fstrim.service: Deactivated successfully.
2024-07-01 02:16:19.087 Jul 1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished Discard unused blocks on filesystems from /etc/fstab.
2024-07-01 02:16:20.788 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 About to reclaim task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:16:20.789 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaimed task BaqT4V6ORTeQR66ycxy7KA successfully.
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Notifying listener taskcluster-proxy of state change
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Received task status change: Reclaimed
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Got http status code 200 when issuing PUT to http://localhost:80/credentials with clientId task-client/BaqT4V6ORTeQR66ycxy7KA/0/on/us-west1-b/3862006725621225983/until/1719797780.822
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Successfully reclaimed task BaqT4V6ORTeQR66ycxy7KA
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA at 2024-07-01 01:33:20.822 +0000 UTC
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Current task claim expires at 2024-07-01 01:36:20.822 +0000 UTC
2024-07-01 02:16:20.837 Jul 1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA in 16m59.984514843s
2024-07-01 02:17:01.802 Jul 1 01:17:01 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq CRON[7154]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
2024-07-01 02:20:25.378 {"code":"LogPingOpsAgent"}
2024-07-01 02:25:18.194 Jul 1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:25:18.216 Jul 1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:25:18.216 Jul 1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7160]: 2024/07/01 01:25:18: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:25:18.216 Jul 1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7160]: 2024/07/01 01:25:18: Done
2024-07-01 02:25:18.216 Jul 1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
2024-07-01 02:30:25.378 {"code":"LogPingOpsAgent"}
2024-07-01 02:33:20.823 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 About to reclaim task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:33:20.849 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:33:20.938 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaimed task BaqT4V6ORTeQR66ycxy7KA successfully.
2024-07-01 02:33:20.938 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Notifying listener taskcluster-proxy of state change
2024-07-01 02:33:20.938 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Received task status change: Reclaimed
2024-07-01 02:33:20.939 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Got http status code 200 when issuing PUT to http://localhost:80/credentials with clientId task-client/BaqT4V6ORTeQR66ycxy7KA/0/on/us-west1-b/3862006725621225983/until/1719798800.923
2024-07-01 02:33:20.939 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Successfully reclaimed task BaqT4V6ORTeQR66ycxy7KA
2024-07-01 02:33:20.939 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA at 2024-07-01 01:50:20.923 +0000 UTC
2024-07-01 02:33:20.939 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Current task claim expires at 2024-07-01 01:53:20.923 +0000 UTC
2024-07-01 02:33:20.939 Jul 1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA in 16m59.98398485s
2024-07-01 02:35:28.030 Jul 1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:35:28.041 Jul 1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7175]: 2024/07/01 01:35:28: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:35:28.041 Jul 1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7175]: 2024/07/01 01:35:28: Done
2024-07-01 02:35:28.042 Jul 1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:35:28.042 Jul 1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
2024-07-01 02:40:25.377 {"code":"LogPingOpsAgent"}
2024-07-01 02:45:39.446 Jul 1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:45:39.477 Jul 1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:45:39.477 Jul 1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7191]: 2024/07/01 01:45:39: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:45:39.477 Jul 1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7191]: 2024/07/01 01:45:39: Done
2024-07-01 02:45:39.477 Jul 1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
2024-07-01 02:49:34.071 Jul 1 01:49:34 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:49:34.070Z] Final argmax iteration: 15890.433 s#015
2024-07-01 02:49:34.100 Jul 1 01:49:34 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:49:34.070Z] Writing alignments to /builds/worker/artifacts/tmp/aln.fwd.aa for 100000000 sentencess#015
2024-07-01 02:50:20.924 Jul 1 01:50:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 About to reclaim task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaimed task BaqT4V6ORTeQR66ycxy7KA successfully.
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Notifying listener taskcluster-proxy of state change
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Received task status change: Reclaimed
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Got http status code 200 when issuing PUT to http://localhost:80/credentials with clientId task-client/BaqT4V6ORTeQR66ycxy7KA/0/on/us-west1-b/3862006725621225983/until/1719799820.983
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Successfully reclaimed task BaqT4V6ORTeQR66ycxy7KA
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA at 2024-07-01 02:07:20.983 +0000 UTC
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Current task claim expires at 2024-07-01 02:10:20.983 +0000 UTC
2024-07-01 02:50:21.011 Jul 1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA in 16m59.985546732s
2024-07-01 02:50:25.378 {"code":"LogPingOpsAgent"}
2024-07-01 02:52:15.558 Jul 1 01:52:15 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:52:15.557Z] [alignments] Processing part ab#015
2024-07-01 02:52:16.087 Jul 1 01:52:16 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:52:15.557Z] [alignments] Calculating alignments...#015
2024-07-01 02:56:10.674 Jul 1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:56:10.706 Jul 1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:56:10.706 Jul 1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7214]: 2024/07/01 01:56:10: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:56:10.706 Jul 1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7214]: 2024/07/01 01:56:10: Done
2024-07-01 02:56:10.706 Jul 1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
```
```
2024-07-01 03:00:25.377 {"code":"LogPingOpsAgent"}
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.273577] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.279641] rcu: 5-...0: (7 ticks this GP) idle=46ac/1/0x4000000000000000 softirq=130361/130363 fqs=7168
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.289317] rcu: hardirqs softirqs csw/system
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.294985] rcu: number: 0 0 0
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.300665] rcu: cputime: 0 0 0 ==> 30028(ms)
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.307741] rcu: 23-...0: (3 GPs behind) idle=ed14/0/0x1 softirq=85779/85779 fqs=7169
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.315755] rcu: hardirqs softirqs csw/system
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.321425] rcu: number: 0 0 0
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.327094] rcu: cputime: 0 0 0 ==> 30052(ms)
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.334181] rcu: (detected by 36, t=15017 jiffies, g=1443741, q=15967 ncpus=64)
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341677] Sending NMI from CPU 36 to CPUs 5:
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341688] NMI backtrace for cpu 5
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341692] CPU: 5 PID: 4585 Comm: python3 Tainted: G O 6.5.0-1023-gcp #25~22.04.1-Ubuntu
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341694] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341695] RIP: 0010:__raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341702] Code: cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 52 b8 01 00 00 00 31 d2 f0 0f b0 17 <3c> 01 75 07 5a 5d c3 cc cc cc cc 56 0f b6 f0 e8 9c ff ff ff 5e 5a
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341703] RSP: 0000:ffffb87c108dbce8 EFLAGS: 00000046
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341705] RAX: 0000000000000001 RBX: ffff8d5f0721b5d0 RCX: 0000000000000000
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341706] RDX: 0000000000000000 RSI: 0000000000000087 RDI: ffff8ddd3f7e33c0
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341707] RBP: ffffb87c108dbcf0 R08: 0000000000000087 R09: 0000000000000000
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341708] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ddd3f7e3400
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341708] R13: 0000000000000087 R14: 00000000ffffffff R15: ffff8d5f0721a148
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341709] FS: 00007cc62f5691c0(0000) GS:ffff8d9d3f740000(0000) knlGS:0000000000000000
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341710] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341711] CR2: 000056952c64a000 CR3: 000000010780c004 CR4: 00000000003706e0
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341715] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341719] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341720] Call Trace:
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341721] <NMI>
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341724] ? show_regs+0x6d/0x80
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341729] ? nmi_cpu_backtrace+0xb5/0x120
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341732] ? nmi_cpu_backtrace_handler+0x11/0x20
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341736] ? nmi_handle+0x5f/0x160
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341738] ? default_do_nmi+0x47/0x160
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341739] ? exc_nmi+0x1d5/0x2a0
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341741] ? end_repeat_nmi+0x16/0x67
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341747] ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341749] ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.492 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341750] ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341752] </NMI>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341752] <TASK>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341753] _raw_spin_unlock_irqrestore+0x11/0x40
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341754] hrtimer_try_to_cancel.part.0+0x55/0xe0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341757] hrtimer_cancel+0x21/0x50
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341760] vkms_disable_vblank+0x15/0x20 [vkms]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341765] drm_vblank_disable_and_save+0xdf/0x120 [drm]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341813] vblank_disable_fn+0x74/0xa0 [drm]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341841] ? __pfx_vblank_disable_fn+0x10/0x10 [drm]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341867] call_timer_fn+0x29/0x130
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341870] __run_timers.part.0+0x20e/0x290
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341872] ? ktime_get+0x43/0xc0
03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341874] ? __pfx_tick_sched_timer+0x10/0x10
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341876] ? native_apic_msr_write+0x2b/0x70
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341878] ? lapic_next_event+0x1d/0x30
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341880] ? clockevents_program_event+0xb3/0x140
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341883] run_timer_softirq+0x2a/0x60
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341884] __do_softirq+0xd9/0x30f
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341886] ? hrtimer_interrupt+0x11f/0x250
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341888] __irq_exit_rcu+0x75/0xa0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341889] irq_exit_rcu+0xe/0x20
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341891] sysvec_apic_timer_interrupt+0x40/0xd0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341893] asm_sysvec_apic_timer_interrupt+0x1b/0x20
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341895] RIP: 0033:0x7cc62eee1075
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341914] Code: 45 31 f6 45 31 ff 4c 89 6c 24 08 4d 89 f5 4c 8b 34 24 48 89 5c 24 18 48 89 cb 48 89 6c 24 10 4c 89 fd 49 89 c7 48 8b 44 24 48 <48> 8d 15 86 93 00 00 be 01 00 00 00 4c 89 e7 42 8b 0c 28 31 c0 e8
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341915] RSP: 002b:00007ffea68a7680 EFLAGS: 00000287
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341916] RAX: 000056952c64ac90 RBX: 0000000000000026 RCX: 0000000000000001
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341917] RDX: 0000000000000000 RSI: 00007cc62eeea405 RDI: 00007ffea68a7120
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341918] RBP: 0000000000000017 R08: 0000000000000004 R09: 0000000000001971
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341918] R10: 0000000000000000 R11: 0000000000000000 R12: 00005696464e4b80
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341919] R13: 000000000000005c R14: 0000000000000004 R15: 0000000000000027
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341920] </TASK>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342675] Sending NMI from CPU 36 to CPUs 23:
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342711] NMI backtrace for cpu 23
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342716] CPU: 23 PID: 0 Comm: swapper/23 Tainted: G O 6.5.0-1023-gcp #25~22.04.1-Ubuntu
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342719] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342720] RIP: 0010:native_halt+0xa/0x10
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342728] Code: c0 45 31 c9 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d 49 05 35 01 f4 <c3> cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342729] RSP: 0018:ffffb87c0cbc0e00 EFLAGS: 00000046
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342731] RAX: 0000000000000003 RBX: ffff8d5f0721a148 RCX: 0000000000000000
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342732] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8d5f0721a148
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342733] RBP: ffffb87c0cbc0e08 R08: 0000000000000003 R09: ffff8d5f0721a148
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342734] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000600000
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342735] R13: 0000000000000001 R14: 0000000000000100 R15: ffff8ddd3f7f3c40
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342736] FS: 0000000000000000(0000) GS:ffff8ddd3f7c0000(0000) knlGS:0000000000000000
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342738] CR2: 000000c0006d9740 CR3: 0000004048f06002 CR4: 00000000003706e0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342741] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342742] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342742] Call Trace:
03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342744] <NMI>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342747] ? show_regs+0x6d/0x80
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342752] ? nmi_cpu_backtrace+0xb5/0x120
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342755] ? nmi_cpu_backtrace_handler+0x11/0x20
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342760] ? nmi_handle+0x5f/0x160
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342761] ? default_do_nmi+0x47/0x160
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342763] ? exc_nmi+0x1d5/0x2a0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342764] ? end_repeat_nmi+0x16/0x67
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342771] ? native_halt+0xa/0x10
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342772] ? native_halt+0xa/0x10
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342773] ? native_halt+0xa/0x10
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342775] </NMI>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342775] <IRQ>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342775] ? kvm_wait.part.0+0x97/0xc0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342777] kvm_wait+0x20/0x40
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342778] __pv_queued_spin_lock_slowpath+0x32b/0x380
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342781] _raw_spin_lock+0x38/0x60
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342784] drm_handle_vblank+0x6e/0x200 [drm]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342834] drm_crtc_handle_vblank+0x17/0x30 [drm]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342861] vkms_vblank_simulate+0x68/0x170 [vkms]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342867] ? __pfx_vkms_vblank_simulate+0x10/0x10 [vkms]
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342870] __hrtimer_run_queues+0x10f/0x250
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342873] ? clockevents_program_event+0xb3/0x140
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342876] hrtimer_interrupt+0xf6/0x250
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342878] __sysvec_apic_timer_interrupt+0x5c/0x100
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342880] sysvec_apic_timer_interrupt+0x8d/0xd0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342882] </IRQ>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342883] <TASK>
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342883] asm_sysvec_apic_timer_interrupt+0x1b/0x20
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342885] RIP: 0010:pv_native_safe_halt+0xb/0x10
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342887] Code: 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d 89 a3 2d 00 fb f4 <c3> cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342888] RSP: 0018:ffffb87c001c3db0 EFLAGS: 00000246
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342889] RAX: 0000000000004000 RBX: ffff8d5f0792d864 RCX: 0000000000000000
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342890] RDX: 0000000000000001 RSI: ffff8d5f0792d800 RDI: 0000000000000001
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342891] RBP: ffffb87c001c3db8 R08: 0000000000000000 R09: 0000000000000000
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342891] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d5f0792d864
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342892] R13: 0000000000000017 R14: ffffffffb4ce6560 R15: ffff8ddd3f7c0000
03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342903] ? acpi_safe_halt+0x19/0x60
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342905] acpi_idle_do_entry+0x40/0x80
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342907] acpi_idle_enter+0xb6/0x180
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342910] cpuidle_enter_state+0x8e/0x6f0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342912] cpuidle_enter+0x2e/0x50
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342916] call_cpuidle+0x23/0x60
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342918] cpuidle_idle_call+0x11d/0x190
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342921] do_idle+0x82/0xf0
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342922] cpu_startup_entry+0x2a/0x30
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342924] start_secondary+0x129/0x160
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342926] secondary_startup_64_no_verify+0x190/0x19b
2024-07-01 03:03:05.824 Jul 1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342931] </TASK>
```
I fixed an issue with chunking, and it appears to be working now. I also switched to a 256GB machine; with a 100M-sentence chunk size, it uses only 50% of the memory.
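A minimal sketch of the chunking idea, assuming the source and target corpora are line-aligned and read as iterables of sentences: the corpus is split in lockstep into chunks of at most `chunk_size` sentence pairs, and each chunk would then be aligned by eflomal on its own so that peak memory is bounded by the chunk size rather than the full corpus. `chunk_parallel` is a hypothetical helper, not the pipeline's actual code, and the eflomal invocation itself is left out.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def chunk_parallel(
    src_lines: Iterable[str],
    trg_lines: Iterable[str],
    chunk_size: int,
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (source, target) chunks of at most chunk_size sentence pairs.

    Hypothetical helper for illustration; each yielded chunk is meant to be
    aligned independently (e.g. by a separate eflomal run).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # zip_longest pads the shorter side with None, so a length mismatch
    # is detected instead of being silently truncated as plain zip would.
    pairs = itertools.zip_longest(src_lines, trg_lines)
    while True:
        batch = list(itertools.islice(pairs, chunk_size))
        if not batch:
            return
        src_chunk: List[str] = []
        trg_chunk: List[str] = []
        for src, trg in batch:
            if src is None or trg is None:
                raise ValueError("source and target corpora differ in length")
            src_chunk.append(src)
            trg_chunk.append(trg)
        yield src_chunk, trg_chunk


if __name__ == "__main__":
    src = ["a b", "c d", "e f", "g h", "i j"]
    trg = ["A B", "C D", "E F", "G H", "I J"]
    for i, (s, t) in enumerate(chunk_parallel(src, trg, chunk_size=2)):
        print(i, len(s), len(t))  # chunk sizes 2, 2, 1
```

The trade-off in this design is that each chunk is aligned independently, so alignment statistics are estimated per chunk rather than over the whole corpus, in exchange for a memory ceiling set by `chunk_size`.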