mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Student alignments fail for en-uk #721

Closed: eu9ene closed this 3 months ago

eu9ene commented 3 months ago

https://firefox-ci-tc.services.mozilla.com/tasks/BaqT4V6ORTeQR66ycxy7KA/runs/1

Based on the GCP dashboards, it doesn't look like an OOM. I also switched to processing the corpus in chunks of 100M sentences to reduce memory usage, and it still fails.
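A minimal sketch of the chunking idea, assuming the parallel corpus is two line-aligned iterables (one sentence per line). The function name `chunk_parallel` is hypothetical and this is not the pipeline's actual implementation; it only shows how peak memory per alignment run can be bounded by the chunk size instead of the full corpus size.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunk_parallel(
    src: Iterable[str], trg: Iterable[str], chunk_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield aligned (source, target) chunks of at most chunk_size sentence pairs.

    Hypothetical helper: each chunk could then be aligned (e.g. by eflomal)
    on its own, so memory use scales with chunk_size, not corpus size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    src_it, trg_it = iter(src), iter(trg)
    while True:
        # Pull the same number of lines from both sides of the corpus.
        src_chunk = list(islice(src_it, chunk_size))
        trg_chunk = list(islice(trg_it, chunk_size))
        if len(src_chunk) != len(trg_chunk):
            raise ValueError("source and target corpora have different lengths")
        if not src_chunk:
            return  # both sides exhausted
        yield src_chunk, trg_chunk
```

With `chunk_size=100_000_000` this mirrors the "100M sentences" setting mentioned above; note that chunking only lowers memory pressure, so a failure that persists after chunking points away from a plain out-of-memory condition.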

So far we haven't seen this problem for other language pairs, including uk-en. It might be a language-pair-specific issue in eflomal.
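To cross-check the "not OOM" observation against the machine's own logs, a small scan of the syslog can count OOM-killer messages versus RCU stall reports and list which process each CPU was running in the NMI backtraces. This is a hypothetical helper; the RCU and NMI patterns match lines in the log below, while the OOM pattern is an assumption based on the usual Linux OOM-killer wording ("Out of memory: Killed process ...").

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Kernel messages of interest. The OOM wording is an assumption; the other two
# patterns are taken from lines that appear in this issue's log.
PATTERNS = {
    "oom_kill": re.compile(r"Out of memory: Kill(ed)? process"),
    "rcu_stall": re.compile(r"rcu_sched detected stalls"),
    "nmi_backtrace": re.compile(r"NMI backtrace for cpu \d+"),
}
# e.g. "CPU: 5 PID: 4585 Comm: python3 Tainted: ..."
TASK_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")


def summarize_kernel_events(
    lines: Iterable[str],
) -> Tuple[Counter, List[Tuple[int, int, str]]]:
    """Count selected kernel events and collect (cpu, pid, comm) from backtraces."""
    counts: Counter = Counter()
    tasks: List[Tuple[int, int, str]] = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        match = TASK_RE.search(line)
        if match:
            tasks.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return counts, tasks
```

Run over the log below, this would report RCU stalls with no OOM-killer lines, and show CPU 5 running `python3` (PID 4585) inside the `vkms`/`drm` vblank timer path. That suggests looking beyond memory, though on its own it does not establish the root cause.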

2024-07-01 02:16:19.086
Jul  1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq fstrim[7148]: /boot/efi: 98.3 MiB (103061504 bytes) trimmed on /dev/sda15
2024-07-01 02:16:19.086
Jul  1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq fstrim[7148]: /: 515 GiB (553010765824 bytes) trimmed on /dev/sda1
2024-07-01 02:16:19.086
Jul  1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: fstrim.service: Deactivated successfully.
2024-07-01 02:16:19.087
Jul  1 01:16:19 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished Discard unused blocks on filesystems from /etc/fstab.
2024-07-01 02:16:20.788
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 About to reclaim task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:16:20.789
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaimed task BaqT4V6ORTeQR66ycxy7KA successfully.
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Notifying listener taskcluster-proxy of state change
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Received task status change: Reclaimed
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Got http status code 200 when issuing PUT to http://localhost:80/credentials with clientId task-client/BaqT4V6ORTeQR66ycxy7KA/0/on/us-west1-b/3862006725621225983/until/1719797780.822
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Successfully reclaimed task BaqT4V6ORTeQR66ycxy7KA
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA at 2024-07-01 01:33:20.822 +0000 UTC
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Current task claim expires at 2024-07-01 01:36:20.822 +0000 UTC
2024-07-01 02:16:20.837
Jul  1 01:16:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:16:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA in 16m59.984514843s
2024-07-01 02:17:01.802
Jul  1 01:17:01 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq CRON[7154]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
2024-07-01 02:20:25.378
{"code":"LogPingOpsAgent"}
2024-07-01 02:25:18.194
Jul  1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:25:18.216
Jul  1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:25:18.216
Jul  1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7160]: 2024/07/01 01:25:18: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:25:18.216
Jul  1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7160]: 2024/07/01 01:25:18: Done
2024-07-01 02:25:18.216
Jul  1 01:25:18 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
2024-07-01 02:30:25.378
{"code":"LogPingOpsAgent"}
2024-07-01 02:33:20.823
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 About to reclaim task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:33:20.849
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:33:20.938
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaimed task BaqT4V6ORTeQR66ycxy7KA successfully.
2024-07-01 02:33:20.938
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Notifying listener taskcluster-proxy of state change
2024-07-01 02:33:20.938
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Received task status change: Reclaimed
2024-07-01 02:33:20.939
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Got http status code 200 when issuing PUT to http://localhost:80/credentials with clientId task-client/BaqT4V6ORTeQR66ycxy7KA/0/on/us-west1-b/3862006725621225983/until/1719798800.923
2024-07-01 02:33:20.939
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Successfully reclaimed task BaqT4V6ORTeQR66ycxy7KA
2024-07-01 02:33:20.939
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA at 2024-07-01 01:50:20.923 +0000 UTC
2024-07-01 02:33:20.939
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Current task claim expires at 2024-07-01 01:53:20.923 +0000 UTC
2024-07-01 02:33:20.939
Jul  1 01:33:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:33:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA in 16m59.98398485s
2024-07-01 02:35:28.030
Jul  1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:35:28.041
Jul  1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7175]: 2024/07/01 01:35:28: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:35:28.041
Jul  1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7175]: 2024/07/01 01:35:28: Done
2024-07-01 02:35:28.042
Jul  1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:35:28.042
Jul  1 01:35:28 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
2024-07-01 02:40:25.377
{"code":"LogPingOpsAgent"}
2024-07-01 02:45:39.446
Jul  1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:45:39.477
Jul  1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:45:39.477
Jul  1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7191]: 2024/07/01 01:45:39: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:45:39.477
Jul  1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7191]: 2024/07/01 01:45:39: Done
2024-07-01 02:45:39.477
Jul  1 01:45:39 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
2024-07-01 02:49:34.071
Jul  1 01:49:34 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:49:34.070Z] Final argmax iteration: 15890.433 s#015
2024-07-01 02:49:34.100
Jul  1 01:49:34 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:49:34.070Z] Writing alignments to /builds/worker/artifacts/tmp/aln.fwd.aa for 100000000 sentencess#015
2024-07-01 02:50:20.924
Jul  1 01:50:20 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 About to reclaim task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA...
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaimed task BaqT4V6ORTeQR66ycxy7KA successfully.
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Notifying listener taskcluster-proxy of state change
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Received task status change: Reclaimed
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Got http status code 200 when issuing PUT to http://localhost:80/credentials with clientId task-client/BaqT4V6ORTeQR66ycxy7KA/0/on/us-west1-b/3862006725621225983/until/1719799820.983
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Successfully reclaimed task BaqT4V6ORTeQR66ycxy7KA
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA at 2024-07-01 02:07:20.983 +0000 UTC
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Current task claim expires at 2024-07-01 02:10:20.983 +0000 UTC
2024-07-01 02:50:21.011
Jul  1 01:50:21 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq start-worker[1252]: 2024/07/01 01:50:20 Reclaiming task BaqT4V6ORTeQR66ycxy7KA in 16m59.985546732s
2024-07-01 02:50:25.378
{"code":"LogPingOpsAgent"}
2024-07-01 02:52:15.558
Jul  1 01:52:15 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:52:15.557Z] [alignments] Processing part ab#015
2024-07-01 02:52:16.087
Jul  1 01:52:16 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq conmon[4146]: [task 2024-07-01T01:52:15.557Z] [alignments] Calculating alignments...#015
2024-07-01 02:56:10.674
Jul  1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Starting GCE Workload Certificate refresh...
2024-07-01 02:56:10.706
Jul  1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
2024-07-01 02:56:10.706
Jul  1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7214]: 2024/07/01 01:56:10: Error getting config status, workload certificates may not be configured: failed to GET "instance/gce-workload-certificates/config-status" from MDS with error: error connecting to metadata server, status code: 404
2024-07-01 02:56:10.706
Jul  1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq gce_workload_cert_refresh[7214]: 2024/07/01 01:56:10: Done
2024-07-01 02:56:10.706
Jul  1 01:56:10 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq systemd[1]: Finished GCE Workload Certificate refresh.
2024-07-01 03:00:25.377
{"code":"LogPingOpsAgent"}
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.273577] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.279641] rcu:   5-...0: (7 ticks this GP) idle=46ac/1/0x4000000000000000 softirq=130361/130363 fqs=7168
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.289317] rcu:            hardirqs   softirqs   csw/system
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.294985] rcu:    number:        0          0            0
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.300665] rcu:   cputime:        0          0            0   ==> 30028(ms)
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.307741] rcu:   23-...0: (3 GPs behind) idle=ed14/0/0x1 softirq=85779/85779 fqs=7169
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.315755] rcu:            hardirqs   softirqs   csw/system
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.321425] rcu:    number:        0          0            0
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.327094] rcu:   cputime:        0          0            0   ==> 30052(ms)
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.334181] rcu:   (detected by 36, t=15017 jiffies, g=1443741, q=15967 ncpus=64)
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341677] Sending NMI from CPU 36 to CPUs 5:
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341688] NMI backtrace for cpu 5
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341692] CPU: 5 PID: 4585 Comm: python3 Tainted: G           O       6.5.0-1023-gcp #25~22.04.1-Ubuntu
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341694] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341695] RIP: 0010:__raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341702] Code: cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 52 b8 01 00 00 00 31 d2 f0 0f b0 17 <3c> 01 75 07 5a 5d c3 cc cc cc cc 56 0f b6 f0 e8 9c ff ff ff 5e 5a
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341703] RSP: 0000:ffffb87c108dbce8 EFLAGS: 00000046
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341705] RAX: 0000000000000001 RBX: ffff8d5f0721b5d0 RCX: 0000000000000000
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341706] RDX: 0000000000000000 RSI: 0000000000000087 RDI: ffff8ddd3f7e33c0
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341707] RBP: ffffb87c108dbcf0 R08: 0000000000000087 R09: 0000000000000000
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341708] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ddd3f7e3400
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341708] R13: 0000000000000087 R14: 00000000ffffffff R15: ffff8d5f0721a148
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341709] FS:  00007cc62f5691c0(0000) GS:ffff8d9d3f740000(0000) knlGS:0000000000000000
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341710] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341711] CR2: 000056952c64a000 CR3: 000000010780c004 CR4: 00000000003706e0
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341715] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341719] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341720] Call Trace:
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341721]  <NMI>
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341724]  ? show_regs+0x6d/0x80
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341729]  ? nmi_cpu_backtrace+0xb5/0x120
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341732]  ? nmi_cpu_backtrace_handler+0x11/0x20
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341736]  ? nmi_handle+0x5f/0x160
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341738]  ? default_do_nmi+0x47/0x160
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341739]  ? exc_nmi+0x1d5/0x2a0
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341741]  ? end_repeat_nmi+0x16/0x67
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341747]  ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341749]  ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.492
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341750]  ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x1b
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341752]  </NMI>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341752]  <TASK>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341753]  _raw_spin_unlock_irqrestore+0x11/0x40
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341754]  hrtimer_try_to_cancel.part.0+0x55/0xe0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341757]  hrtimer_cancel+0x21/0x50
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341760]  vkms_disable_vblank+0x15/0x20 [vkms]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341765]  drm_vblank_disable_and_save+0xdf/0x120 [drm]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341813]  vblank_disable_fn+0x74/0xa0 [drm]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341841]  ? __pfx_vblank_disable_fn+0x10/0x10 [drm]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341867]  call_timer_fn+0x29/0x130
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341870]  __run_timers.part.0+0x20e/0x290
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341872]  ? ktime_get+0x43/0xc0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341874]  ? __pfx_tick_sched_timer+0x10/0x10
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341876]  ? native_apic_msr_write+0x2b/0x70
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341878]  ? lapic_next_event+0x1d/0x30
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341880]  ? clockevents_program_event+0xb3/0x140
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341883]  run_timer_softirq+0x2a/0x60
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341884]  __do_softirq+0xd9/0x30f
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341886]  ? hrtimer_interrupt+0x11f/0x250
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341888]  __irq_exit_rcu+0x75/0xa0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341889]  irq_exit_rcu+0xe/0x20
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341891]  sysvec_apic_timer_interrupt+0x40/0xd0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341893]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341895] RIP: 0033:0x7cc62eee1075
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341914] Code: 45 31 f6 45 31 ff 4c 89 6c 24 08 4d 89 f5 4c 8b 34 24 48 89 5c 24 18 48 89 cb 48 89 6c 24 10 4c 89 fd 49 89 c7 48 8b 44 24 48 <48> 8d 15 86 93 00 00 be 01 00 00 00 4c 89 e7 42 8b 0c 28 31 c0 e8
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341915] RSP: 002b:00007ffea68a7680 EFLAGS: 00000287
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341916] RAX: 000056952c64ac90 RBX: 0000000000000026 RCX: 0000000000000001
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341917] RDX: 0000000000000000 RSI: 00007cc62eeea405 RDI: 00007ffea68a7120
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341918] RBP: 0000000000000017 R08: 0000000000000004 R09: 0000000000001971
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341918] R10: 0000000000000000 R11: 0000000000000000 R12: 00005696464e4b80
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341919] R13: 000000000000005c R14: 0000000000000004 R15: 0000000000000027
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.341920]  </TASK>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342675] Sending NMI from CPU 36 to CPUs 23:
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342711] NMI backtrace for cpu 23
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342716] CPU: 23 PID: 0 Comm: swapper/23 Tainted: G           O       6.5.0-1023-gcp #25~22.04.1-Ubuntu
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342719] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342720] RIP: 0010:native_halt+0xa/0x10
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342728] Code: c0 45 31 c9 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d 49 05 35 01 f4 <c3> cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342729] RSP: 0018:ffffb87c0cbc0e00 EFLAGS: 00000046
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342731] RAX: 0000000000000003 RBX: ffff8d5f0721a148 RCX: 0000000000000000
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342732] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8d5f0721a148
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342733] RBP: ffffb87c0cbc0e08 R08: 0000000000000003 R09: ffff8d5f0721a148
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342734] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000600000
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342735] R13: 0000000000000001 R14: 0000000000000100 R15: ffff8ddd3f7f3c40
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342736] FS:  0000000000000000(0000) GS:ffff8ddd3f7c0000(0000) knlGS:0000000000000000
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342737] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342738] CR2: 000000c0006d9740 CR3: 0000004048f06002 CR4: 00000000003706e0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342741] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342742] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342742] Call Trace:
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342744]  <NMI>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342747]  ? show_regs+0x6d/0x80
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342752]  ? nmi_cpu_backtrace+0xb5/0x120
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342755]  ? nmi_cpu_backtrace_handler+0x11/0x20
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342760]  ? nmi_handle+0x5f/0x160
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342761]  ? default_do_nmi+0x47/0x160
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342763]  ? exc_nmi+0x1d5/0x2a0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342764]  ? end_repeat_nmi+0x16/0x67
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342771]  ? native_halt+0xa/0x10
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342772]  ? native_halt+0xa/0x10
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342773]  ? native_halt+0xa/0x10
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342775]  </NMI>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342775]  <IRQ>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342775]  ? kvm_wait.part.0+0x97/0xc0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342777]  kvm_wait+0x20/0x40
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342778]  __pv_queued_spin_lock_slowpath+0x32b/0x380
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342781]  _raw_spin_lock+0x38/0x60
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342784]  drm_handle_vblank+0x6e/0x200 [drm]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342834]  drm_crtc_handle_vblank+0x17/0x30 [drm]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342861]  vkms_vblank_simulate+0x68/0x170 [vkms]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342867]  ? __pfx_vkms_vblank_simulate+0x10/0x10 [vkms]
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342870]  __hrtimer_run_queues+0x10f/0x250
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342873]  ? clockevents_program_event+0xb3/0x140
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342876]  hrtimer_interrupt+0xf6/0x250
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342878]  __sysvec_apic_timer_interrupt+0x5c/0x100
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342880]  sysvec_apic_timer_interrupt+0x8d/0xd0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342882]  </IRQ>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342883]  <TASK>
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342883]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342885] RIP: 0010:pv_native_safe_halt+0xb/0x10
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342887] Code: 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d 89 a3 2d 00 fb f4 <c3> cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342888] RSP: 0018:ffffb87c001c3db0 EFLAGS: 00000246
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342889] RAX: 0000000000004000 RBX: ffff8d5f0792d864 RCX: 0000000000000000
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342890] RDX: 0000000000000001 RSI: ffff8d5f0792d800 RDI: 0000000000000001
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342891] RBP: ffffb87c001c3db8 R08: 0000000000000000 R09: 0000000000000000
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342891] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d5f0792d864
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342892] R13: 0000000000000017 R14: ffffffffb4ce6560 R15: ffff8ddd3f7c0000
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342903]  ? acpi_safe_halt+0x19/0x60
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342905]  acpi_idle_do_entry+0x40/0x80
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342907]  acpi_idle_enter+0xb6/0x180
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342910]  cpuidle_enter_state+0x8e/0x6f0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342912]  cpuidle_enter+0x2e/0x50
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342916]  call_cpuidle+0x23/0x60
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342918]  cpuidle_idle_call+0x11d/0x190
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342921]  do_idle+0x82/0xf0
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342922]  cpu_startup_entry+0x2a/0x30
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342924]  start_secondary+0x129/0x160
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342926]  secondary_startup_64_no_verify+0x190/0x19b
2024-07-01 03:03:05.824
Jul  1 02:03:05 translations-1-b-linux-large-gcp-1tb-6-bbm1-khvtf6k4jwgd5ebgq kernel: [48787.342931]  </TASK>
eu9ene commented 3 months ago

I fixed an issue with chunking, and it looks like it's working now. I also switched to a 256GB machine, and it uses only 50% of the memory with a 100M chunk size.