Closed by pigmej 1 year ago
[mvk-error] VK_ERROR_DEVICE_LOST: MTLCommandBuffer "vkQueueSubmit CommandBuffer on Queue 3-0" execution failed (code 1): Internal Error (0000000e:Internal Error)
That's the error.
A larger log file is attached: err.txt
The short log is above. It clearly shows what crashes, and that once it crashes we start generating zeros everywhere. It also shows that the very first file is already corrupted. The crash does not happen at a fixed point; it occurs at different moments between runs.
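To check how much of an initialization file ended up zeroed after such a crash, something like the sketch below could be used. It is only an illustration: `countZeroBlocks` is a hypothetical helper, not part of the node or of gpu-post, and the 16-byte block size is an assumption chosen for the example rather than the real label size.

```go
package main

import (
	"bytes"
	"fmt"
)

// countZeroBlocks reports how many blockSize-byte blocks in data consist
// entirely of zero bytes. A trailing partial block is checked as well.
// Hypothetical helper for illustration; not part of the node or gpu-post.
func countZeroBlocks(data []byte, blockSize int) int {
	if blockSize <= 0 {
		return 0
	}
	zero := make([]byte, blockSize)
	count := 0
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		if bytes.Equal(data[off:end], zero[:end-off]) {
			count++
		}
	}
	return count
}

func main() {
	// Synthetic data standing in for a label file: four 16-byte blocks,
	// where the last two are left zeroed (block size is an assumption).
	data := make([]byte, 64)
	for i := 0; i < 32; i++ {
		data[i] = byte(i + 1)
	}
	fmt.Println(countZeroBlocks(data, 16)) // prints 2
}
```

A few zero blocks can occur legitimately in real data only with negligible probability, so a long run of them right after the device-lost error would point at the failed submission rather than at valid output.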
That's the "device lost" error described in the Vulkan spec: https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#devsandqueues-lost-device
That's the exact sequence:
[mvk-error] VK_ERROR_DEVICE_LOST: MTLCommandBuffer "vkQueueSubmit CommandBuffer on Queue 3-0" execution failed (code 1): Internal Error (0000000e:Internal Error)
2023-03-30T18:09:39.809+0200 INFO 0a0c7.app.0a0c7.post initialization: file #5 completed; number of labels written: 15360 {"node_id": "0a0c7f128ed9507ee3d8cc05b613d9800dde44a3a530d5db74ee06e93a904f04", "module": "app", "node_id": "0a0c7f128ed9507ee3d8cc05b613d9800dde44a3a530d5db74ee06e93a904f04", "module": "post"}
2023-03-30T18:09:39.809+0200 INFO 0a0c7.app.0a0c7.post post setup completed {"node_id": "0a0c7f128ed9507ee3d8cc05b613d9800dde44a3a530d5db74ee06e93a904f04", "module": "app", "node_id": "0a0c7f128ed9507ee3d8cc05b613d9800dde44a3a530d5db74ee06e93a904f04", "module": "post", "node_id": "0a0c7f128ed9507ee3d8cc05b613d9800dde44a3a530d5db74ee06e93a904f04", "commitment_atx": "dc8053f637", "data_dir": "../pos_data/0", "num_units": "5", "labels_per_unit": "1310720", "provider": "1", "name": "post"}
2023-03-30T18:09:39.809+0200 ERROR 0a0c7.app.0a0c7.atxBuilder Failed to generate proof: %!w(*fmt.wrapError=&{post execution: generate proof: not completed 0x1400109f700}){"node_id": "0a0c7f128ed9507ee3d8cc05b613d9800dde44a3a530d5db74ee06e93a904f04", "module": "app", "node_id": "0a0c7f128ed9507ee3d8cc05b613d9800dde44a3a530d5db74ee06e93a904f04", "module": "atxBuilder"}
The M2 Pro behaves the same way; on the M1 Max I could not reproduce the error.
This one is connected to: https://github.com/spacemeshos/gpu-post/issues/93
Most of this has been fixed. Some GPUs are still misbehaving, but let's track those as separate issues.
Environment: MacBook Pro M1
Consider the following logs:
When I restart the node, it starts to initialize again with the following log:
The network config is:
The node config for smeshing is:
Before the first log, what happens is: