nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0
2.66k stars 137 forks source link

Firecracker: Snapshot load causes a lot of firecracker virtio block error logs #2080

Open jeromegn opened 4 days ago

jeromegn commented 4 days ago

I don't know if these logs are important because my app appears to be working properly (though it isn't accessing the filesystem post-snapshot-load).

2024-11-20T12:07:50.095125693 [anonymous-instance:main] Failed to execute Out virtio block request: FileEngine(Sync(Transfer(IOError(Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" }))))
2024-11-20T12:07:50.095131604 [anonymous-instance:main] Failed to execute Out virtio block request: FileEngine(Sync(Transfer(IOError(Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" }))))
2024-11-20T12:07:50.095137595 [anonymous-instance:main] Failed to execute Out virtio block request: FileEngine(Sync(Transfer(IOError(Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" }))))

I haven't seen anybody else report these errors to the firecracker repository so I am assuming it is something specific to nanos. I'm happy to make a firecracker issue as well if I am misguided.

To reproduce, I believe:

I see these logs before seeing any logs from my app:

2024-11-20T14:18:59.530639927 [anonymous-instance:main] Running Firecracker v1.10.1
2024-11-20T14:19:02.397687996 [anonymous-instance:fc_api] The API server received a Put request on "/snapshot/load" with body "{\n            \"snapshot_path\": \"./snapshot_file\",\n            \"mem_backend\": {\n                \"backend_path\": \"./mem_file\",\n                \"backend_type\": \"File\"\n            },\n            \"enable_diff_snapshots\": true,\n            \"resume_vm\": true\n    }".
2024-11-20T14:19:02.397716760 [anonymous-instance:main] [DevPreview] Virtual machine snapshots is in development preview.
2024-11-20T14:19:02.397852002 [anonymous-instance:main] Host CPU vendor ID: [65, 117, 116, 104, 101, 110, 116, 105, 99, 65, 77, 68]
2024-11-20T14:19:02.397856540 [anonymous-instance:main] Snapshot CPU vendor ID: [65, 117, 116, 104, 101, 110, 116, 105, 99, 65, 77, 68]
2024-11-20T14:19:02.407181046 [anonymous-instance:main] Artificially kick devices.
2024-11-20T14:19:02.407187508 [anonymous-instance:main] kick vsock vsock.
2024-11-20T14:19:02.407192928 [anonymous-instance:main] kick block rootfs.
2024-11-20T14:19:02.407195954 [anonymous-instance:main] kick net eth0.
2024-11-20T14:19:02.407227983 [anonymous-instance:fc_vcpu 0] Received a VcpuEvent::Resume message with immediate_exit enabled. immediate_exit was disabled before proceeding
2024-11-20T14:19:02.407245746 [anonymous-instance:main] [DevPreview] Virtual machine snapshots is in development preview - 'load snapshot' VMM action took 9520 us.
2024-11-20T14:19:02.407257158 [anonymous-instance:fc_api] The request was executed successfully. Status code: 204 No Content.
2024-11-20T14:19:02.407263109 [anonymous-instance:fc_api] 'load snapshot' API request took 9593 us.

Some artifacts:

Firecracker config ```json { "boot-source": { "kernel_image_path": "/home/jerome/.ops/0.1.52/kernel.img", "boot_args": "console=ttyS0 reboot=k panic=1 pci=off random.trust_cpu=on" }, "drives": [ { "drive_id": "rootfs", "path_on_host": "/home/jerome/.ops/images/app", "is_root_device": true, "is_read_only": true } ], "network-interfaces": [ { "iface_id": "eth0", "guest_mac": "AA:FC:00:00:00:01", "host_dev_name": "tap0" } ], "machine-config": { "vcpu_count": 1, "mem_size_mib": 512 }, "balloon": { "amount_mib": 0, "deflate_on_oom": false, "stats_polling_interval_s": 1 }, "logger": { "log_path": "logs.fifo", "level": "Debug" }, "metrics": { "metrics_path": "metrics.fifo" }, "vsock": { "guest_cid": 3, "uds_path": "./v.sock" } } ```
Deno main.ts ```typescript Deno.serve((_req) => { let mem = Deno.memoryUsage(); console.log(`rss: ${mem.rss}, heap total: ${mem.heapTotal}, heap used: ${mem.heapUsed}, external: ${mem.external}`); return fetch("https://debug.fly.dev") }); ```
Snapshotting + load ``` sudo curl --unix-socket app/fc.sock -i \ -X PATCH 'http://localhost/vm' \ -H 'Accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "state": "Paused" }' sudo curl --unix-socket app/fc.sock -i \ -X PUT 'http://localhost/snapshot/create' \ -H 'Accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "snapshot_type": "Full", "snapshot_path": "./snapshot_file", "mem_file_path": "./mem_file" }' ``` ``` sudo rm fc.sock v.sock; sudo firecracker-v1.10.1-x86_64 --api-sock fc.sock ``` ``` sudo curl --unix-socket app/fc.sock -i \ -X PUT 'http://localhost/snapshot/load' \ -H 'Accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "snapshot_path": "./snapshot_file", "mem_backend": { "backend_path": "./mem_file", "backend_type": "File" }, "enable_diff_snapshots": true, "resume_vm": true }' ```
francescolavra commented 1 day ago

Those virtio block errors appear to be unrelated to snapshotting, and are caused by Deno trying to write to the filesystem while the underlying drive is marked as read-only (by the "is_read_only": true option in your config file). Even though your app may not be accessing the filesystem directly, Deno tries to create its internal cache (at /root/.cache/deno/) and write files in there. If you want to prevent Deno from trying to write to the filesystem, you can mark the filesystem as read-only at the guest level, e.g. by adding readonly_rootfs=t (which is a Nanos-specific option) to the kernel command line (i.e. the boot_args string in the Firecracker config).