Closed dorjoy03 closed 2 months ago
We already discussed this change here: https://github.com/rust-vmm/vhost-device/issues/458
The VhostUserBackend::max_queue_size
is used to allocate structures, so increasing it will consume more memory.
1024 seems very big, so we should not set it as default.
As we discussed, the best way should be to add a parameter to control it.
We already discussed this change here: #458
The
VhostUserBackend::max_queue_size
is used to allocate structures, so increasing it will consume more memory. 1024 seems very big, so we should not set it as default. As we discussed, the best way should be to add a parameter to control it.
So, I'd suggest a new queue-size
parameter, something like this:
./target/release/vhost-device-vsock --vm guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket,queue-size=1024
@stefano-garzarella +1 on giving an argument instead. By the way have you measured how much extra memory it'd need? Maybe it's negligible.
@stefano-garzarella +1 on giving an argument instead. By the way have you measured how much extra memory it'd need? Maybe it's negligible.
@epilys this is a good point. I didn't measure it. Do you suggest a proper way? (for generic exec I used smem
in the past, not sure if for rust we have some analyzing tool)
@stefano-garzarella I think smem should be enough but there's also bytehound, heaptrack
PID User Command Swap USS PSS RSS
# before this PR
$ smem | grep vhost
86936 stefano ./target/release/vhost-devi 0 1596 1644 3640
# after this PR
$ smem | grep vhost
87359 stefano ./target/release/vhost-devi 0 1704 1753 3768
USS is 108 KB higher, so around 6.8% more. Given that it's 1.7 MB in the end (vs 1.6 MB), maybe we can use 1024 by default, but still add a parameter for those use cases where we want to reduce it?
PID User Command Swap USS PSS RSS # before this PR $ smem | grep vhost 86936 stefano ./target/release/vhost-devi 0 1596 1644 3640 # after this PR $ smem | grep vhost 87359 stefano ./target/release/vhost-devi 0 1704 1753 3768
USS is 108 KB higher, so around 6.8% more. Given that it's 1.7 MB in the end (vs 1.6 MB), maybe we can use 1024 by default, but still add a parameter for those use cases where we want to reduce it?
Yeah agree that a new parameter makes sense. I will try to add the new parameter and keep 1024 as default. Thanks!
@stefano-garzarella @epilys Thanks for reviewing.
I have now introduced a "queue-size" option and set the default queue size to 1024 in the latest changes. I don't have any experience in Rust but the changes are simple enough (I mostly followed tx_buffer_size). If you have any review comments, it would be great if you could give me code examples too in the comment if the change is not obvious. Thanks!
Edit: Unrelated to this discussion, it could probably make sense to make the vhost-device-vsock error messages more exact. I had a hard time finding out where the "invalid parameters" error was coming from. It would have been more helpful if the error message had more details that it was when trying to set a queue size 1024 but max queue size was 256.
Edit: Unrelated to this discussion, it could probably make sense to make the vhost-device-vsock error messages more exact. I had a hard time finding out where the "invalid parameters" error was coming from. It would have been more helpful if the error message had more details that it was when trying to set a queue size 1024 but max queue size was 256.
A lot of the errors and log messages are indeed not as descriptive as they could be. If you want to change this particular case we'd love a PR! Otherwise we can open a ticket and someone will get to it eventually :)
Edit: Unrelated to this discussion, it could probably make sense to make the vhost-device-vsock error messages more exact. I had a hard time finding out where the "invalid parameters" error was coming from. It would have been more helpful if the error message had more details that it was when trying to set a queue size 1024 but max queue size was 256.
A lot of the errors and log messages are indeed not as descriptive as they could be. If you want to change this particular case we'd love a PR! Otherwise we can open a ticket and someone will get to it eventually :)
I probably won't look into it now so I suggest opening a ticket. Thanks.
Edit: Unrelated to this discussion, it could probably make sense to make the vhost-device-vsock error messages more exact. I had a hard time finding out where the "invalid parameters" error was coming from. It would have been more helpful if the error message had more details that it was when trying to set a queue size 1024 but max queue size was 256.
A lot of the errors and log messages are indeed not as descriptive as they could be. If you want to change this particular case we'd love a PR! Otherwise we can open a ticket and someone will get to it eventually :)
I probably won't look into it now so I suggest opening a ticket. Thanks.
@dorjoy03 can you open it and describe the issue?
The PR LGTM, but I'd remove Fixes: #202
since other devices need to be updated too. Maybe we can just mention it.
Also, what about splitting the commit in two? The first one you introduce the new parameter, leaving the default to 256. The second commit will increase it to 1024 explaining why. So in the future, if we need to return to 256, we can just revert the second patch.
Edit: Unrelated to this discussion, it could probably make sense to make the vhost-device-vsock error messages more exact. I had a hard time finding out where the "invalid parameters" error was coming from. It would have been more helpful if the error message had more details that it was when trying to set a queue size 1024 but max queue size was 256.
A lot of the errors and log messages are indeed not as descriptive as they could be. If you want to change this particular case we'd love a PR! Otherwise we can open a ticket and someone will get to it eventually :)
I probably won't look into it now so I suggest opening a ticket. Thanks.
@dorjoy03 can you open it and describe the issue?
No problem. Will do.
The PR LGTM, but I'd remove
Fixes: #202
since other devices need to be updated too. Maybe we can just mention it.
Good point. I did not realize the ticket was about other daemons as well. Will remove.
Also, what about splitting the commit in two? The first one you introduce the new parameter, leaving the default to 256. The second commit will increase it to 1024 explaining why. So in the future, if we need to return to 256, we can just revert the second patch.
Yep, will split into 2.
@stefano-garzarella I have now split the changes into 2 commits and submitted an issue about making the error logs more descriptive https://github.com/rust-vmm/vhost-device/issues/682
@epilys your "Approve" is still valid for GH, but I'd like to double-check before merge this. Is this version okay for you?
@stefano-garzarella was already on it, thanks!
@dorjoy03 thanks for this PR!
Summary of the PR
Please summarize here why the changes in this PR are needed.
When running "vhost-device-vsock" against QEMU's "vhost-user-vsock-device" (i.e., mmio version, not the pci one) causes an error output on the vhost-device-vsock terminal:
And some errors on the QEMU terminal too like below:
I debugged this on the QEMU side and traced it back to trying to set the vring queue size to 1024 and I think the "InvalidParameter" error comes from the "set_vring_num" function in vhost-device-vsock. The commands to reproduce this are below:
shell1:
./target/release/vhost-device-vsock --vm guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket
shell2:
./qemu-system-x86_64 -M microvm,memory-backend=mem0 -kernel ./bzImage -nographic -append "console=ttyS0 nokaslr" -initrd ./initramfs.cpio.gz -m 4G --enable-kvm -cpu host -object memory-backend-memfd,id=mem0,size=4G -chardev socket,id=char0,reconnect=0,path=/tmp/vhost4.socket -device vhost-user-vsock-device,chardev=char0
I built the bzImage from linux kernel 6.9.3. From what I understand, in the linux driver side in "drivers/virtio/virtio_mmio.c" the "vm_setup_vq" function first reads the max queue size (VIRTIO_MMIO_QUEUE_NUM_MAX) and then sets it to be the queue size (VIRTIO_MMIO_QUEUE_NUM). In the QEMU side, the max queue size is in "include/hw/virtio/virtio.h" "VIRTQUEUE_MAX_SIZE" which is 1024. I think it makes sense to just increase the max queue size in vhost-device-vsock too. The errors go away if I do so.
Edit: Per review comments, a new queue-size option is introduced and the default value of queue size has been changed to 1024
Requirements
Before submitting your PR, please make sure you addressed the following requirements:
git commit -s
), and the commit message has max 60 characters for the summary and max 75 characters for each description line.unsafe
code is properly documented.