notro / gud

GUD USB Display

FunctionFS implementation #24

Open samcday opened 7 months ago

samcday commented 7 months ago

Hi there,

I've begun implementing the GUD device/gadget side bits in userland with FunctionFS. There's still a lot of work to do (EDIDs, connector/device properties are stubbed, modes are semi-stubbed), but it's already quite functional.

I can get a solid 30fps in fullscreen glxgears on an msm8916 device (testing with a Galaxy A5 and a Galaxy Tab A 9.7" running pmOS), with quite low (~15% incl. kworker) CPU utilization. Smaller damage clips (incl. cursor) run buttery smooth at 60fps.

Interestingly, I can't get better than ~35fps in fullscreen glxgears on my Steam Deck at 800x1280 (and there CPU usage is only 1-2%), and that's with the SET_BUFFER bulk recvs taking about 1ms each. So I guess maybe there's something else to track down there, or perhaps something to look for in the host side implementation?

Anyway, here's the repo: https://github.com/samcday/gud-gadget/

Thanks for shipping such an interesting project @notro!

Also a friendly ping to @ZenithalHourlyRate to say thanks for your gist (the way I first found the GUD project), and to let you know that there's now a FunctionFS gadget implementation (something you mentioned as a nice2have in your gist).

notro commented 7 months ago

I have never looked at Rust so I don't understand the details, but I'm impressed by how few lines of code were needed to hack together a prototype.

It will be interesting to hear about performance numbers when/if you get that part sorted out. I have been wondering about how it will perform given the extra memcpy when going to userspace.

I've added a link to your project in the wiki: https://github.com/notro/gud/wiki/Gadget-Implementations#userspace-linux

samcday commented 7 months ago

> I have been wondering about how it will perform given the extra memcpy when going to userspace.

I stumbled across a comment you made somewhere else to this effect before I started implementation, so I was concerned as well. In practice though, the basic benchmarking I've done thus far suggests that the LZ4 decompression on just about any buffer size is >2x slower than the memcpys.

I'm also not sure there's an "extra" memcpy in userspace, though? Reading from endpoints in FunctionFS is done with AIO. So grabbing the bytes from an endpoint FIFO is done by submitting buffers to the kernel.

Based on my understanding of your kernel mode gadget implementation, submitting a buffer with usb_ep_queue still requires a copy from FIFO -> buffer. So I think AIO and usb_ep_queue are equally efficient?

It's a little more complicated though: the implementation I have thus far is definitely not optimal in memory/buffer usage, since the current usb-gadget API (which I'm hoping can be improved) makes it tricky to submit AIO tasks with buffers pointing to a contiguous memory segment. So there are extra memcpys taking place to go from the data read from the bulk endpoint -> work buffer.
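To illustrate the buffer problem described above, here is a minimal sketch of the two strategies. It does not use the usb-gadget crate or AIO at all: `std::io::Read` stands in for the bulk endpoint, and the function names `read_into_contiguous` and `read_via_scratch` are hypothetical. The first variant hands out sub-slices of one contiguous work buffer so each read lands in place. The second reads into a separate scratch buffer and then copies, which is the extra memcpy.

```rust
use std::io::{self, Read};

/// Fill `frame` by reading from `endpoint` one chunk at a time.
/// Each chunk is a sub-slice of `frame`, so the data lands directly in its
/// final position and no second copy is needed.
/// `endpoint` is a stand-in for a bulk OUT endpoint (an assumption of this sketch).
/// `chunk_size` must be non-zero, otherwise `chunks_mut` panics.
fn read_into_contiguous<R: Read>(
    endpoint: &mut R,
    frame: &mut [u8],
    chunk_size: usize,
) -> io::Result<usize> {
    let mut total = 0;
    for chunk in frame.chunks_mut(chunk_size) {
        endpoint.read_exact(chunk)?;
        total += chunk.len();
    }
    Ok(total)
}

/// Same result, but every chunk goes through a separate scratch buffer first
/// and is then copied into `frame`: one extra memcpy per chunk.
fn read_via_scratch<R: Read>(
    endpoint: &mut R,
    frame: &mut [u8],
    chunk_size: usize,
) -> io::Result<usize> {
    let mut scratch = vec![0u8; chunk_size];
    let mut total = 0;
    for chunk in frame.chunks_mut(chunk_size) {
        let buf = &mut scratch[..chunk.len()];
        endpoint.read_exact(buf)?;
        chunk.copy_from_slice(buf); // the extra copy
        total += chunk.len();
    }
    Ok(total)
}
```

Both functions produce identical buffer contents; they differ only in how many times each byte is moved. A byte slice works as a test endpoint, e.g. `read_into_contiguous(&mut &data[..], &mut frame, 512)`.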

Without the LZ4 compression, and with changes to the usb-gadget crate API, I think it should be possible to read pixel data from the bulk FIFO directly into the drm framebuffer memory allocated by GBM. So that would be zero copy from USB endpoint -> screen. That would only work on USB interfaces with sufficient bandwidth of course.
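As a rough sanity check on the bandwidth caveat, the arithmetic can be sketched as follows. The assumptions are not from the thread: a USB 2.0 high-speed link (480 Mbit/s signalling rate, which real bulk transfers never fully reach), full-frame updates only, and pixel sizes of 2 bytes (e.g. RGB565) or 4 bytes (e.g. XRGB8888). The helper names are hypothetical.

```rust
/// Bytes needed for one uncompressed full-screen frame.
fn frame_bytes(width: u64, height: u64, bytes_per_pixel: u64) -> u64 {
    width * height * bytes_per_pixel
}

/// Payload bandwidth, in bytes per second, for full-frame updates at `fps`.
fn required_bytes_per_sec(width: u64, height: u64, bytes_per_pixel: u64, fps: u64) -> u64 {
    frame_bytes(width, height, bytes_per_pixel) * fps
}

/// Upper bound on whole frames per second that fit into a link budget.
fn max_fps(link_bytes_per_sec: u64, width: u64, height: u64, bytes_per_pixel: u64) -> u64 {
    link_bytes_per_sec / frame_bytes(width, height, bytes_per_pixel)
}

/// Assumed link: USB 2.0 high-speed signalling rate, 480 Mbit/s = 60,000,000 bytes/s.
/// This is a ceiling only; protocol overhead lowers the usable figure.
const USB2_HS_BYTES_PER_SEC: u64 = 480_000_000 / 8;
```

Under these assumptions an 800x1280 frame at 2 bytes per pixel is 2,048,000 bytes, so `max_fps(USB2_HS_BYTES_PER_SEC, 800, 1280, 2)` gives 29 and the 4-byte case gives 14. Uncompressed zero-copy at 60fps fullscreen would therefore need a faster link, while small damage clips fit easily.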

I'll be sure to let you know where it all ends up :)

> I've added a link to your project in the wiki: https://github.com/notro/gud/wiki/Gadget-Implementations#userspace-linux

Thanks! :pray:

notro commented 7 months ago

AFAICT drivers/usb/gadget/function/f_fs.c does a copy_to_iter() from the received USB request buffer before giving it to userspace. The in-kernel version doesn't have that. But I don't know if it really matters in actual performance.

AIUI the reason io_uring was made was to avoid this memcpy when moving data between userspace and the kernel. I don't think uio has zero-copy.

ZenithalHourlyRate commented 6 months ago

> Also a friendly ping to @ZenithalHourlyRate to say thanks for your gist (the way I first found the GUD project), and to let you know that there's now a FunctionFS gadget implementation (something you mentioned as a nice2have in your gist).

Very impressive! Glad to see that!