vpelletier / python-functionfs

Pythonic API for linux's functionfs
GNU General Public License v3.0
40 stars 13 forks source link

onDisable() delayed to a later event? #7

Closed monkey-jsun closed 6 years ago

monkey-jsun commented 6 years ago

Hi, Vincent,

I notice many times (always?) the onDisable() is not called until next onEnable() event happens. Is this known issue?

Thanks!

Jun

vpelletier commented 6 years ago

Hello,

is not called until next onEnable() event happens

I guess you are basing this from the output from examples/usbcat/device.py, and maybe not the most recent version.

Anyway, this has been bugging me too, and is why I've been calling onDisable (and later, _disable to give a less confusing output) from onEnable, so submitted AIOs get cleaned up before being re-submitted[1].

It's not clear to me whether it's a bug. At least, from looking at drivers/usb/gadget/function/f_fs.c and drivers/usb/gadget/composite.c history, it does not seem a recent change.

[1] I am not actually sure this is enough: io_cancel does not wait for actual completion. But I am stuck at the moment poking at a kernel panic which happens on dwc3 device controlers with that very example program: AIO block fail to be properly cancelled inside the kernel... When I get out of this I'll have to do some more cancellation cleanup in my example as well.

vpelletier commented 6 years ago

I rebuilt f_fs.c with #define VERBOSE_DEBUG uncommented, and enabled all dynamic debug logs in that file: # echo "file drivers/usb/gadget/function/f_fs.c +p" > /sys/kernel/debug/dynamic_debug/control.

Right after I do # echo > /sys/kernel/config/usb_gadget/g1/UDC to virtually unplug the gadget, I see:

[  396.983101] ffs_epfile_async_io_complete()
[  396.987405] ffs_epfile_async_io_complete()
[  396.991682] adding event 3
[  396.994708] ffs_func_unbind()
[  397.001979] functionfs_unbind()
[  397.007113] ffs_data_put()
[  397.011580] purging event 3
[  397.011592] adding event 1
[  397.039113] ffs_ep0_read()

Which means the kernel does emit event 3 (FUNCTIONFS_DISABLE), but then purges it immediately when producing event 1 (FUNCTIONFS_UNBIND). And ep0_read is triggered (by python-functionfs) only after all that.

Reading the code, it may be intentional: event queue is only 4-deep, SETUP, SUSPEND and RESUME can cohabit (for the latter 2, it requires the first event to be RESUME and SUSPEND to only arrive after). This means that there is only one slot left guaranteed, so any one of BIND, UNBIND, DISABLE and ENABLE flush the queue on arrival. So deepest queue possible looks something like: ENABLE,SETUP,RESUME,SUSPEND

monkey-jsun commented 6 years ago

I don't think # echo > /sys/kernel/config/usb_gadget/g1/UD is the same as unplugging a device. In the former case, the events are DISABLE followed by UNBIND, while in the later case it should only be DISABLE event. Right?

The debugging trick is cool. I will try it tonight.

Thanks as always.

On Mon, Nov 27, 2017 at 8:19 AM, Vincent Pelletier <notifications@github.com

wrote:

I rebuilt f_fs.c with #define VERBOSE_DEBUG uncommented, and enabled all dynamic debug logs in that file: # echo "file drivers/usb/gadget/function/ffs.c +p" > /sys/kernel/debug/dynamic debug/control.

Right after I do # echo > /sys/kernel/config/usb_gadget/g1/UDC to virtually unplug the gadget, I see:

[ 396.983101] ffs_epfile_async_io_complete() [ 396.987405] ffs_epfile_async_io_complete() [ 396.991682] adding event 3 [ 396.994708] ffs_func_unbind() [ 397.001979] functionfs_unbind() [ 397.007113] ffs_data_put() [ 397.011580] purging event 3 [ 397.011592] adding event 1 [ 397.039113] ffs_ep0_read()

Which means the kernel does emit event 3 (FUNCTIONFS_DISABLE), but then purges it immediately when producing event 1 (FUNCTIONFS_UNBIND). And ep0_read is triggered (by python-functionfs) only after all that.

Reading the code, it may be intentional: event queue is only 4-deep, SETUP, SUSPEND and RESUME can cohabit (for the latter 2, it requires the first event to be RESUME and SUSPEND to only arrive after). This means that there is only one slot left guaranteed, so any one of BIND, UNBIND, DISABLE and ENABLE flush the queue on arrival. So deepest queue possible looks something like: ENABLE,SETUP,RESUME,SUSPEND

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vpelletier/python-functionfs/issues/7#issuecomment-347234525, or mute the thread https://github.com/notifications/unsubscribe-auth/ALvYW_irdVlNFYb2SyP-GZPgYKu702RFks5s6uERgaJpZM4QrKsX .

monkey-jsun commented 6 years ago

I'm not sure how to read this, but when I unplug the USB cable, nothing is spit out from the kernel. For whatever it is worth, I listed the kernel dmesg below. Messages are separate into blocks where an unplug event happened.

BTW, I have an additional issue where after re-inserting the USB cable, read on endpoints return an error, saying IO operations on closed handle. What is the right way to refresh the ep handle onEnble(). Currently I just call self.getEndpoint(xxx)

=================================== [ 1343.391811] dwc2 20980000.usb: new address 5 [ 1343.566188] configfs-gadget gadget: high-speed config #1: c [ 1343.566243] adding event 2 [ 1343.588172] ffs_ep0_read()

[ 1526.530785] adding event 3 [ 1526.536439] ffs_ep0_read() [ 1526.585524] dwc2 20980000.usb: new device is high-speed [ 1526.679287] dwc2 20980000.usb: new device is high-speed [ 1526.715174] dwc2 20980000.usb: new address 23 [ 1526.970037] dwc2 20980000.usb: new device is high-speed [ 1527.061041] dwc2 20980000.usb: new device is high-speed [ 1527.104511] dwc2 20980000.usb: new address 43 [ 1527.292775] dwc2 20980000.usb: new device is high-speed [ 1527.383901] dwc2 20980000.usb: new device is high-speed [ 1527.427400] dwc2 20980000.usb: new address 45 [ 1527.599774] configfs-gadget gadget: high-speed config #1: c [ 1527.599829] adding event 2 [ 1527.615172] ffs_ep0_read()

[ 1538.622555] adding event 3 [ 1538.632397] ffs_ep0_read() [ 1538.677291] dwc2 20980000.usb: new device is high-speed [ 1538.770052] dwc2 20980000.usb: new device is high-speed [ 1538.805970] dwc2 20980000.usb: new address 52 [ 1539.071024] dwc2 20980000.usb: new device is high-speed [ 1539.161989] dwc2 20980000.usb: new device is high-speed [ 1539.205158] dwc2 20980000.usb: new address 13 [ 1539.393236] dwc2 20980000.usb: new device is high-speed [ 1539.483867] dwc2 20980000.usb: new device is high-speed [ 1539.526502] dwc2 20980000.usb: new address 53 [ 1539.701252] configfs-gadget gadget: high-speed config #1: c [ 1539.701306] adding event 2 [ 1539.720429] ffs_ep0_read()

vpelletier commented 6 years ago

About the trace, each adding event means functionfs is queuing an event for userspace program to read, and each ffs_ep0_read is the userspace reading that event - which should be functionfs.Function.__process.

About reads returning an error, I tried unplugging the cable, and I'm getting another kernel error. Yay, more kernel debugging ! This one should be easier than the previous one, though, as it's a timeout of dwc3 reading from the controller, so it should involve fewer parts.

monkey-jsun commented 6 years ago

Hi, Vincent,

The following patch solves the read error problem. You might want to incorporate.

I'm now moving on to missing unplugging event issue. I found a workaround with udevadm, but I'm not very happy about. I think the right thing is to deliver the event to functionfs ep0.

python-functionfs-patch.txt

monkey-jsun commented 6 years ago

Well, I think I got the problem solved. It is pretty hackish mainly because I don't understand what is really going on. But with the attached patch, I'm now able to receive DISABLE event when I unplug the USB cable. Things are working exactly the way I wanted.

Thanks for all your help, Vincent! And I'm sure this is not the last issue I will have. :)

00-usb-disconnect.txt

vpelletier commented 6 years ago

Both look to me like kernel issues. At least, I do not experience the stale file descriptor issue with my dwc3. You should probably submit the dwc2 patch on the kernel usb development mailing list.

I'll try to see if I can get my hands on a rpi zero with wifi, to test my code on more UDCs than just the dwc3 on my edison.

vpelletier commented 6 years ago

Also, I just noticed this mail on the linux-usb mailing list, which looks related to missing disconnection events.

vpelletier commented 6 years ago

Closing old issues.