Closed patryk4815 closed 8 months ago
Hmm, probably may issue is not about missing ret
๐ค . Now I see wait_for_completion_interruptible
should return error ๐ค
My main issue is that: whenwait interrupted
happen my ioctl EPx_read/EPx_write
is frozen for very long time (5-10s) - so maybe wait_for_completion(&done)
taking too much time?
Or we need check returned error from usb_ep_dequeue(ep->ep, ep->req);
๐ค
What if (just wondering ๐ค ):
usb_ep_dequeue
returns errorwait_for_completion
is happening on "completed" taskI wrote this patch and now everything working fine:
No issues after this patch. (except deadlock :trollface: on killing process )
I think better patch would be:
do {
ret = wait_for_completion_interruptible(&done);
if (ret) {
spin_lock_irqsave(&dev->lock, flags);
if (dev->state != STATE_DEV_RUNNING) {
dev_dbg(dev->dev, "wait interrupted, fail, device is not running\n");
goto out_unlock;
}
spin_unlock_irqrestore(&dev->lock, flags);
}
}
while (ret)
btw i asked GPT AI where error ๐
Hi,
There's indeed an issue with returning a proper code when the request was interrupted, we probably need a fix for this.
However, I don't understand why the issue you mention is happening. usb_ep_dequeue
is not checked, as it can fail in case the request has already been dequeued. And then its completion routine must have already been executed.
However, let's check if usb_ep_dequeue
indeed fails.
Could you try running your test with the following patch:
diff --git a/raw_gadget/raw_gadget.c b/raw_gadget/raw_gadget.c
index f203fe2..a294589 100644
--- a/raw_gadget/raw_gadget.c
+++ b/raw_gadget/raw_gadget.c
@@ -968,6 +968,8 @@ static void gadget_ep_complete(struct usb_ep *ep, struct usb_request *req)
spin_unlock_irqrestore(&dev->lock, flags);
complete((struct completion *)req->context);
+
+ dev_dbg(&dev->gadget->dev, "ep = %px, completed\n", r_ep);
}
static int raw_process_ep_io(struct raw_dev *dev, struct usb_raw_ep_io *io,
@@ -1035,10 +1037,14 @@ static int raw_process_ep_io(struct raw_dev *dev, struct usb_raw_ep_io *io,
goto out_done;
}
+ dev_dbg(&dev->gadget->dev, "ep = %px, queued\n", ep);
+
ret = wait_for_completion_interruptible(&done);
if (ret) {
dev_dbg(&dev->gadget->dev, "wait interrupted\n");
- usb_ep_dequeue(ep->ep, ep->req);
+ int deq_ret = usb_ep_dequeue(ep->ep, ep->req);
+ if (deq_ret)
+ dev_dbg(&dev->gadget->dev, "ep = %px, deq_ret = %d\n", ep, deq_ret);
wait_for_completion(&done);
spin_lock_irqsave(&dev->lock, flags);
goto out_done;
and share the dmesg
output?
Thanks!
My diff:
diff --git a/raw_gadget/raw_gadget.c b/raw_gadget/raw_gadget.c
index 5eedee5..de9cb1c 100644
--- a/raw_gadget/raw_gadget.c
+++ b/raw_gadget/raw_gadget.c
@@ -1001,6 +1001,8 @@ static void gadget_ep_complete(struct usb_ep *ep, struct usb_request *req)
spin_unlock_irqrestore(&dev->lock, flags);
complete((struct completion *)req->context);
+
+ dev_dbg(&dev->gadget->dev, "ep = %px, ep_id=%d completed\n", r_ep->ep, r_ep->addr);
}
static int raw_process_ep_io(struct raw_dev *dev, struct usb_raw_ep_io *io,
@@ -1068,11 +1070,16 @@ static int raw_process_ep_io(struct raw_dev *dev, struct usb_raw_ep_io *io,
goto out_done;
}
+ dev_dbg(&dev->gadget->dev, "ep = %px, ep_id=%d, queued\n", ep->ep, ep->addr);
+
ret = wait_for_completion_interruptible(&done);
if (ret) {
- dev_dbg(&dev->gadget->dev, "wait interrupted\n");
- usb_ep_dequeue(ep->ep, ep->req);
+ dev_dbg(&dev->gadget->dev, "wait interrupted ep_id=%d \n", ep->addr);
+ int deq_ret = usb_ep_dequeue(ep->ep, ep->req);
+ if (deq_ret)
+ dev_dbg(&dev->gadget->dev, "ep = %px, ep_id=%d, deq_ret = %d\n", ep->ep, ep->addr, deq_ret);
wait_for_completion(&done);
+ dev_dbg(&dev->gadget->dev, "wait interrupted done ep_id=%d\n", ep->addr);
spin_lock_irqsave(&dev->lock, flags);
goto out_done;
}
Maybe can we remove usb_ep_dequeue
?
From kernel documentation:
* Note that some hardware can't clear out write fifos (to unlink the request
* at the head of the queue) except as part of disconnecting from usb. Such
* restrictions prevent drivers from supporting configuration changes,
* even to configuration zero (a "chapter 9" requirement).
Sometime getting this:
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: wait interrupted ep_id=2
Dec 12 00:36:13 rpi.cypis.ovh kernel: dwc2 fe980000.usb: dwc2_hsotg_ep_stop_xfr: timeout DIEPINT.NAKEFF
Dec 12 00:36:13 rpi.cypis.ovh kernel: dwc2 fe980000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: wait interrupted done ep_id=2
Dec 12 00:36:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
^ this time next "message" blocks very long time and host stop responding.
Some other time:
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: wait interrupted ep_id=2
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, deq_ret = -22
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: wait interrupted done ep_id=2
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2 completed
Dec 12 00:19:13 rpi.cypis.ovh kernel: raw-gadget.0 gadget: ep = ffff1849c1e76880, ep_id=2, queued
^ this time im getting deq_ret = -22
from usb_ep_dequeue
Btw -ERESTARTSYS
is not propagating to user-space don't know why ๐ฆ
When i change it to -EBUSY
it is propagating ๐คจ
From kernel doc:
/*
* These should never be seen by user programs. To return one of ERESTART*
* codes, signal_pending() MUST be set. Note that ptrace can observe these
* at syscall exit tracing, but they will never be left for the debugged user
* process to see.
*/
#define [ERESTARTSYS](https://elixir.bootlin.com/linux/latest/C/ident/ERESTARTSYS) 512
#define [ERESTARTNOINTR](https://elixir.bootlin.com/linux/latest/C/ident/ERESTARTNOINTR) 513
(probably that is fine)
Hmm, this ERESTARTSYS
issue could lead to some weird behavior... If you don't see it being propagated to userspace, that means the kernel handles it internally. And a brief look at the code shows that the kernel is likely trying to restart the syscall by itself.
Could you build the Raw Gadget module from the dev branch and try running your test again? There, the ioctl should return proper values.
Sent the fix [upstream}https://lore.kernel.org/linux-usb/0db45b1d7cc466e3d4d1ab353f61d63c977fbbc5.1698329862.git.andreyknvl@gmail.com/T/#t). Will update the master
branch once the patch is merged.
Note that the device emulation code needs to watch out for the -EINTR
return code and then resend the request.
Hi. After implementing my proxy( https://github.com/patryk4815/usb-proxy ) I tried to debug why "sometime" my code fails with proxying mass storage. It was really "random" (only occurs when transferring really large files), so after hours of debugging my code, it looks like the error is your module :(
ftrace from kernel:
call time:
dmesg:
After some time it randomly get
usb_ep_dequeue
. So i looked intoraw-gadget
code.https://github.com/xairy/raw-gadget/blob/master/raw_gadget/raw_gadget.c#L1044
I think there is missing
ret = -EINTR
. Not sure. But fromUSB_RAW_IOCTL_EP_WRITE
orUSB_RAW_IOCTL_EP_READ
i should get response about "error" right? Now im getting "good" length writted/readed.