virq are lost when send to VM

gravelyy commented 2 months ago

Hello,

It seems that l4_irq_trigger() loses many of the triggers. Or am I using it incorrectly?

Platform: Raspberry Pi 5
Environment: L4Re + vmm + Linux 5.14

Issue: I send virqs from an L4 demo pkg to Linux, but many of them are lost.

Steps:

in pkg demo:

l4_cap_idx_t virq = l4re_env_get_cap("irqcap_rcv");
for (int i = 0; i < 50; i++)
  l4_irq_trigger(virq);

in virq.cc

diff --git a/server/src/device/virq.cc b/server/src/device/virq.cc
index f644cd0..3ea9436 100644
--- a/server/src/device/virq.cc
+++ b/server/src/device/virq.cc
@@ -15,6 +15,8 @@
 #include "irq_dt.h"
 #include "mmio_device.h"

+static int count_rcv = 0;
 namespace {

 using namespace Vdev;
@@ -48,7 +50,10 @@ public:
   Irq_rcv(cxx::Ref_ptr<Gic::Ic> const &ic, unsigned irq) : _sink(ic, irq) {}

   void handle_irq()
-  { _sink.inject(); }
+  {
+    count_rcv++;
+    _sink.inject();
+  }

in linux

+static int linux_count_rcv=0;
+
+static irqreturn_t virq_handler(int irq, void *dev_id)
+{
+       linux_count_rcv++;
+       return IRQ_HANDLED;
+}

Results:
1. l4_irq_trigger() sends the virq 50 times.
2. Irq_rcv::handle_irq() is called only 38 times (count_rcv is 38), where I expected 50.
3. The Linux virq_handler is called only 38 times (linux_count_rcv is 38).

Thanks.

phipse commented 2 months ago

Hi gravelyy, thanks for the report. I see this is your first issue on github. Please have a look at formatting your report. In this state it is very hard to read and hinders understanding of the actual issue.

Regarding the issue: It's good to see that the number of IRQs uvmm receives and forwards to linux and the number of observed IRQs in linux are the same. Nothing is lost there. So the question is: Why is the number of IRQs uvmm receives lower than the number of l4_irq_trigger() invocations in the demo application?

l4_irq_trigger() executes a call to a kernel IRQ object, which then acts as relay to send to the uvmm. The IRQ object itself does not count how many times it was triggered before it was able to send the IRQ along to the uvmm. So the demo application can call IRQ trigger many times, before the uvmm is able to receive it, e.g. due to scheduling. If this is the case, the behavior is as expected.

To validate this please check the following:

Are uvmm and the demo application running on the same core? Do the threads have the same priority?
Add a sleep() between the l4_irq_trigger() calls. Does this increase the number of visible IRQs at uvmm?

gravelyy commented 2 months ago

Hi phipse, thanks for your reply, and sorry for the report format.

I did the following tests:
1. With usleep(1*1000) between the l4_irq_trigger() calls: 1178 of 5000 triggers lost.
2. With usleep(50*1000) between the l4_irq_trigger() calls: 25 of 5000 triggers lost.
3. With sleep(1) between the l4_irq_trigger() calls: 17 of 1000 triggers lost.

In my demo, I think uvmm and the demo application should not be running on the same core and should have the same priority, because I did not change any priority or core-binding settings.

But if this behavior is expected, how can I make sure that every l4_irq_trigger() reaches Linux? I see that l4_irq_trigger() calls the kernel IRQ object via svc. It is hard for me to understand how a call made via svc could be lost.

gravelyy commented 2 months ago

By the way, if this behavior is expected, it seems we should have a retry mechanism in "virq.cc" to make sure that a virq from the VM can be sent to another application?

@@ -114,14 +114,20 @@ class Irq_snd : public Device, public Vmm::Mmio_device_t<Irq_snd>
 public:
   explicit Irq_snd(L4::Cap<L4::Irq> irq) : _irq(irq) {}

   void write(unsigned /*reg*/, char /*size*/, l4_uint64_t /*value*/, unsigned)
   {
     /* address does no matter */
-    _irq->trigger();
+    l4_msgtag_t tag;
+    tag = _irq->trigger();
+    while(tag.has_error()){
+       usleep(1000);
+       tag = _irq->trigger();
+    } ;
   }

Or is there any API that can "ensure delivery"?

admlck commented 2 months ago

We do not believe any notification is lost. The IRQ is not counting, i.e. the sending side can trigger many times, with the receiving side seeing at least one of those triggers if it just happens to be able to fetch them only after all those triggers happened. However, no notification shall be lost.

gravelyy commented 2 months ago

I also do not believe any notification is lost, but at least it seems that way in my case. With sleep(1) between the l4_irq_trigger() calls, 17 of 1000 triggers were lost; only 983 arrived in the Linux kernel.

pkg try to virq:1000
virq got:983
virq try to linux:983
linux got:983

So it seems to me that a reasonable explanation is that the trigger data path is asynchronous, and some triggers are merged on this data path?

But if that is the explanation, why does the following code work well?

    l4_msgtag_t tag;
    tag = _irq->trigger();
    while(tag.has_error()){
       usleep(1000);
       tag = _irq->trigger();
    } ;

admlck commented 2 months ago

Yes, IRQs are asynchronous.

For reasons of scheduling the following might be executed:

   trigger thread         |   receiver thread / vcpu
  -------------------------------------------
       irq.trigger();     |
       irq.trigger();     |
                          | receive()

So there are two triggers but because the receiver did not run it will only see that there is a notification but not how many times trigger was called (IRQs are not counting).
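
The coalescing described above can be modeled in plain C++, without any L4Re API. This is only a sketch: `LatchIrq` and `observed_after_burst` are hypothetical names, and the single pending bit is an assumption about how a non-counting notification behaves, not the actual kernel implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a non-counting notification: a single pending bit.
struct LatchIrq {
  std::atomic<bool> pending{false};

  // Setting an already-set bit changes nothing, so repeated triggers coalesce.
  void trigger() { pending.store(true, std::memory_order_release); }

  // Returns true if at least one trigger happened since the last receive().
  bool receive() { return pending.exchange(false, std::memory_order_acquire); }
};

// Trigger `triggers` times back to back (the receiver never runs in between),
// then count how many notifications the receiver actually observes.
inline int observed_after_burst(int triggers) {
  assert(triggers >= 0);
  LatchIrq irq;
  for (int i = 0; i < triggers; ++i)
    irq.trigger();

  int seen = 0;
  while (irq.receive())
    ++seen;
  return seen;
}
```

In this model a burst of 50 triggers with no receive in between is observed as one notification, which matches the shape of the counts reported earlier in the thread: fewer receives than triggers, but never zero.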

With the sleep it typically looks like this:

    trigger thread          |   receiver thread / vcpu
  -------------------------------------------
     irq.trigger();         |
     sleep(xx);             |
                            | receive()
     returning sleep(xx);   |
     irq.trigger();         |
                            | receive()

The sleep will trigger scheduling / a context switch such that the receiver runs.

Generally, notifications are used together with some memory state. So a sender puts something in the (shared) memory and triggers an IRQ. It can do so again. When the receiver gets the notification, it just knows there is something in the shared memory and needs to check it. It will then handle all new data (like in a queue) until all new data is processed.
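
A minimal sketch of that pattern, again in plain C++ rather than against the L4Re API: `Channel`, `send_event` and `drain` are hypothetical names, the atomic flag only stands in for the non-counting IRQ, and the counter stands in for state that would live in memory shared between the sender and the guest.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared state between sender and receiver.
struct Channel {
  std::atomic<std::uint64_t> produced{0};  // would live in shared memory
  std::atomic<bool> pending{false};        // stand-in for the non-counting IRQ
};

// Sender: publish the event in memory first, then notify.
inline void send_event(Channel &ch) {
  ch.produced.fetch_add(1, std::memory_order_release);
  ch.pending.store(true, std::memory_order_release);  // stand-in for the trigger
}

// Receiver: one notification means "check the memory", not "exactly one event".
// Returns how many new events were handled; `consumed` is the receiver's cursor.
inline std::uint64_t drain(Channel &ch, std::uint64_t &consumed) {
  if (!ch.pending.exchange(false, std::memory_order_acquire))
    return 0;  // nothing was signalled since the last call

  std::uint64_t now = ch.produced.load(std::memory_order_acquire);
  assert(now >= consumed);
  std::uint64_t fresh = now - consumed;
  consumed = now;
  return fresh;
}
```

With this split the number of events comes from the counter and not from the number of notifications, so 50 coalesced triggers still yield 50 handled events. A receiver may occasionally be notified and find nothing new (if an event was already picked up by the previous drain), which is harmless.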

huber12paul commented 1 month ago

Hi,

how did you (@gravelyy) get the PI5 to work? I had no luck in trying to boot the PI5 and in the official fiasco repository I can not find a BSP dedicated to the PI5. Would you mind sharing your work for the PI5 ?

Thank you!

jicky1984 commented 3 weeks ago

Hi admlck,

It seems I have a similar issue. I'm currently working on implementing audio virtualization for a VM running Linux using VIRQ. However, I'm encountering a significant issue with IRQ handling latency. When the host triggers an IRQ, the VIRQ handler in Linux experiences a delay of nearly 50ms before it receives the interrupt. This level of latency is problematic for audio applications. Could you provide any suggestions or solutions to reduce this latency? Any insights or guidance on how to address this issue would be greatly appreciated.

Thanks.

L4Re commented 2 weeks ago

Hi jicky1984,

thanks for reaching out. Could you tell us a little bit more about the setup, for example, how the sender and receivers are distributed among the cores and what else might be running on those cores? I need to better understand the setup.

Thanks.

gravelyy commented 2 weeks ago

Hi @huber12paul,

What specific issue are you encountering when trying to boot L4Re on your PI5? I have it running on my board without major issues, although there are still some problems to solve. At least it boots and runs some interesting things.

Do you have any boot logs or have you tried using a debugger?

admlck commented 1 week ago

Hi, please note that support for rpi5 has been added now.

kernkonzept / uvmm

virq are lost when send to VM #3