[irq] Check whether ISRs need to take into account signal latencies

msfschaffner commented 2 years ago

OpenTitan standardizes on level-encoded interrupts, meaning that an interrupt is considered active as long as the level is asserted high. Also, the PLIC treats all interrupt signals as asynchronous, which means that there is a delay of at least 3 PLIC clock cycles before a change of the interrupt signal level is recognized at the PLIC side.

Hence, there may be scenarios where an ISR is called a second time after it has executed and cleared the interrupt status bits at both the peripheral side and the PLIC side. This would only happen if the latency of the interrupt signal between the peripheral and the PLIC is larger than the time between the store operation clearing the peripheral bit and the store operation clearing the PLIC bit.
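
To make the condition concrete, here is a toy cycle-level model in C. It is only a sketch, not the RTL: `SYNC_STAGES` and `level_seen_high_at_complete` are hypothetical names, and the 3-stage shift register is an assumption standing in for the 3-cycle minimum delay mentioned above. It reports whether the PLIC-side view of the level is still high at the cycle where the PLIC bit is cleared.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumption: model the minimum 3-cycle recognition delay as a 3-stage pipeline.
#define SYNC_STAGES 3

// Returns true if the PLIC still sees the interrupt level high when the
// store that clears the PLIC bit arrives, `store_gap_cycles` cycles after
// the store that cleared the peripheral bit.
bool level_seen_high_at_complete(uint32_t store_gap_cycles) {
  bool pipe[SYNC_STAGES];
  for (int i = 0; i < SYNC_STAGES; ++i) {
    pipe[i] = true;  // the interrupt was asserted before the ISR ran
  }
  const bool src = false;  // cycle 0: the peripheral bit has just been cleared
  for (uint32_t cycle = 0; cycle < store_gap_cycles; ++cycle) {
    // Advance the pipeline by one clock cycle.
    for (int i = SYNC_STAGES - 1; i > 0; --i) {
      pipe[i] = pipe[i - 1];
    }
    pipe[0] = src;
  }
  return pipe[SYNC_STAGES - 1];  // what the PLIC observes at this cycle
}
```

In this model a gap of 1 or 2 cycles leaves the stale level visible to the PLIC, so the interrupt would become pending again, while a gap of 3 or more cycles does not.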

This may not be an issue at the moment, since afaik we do not enable the instruction cache on all test programs yet. But once we do that and instructions are executed at-speed, this may lead to "false IRQ positives" in some tests.

There are ways to address this on the SW side: e.g. insert a long enough delay between the clearing operations, or make the ISR more permissive so that it can handle false positives when called a second time.
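
A rough sketch of the "more permissive ISR" option, written so it can run on a host. The two register variables are hypothetical plain-memory stand-ins for the peripheral interrupt state register and the PLIC claim/complete register; they are not the real OpenTitan register names or the DIF API. The idea is simply to check the peripheral status first and treat an empty status as a harmless false positive.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-ins for memory-mapped registers (assumption for this sketch).
volatile uint32_t periph_intr_state;    // peripheral interrupt status bits
volatile uint32_t plic_claim_complete;  // PLIC claim/complete register
uint32_t handled_events;                // counts real events serviced

// Returns true if a real event was serviced, false on a false positive.
bool permissive_isr(void) {
  uint32_t irq_id = plic_claim_complete;  // claim the interrupt
  uint32_t pending = periph_intr_state;   // what the peripheral actually reports
  bool real = (pending != 0);
  if (real) {
    handled_events++;
    // On real hardware this would typically be a write-1-to-clear of `pending`;
    // the stand-in variable is simply zeroed here.
    periph_intr_state = 0;
  }
  plic_claim_complete = irq_id;  // complete, even for a false positive
  return real;
}
```

With this structure a second, spurious invocation finds no pending bits at the peripheral, completes the claim, and returns without touching any test state.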

We should keep this issue open until we are confident that either all ISRs are able to handle this correctly, or we can prove that the issue cannot occur in our system based on an execution time analysis (i.e., best case SW execution delay vs worst case HW signal delay).
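
The comparison in such an execution time analysis could look roughly like the following. The function names and the cycle-count inputs are hypothetical; the actual numbers would have to come from the analysis itself. The check follows the condition stated earlier in this issue: a false positive needs the signal latency to be strictly larger than the gap between the two clearing stores.

```c
#include <stdbool.h>
#include <stdint.h>

// True if the best-case SW gap between the two clearing stores already
// covers the worst-case HW signal latency (both in clock cycles).
bool clear_sequence_is_safe(uint32_t best_case_sw_gap_cycles,
                            uint32_t worst_case_hw_latency_cycles) {
  return best_case_sw_gap_cycles >= worst_case_hw_latency_cycles;
}

// Number of extra delay cycles SW would need to insert between the two
// stores to make the sequence safe; 0 if it is already safe.
uint32_t required_padding_cycles(uint32_t best_case_sw_gap_cycles,
                                 uint32_t worst_case_hw_latency_cycles) {
  if (clear_sequence_is_safe(best_case_sw_gap_cycles,
                             worst_case_hw_latency_cycles)) {
    return 0;
  }
  return worst_case_hw_latency_cycles - best_case_sw_gap_cycles;
}
```

For example, with a best-case gap of 1 cycle and a worst-case latency of 3 cycles, the sequence would need 2 cycles of padding.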

msfschaffner commented 2 years ago

CC @tjaychen @alphan @timothytrippel @moidx @cfrantz @arunthomas

alphan commented 2 years ago

What is the frequency of the PLIC clock in relation to the CPU clock? Also, what would the CPU read from PLIC registers immediately after clearing them? If 0, then this could be a guard condition at the start of ISRs.

msfschaffner commented 2 years ago

It is running on the same clock as the CPU. If you claim & complete an interrupt in the PLIC but the incoming interrupt signal is still asserted, then you would still read a non-zero interrupt ID in the PLIC CC0 register in the next cycle.

lowRISC / opentitan

[irq] Check whether ISRs need to take into account signal latencies #14908