openhwgroup / cv32e40p

CV32E40P is an in-order 4-stage RISC-V RV32IMFCXpulp CPU based on RI5CY from PULP-Platform
https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest
Other
941 stars 412 forks source link

HW Loop Constraints question #937

Closed jstraus59 closed 8 months ago

jstraus59 commented 8 months ago

I have a question regarding the hardware loop constraints.

Suppose that the cv.count 0, %[N]; instruction at line 12 of the assembly code example in the documentation were to be deleted. Assuming that the lpcount0 register is initially 0, it would seem this would simply result in a single execution of the inner loop on the first execution of the outer loop, and then the expected number of inner loop executions on each subsequent outer loop iteration.

AFAICT this would not violate any of the HW Loop constraints - can you please confirm that this would not be a violation of any constraints, and should work as expected?

However, if the RTL takes special action for any instruction within the hardware loop body then it is possible there could be some unexpected behavior if, for example, an interrupt or exception occurs during the initial iteration of the inner loop, while lpcount0==0 (in normal loop execution the lpcount would never be expected to be 0.)

If this case IS a problem, then additional constraints are needed, perhaps a constraint that an instruction in a loop body may not be executed when the corresponding lpcount is 0?

One more potential problem I see along these lines is when using the non-atomic instructions to update the lpstart/lpend registers. Once one is written, it is then possible that the next instruction to be executed is within the hardware loop body, as defined by the intermediate values of the start and end. If the hardware has comparators to detect that an instruction being executed is between the start and end then whatever behavior is triggered as a result of that determination could be erroneously triggered.

Is this a potential problem?

pascalgouedo commented 8 months ago

Suppose that the cv.count 0, %[N]; instruction at line 12 of the assembly code example in the documentation were to be deleted. Assuming that the lpcount0 register is initially 0, it would seem this would simply result in a single execution of the inner loop on the first execution of the outer loop, and then the expected number of inner loop executions on each subsequent outer loop iteration.

Agree.

AFAICT this would not violate any of the HW Loop constraints - can you please confirm that this would not be a violation of any constraints, and should work as expected?

Agree. No constraint violation, just bad HWLoop setup. In fact it is treated just as sequential code and not an hwloop body when loop count == 0.

However, if the RTL takes special action for any instruction within the hardware loop body then it is possible there could be some unexpected behavior if, for example, an interrupt or exception occurs during the initial iteration of the inner loop, while lpcount0==0 (in normal loop execution the lpcount would never be expected to be 0.)

RTL execution is correct even when an interrupt happens during this first and unique loop iteration when lpcount0 == 0 as it treats it as normal sequential code.

If this case IS a problem, then additional constraints are needed, perhaps a constraint that an instruction in a loop body may not be executed when the corresponding lpcount is 0?

This case does not cause any problem so no new constraint is needed here.

One more potential problem I see along these lines is when using the non-atomic instructions to update the lpstart/lpend registers. Once one is written, it is then possible that the next instruction to be executed is within the hardware loop body, as defined by the intermediate values of the start and end. If the hardware has comparators to detect that an instruction being executed is between the start and end then whatever behavior is triggered as a result of that determination could be erroneously triggered.

Is this a potential problem?

Yes this could be problem if an hwloop programming sequence like that would happen. But if loop count is programmed last and it was 0 before, those instructions in between start/stop programming are treated as normal/sequential instructions. For sure if loop count is programmed first (and > 0) then those kind of incorrect behavior could happen. That's why I added in the User manual this paragraph in PR #932. I should maybe change the sentence by saying "strongly suggested" rather than "good practice". Or "lpstartX and lpendX must be programmed before lpcountX to avoid unexpected behavior".

jstraus59 commented 8 months ago

@pascalgouedo Thanks for your response. I would definitely recommend the last: lpstartX and lpendX must be programmed before lpcountX to avoid unexpected behavior, but I don't see how that is different than adding an additional constrant.

FYI, after implementing the HW Loop constraint checking in the Imperas ISS, I have found that the assembly code example of HW loop code in earlier versions of the manual throw a constraint violation.

Specifically, in the initialization of the outer loop:

...
4      "cv.count  1, %[N];"
5      ".balign 4;"
6      "cv.endi   1, endO;"
7      "cv.starti 1, startO;"
...

when the cv.starti instruction at line 7 executes it is between lpstart0 (which is 0) and lpend0 (which is a few instructions after this one) and thus is considered part of HW Loop body, and thus when it executes while lpcount is non-zero this violates the constraint on only executing instructions in a HW Loop body by entering the loop from its start location.

As you have pointed out, this example has been fixed in later versions of the document.

Please advise if you do not think generating a constraint violation in this case is appropriate. Technically, the constraint specifically mentions no branch/jump to a location inside a HWLoop body, but this seems to still meet the criteria of an apparent loop body instruction executing without entering the loop body from the start location.

jstraus59 commented 8 months ago

@pascalgouedo One more point that has come up in my testing. In your previous response you stated "...But if loop count is programmed last and it was 0 before...".

But the example code in the V1.5.0 document leaves a non-zero value in lpcount0 when it ends, as the inner loop counter is set at the end of the outer loop:

...
    "    endZ:;"
    "    cv.count 0, %[N];"
    "    addi %[j], %[j], 2;"
    "endO:;"

So, if users follow this template, they cannot assume the lpcountX values are always 0 when not in a loop.

A simple constraint that lpstartX/lpendX may not be changed when lpcountX is non-zero would solve all these problems, and in fact could be enforced in hardware with an illegal instruction exception. This would also greatly simplify constraint checking, as it would only need to be done upon lpcountX transitioning from 0 to non-zero.

pascalgouedo commented 8 months ago

I would definitely recommend the last: lpstartX and lpendX must be programmed before lpcountX to avoid unexpected behavior, but I don't see how that is different than adding an additional constraint.

I changed the sentence and I prefer not to add another constraint as there are already too much of them in my opinion.

As you have pointed out, this example has been fixed in later versions of the document.

I changed start/stop order.

Please advise if you do not think generating a constraint violation in this case is appropriate. Technically, the constraint specifically mentions no branch/jump to a location inside a HWLoop body, but this seems to still meet the criteria of an apparent loop body instruction executing without entering the loop body from the start location.

I think that this case shouldn't generate a constraint violation and keeping no branch/jump to a location inside a HWLoop body constraint only for instructions really in the hwloop body.

But the example code in the V1.5.0 document leaves a non-zero value in lpcount0 when it ends, as the inner loop counter is set at the end of the outer loop:

You are right the example is wrong. I didn't figure it out when I changed it. I corrected it to have the correct behavior.

A simple constraint that lpstartX/lpendX may not be changed when lpcountX is non-zero would solve all these problems, and in fact could be enforced in hardware with an illegal instruction exception. This would also greatly simplify constraint checking, as it would only need to be done upon lpcountX transitioning from 0 to non-zero.

This constraint has been added. As stated in the paragraph after the constraints list none of them are enforced by HW.

jstraus59 commented 8 months ago

Looks good! Thanks for the quick response.