Closed stffrdhrn closed 5 years ago
@stffrdhrn
Yes, you are right. And the behavior is similar to the one for processing TT/PIC interrupts. None of them follow the arch. manual. We discussed TT/PIC interrupt processing here: https://github.com/openrisc/mor1kx/issues/74#issuecomment-462209653
Actually, the FP behavior also doesn't meet the IEEE standard completely. The IEEE standard expects that the FP result should be "returned" (written into the RF, in our case) even when an FP instruction generates an exception. Moreover, there are two approaches to computing the returned value for overflow/underflow cases.
In the simpler case, overflow should return signed infinity while underflow should return signed zero.
In the more complex approach, the FPU should return a tricky re-scaled value.
Of course, my implementation follows the simpler variant (it computes +/- inf for overflow and +/- 0 for underflow). But if FP-related exceptions are enabled, the FPU doesn't "return" a value, because any exception discards exactly the instruction that caused it, and write-back is blocked. On the one hand, this approach helps simplify the design, since every instruction exception and every TT/PIC interrupt is processed in the same manner. On the other hand, it doesn't follow the arch. manual and leads to side effects. In particular, I had to modify TestFloat in the following ways:
I have to run each of TestFloat's verification cases twice: once to collect flags with FP exceptions enabled, and once more to get the FP result with FP exceptions disabled.
TestFloat expects tricky re-scaled values for overflow/underflow cases, so I had to redefine that behavior. Unfortunately, I didn't mark such cases in the TestFloat sources accurately enough. At least I see there is a BLOCK_UNDERFLOW_FLAG defined in softfloat.h that blocks some of these cases.
Additionally, it looks like TestFloat differs from the IEEE standard in how NaNs are generated, and there is a workaround with the global variable checkNaNs = FALSE (in testLoops.c).
Frankly speaking, I'm not sure which would be better: to implement the behavior required by the arch. manual and the IEEE standard, or to leave it as is and rewrite at least the arch. manual. The first way looks more accurate, but it is not trivial, because it brings an exclusive approach for handling FP exceptions into the pipe design.
I wanted to write a longer reply on this but didn't get the time.
I agree the spec can be updated. But I read over it. Is there anything saying that upon an FP exception the results should NOT be committed?
I think what you are saying is that if we allow the FPU to commit its results before jumping to the exception handler, this will follow the IEEE spec and properly update EPCR. I don't see that as a big change to the OpenRISC spec, more just a clarification.
@stffrdhrn
Stafford, I'm going to try to make FP exception processing closer to the arch. manual and the IEEE standard. I hope it will take several days, including the TestFloat upgrade. I'll create an fp_exceptions branch once it's done. Are you still focused on writing FP test cases? Perhaps it would be better to postpone them for a while. Is that OK with you?
This is OK with me. I was mainly trying to analyze some simple FP tests written in C, to see if I can reproduce any of the issues you were seeing with TestFloat, in order to see where the issues were. I haven't made much progress. I don't expect the EPCR issue will affect what I am working on.
@stffrdhrn
Stafford, I've created the fp_exceptions branch with modified behavior of FP exceptions. Now they should meet the arch. manual and be closer to the IEEE standard. I verified it with an appropriately modified version of TestFloat (you can now find it at https://github.com/bandvig/or1k_testfloat/tree/fp_exceptions). If you want to use it, pay attention to the notes in Readme.md. I also successfully ran other software from my OR1K verification pool (U-Boot, Whetstone compiled for single and double precision, CoreMark, Dhrystone, Linux). The source code also passed Verilator's check for lint warnings.
OK, let me have a look.
Is there any change to the OpenRISC spec needed? Is it just as I suggested before, or is it a bigger change?
-Stafford
@stffrdhrn
OK, let me have a look. Is there any change to the OpenRISC spec needed? Is it just as I suggested before, or is it a bigger change?
Not sure that I understand all of your questions clearly. Let me just give more details.
The fix addresses the issue you reported initially. Now, if an FP exception occurs, the address of the next not-executed instruction is stored in EPCR. At the same time, even though an FP instruction causes an exception, its result is stored in the GPRs (which corresponds to the IEEE standard, if I understand the standard correctly). Your tests shouldn't hang anymore. No change to the OpenRISC spec is needed, because FP exception processing meets the OpenRISC spec now.
About the TestFloat tricks I made and described above:
Thanks to that, TestFloat needn't run each instruction twice anymore. That eliminates the first trick I made to run TestFloat successfully. The https://github.com/bandvig/or1k_testfloat/tree/fp_exceptions version of TestFloat runs each FP instruction once to collect the result and flags, while the FP exception handler is used just to check that FP exceptions can be processed somehow. Just a counter is implemented in the FP exception handler; its value is printed at TestFloat's exit.
It doesn't fix the issues related to generating NaNs or underflow flags. These SoftFloat-vs-FPU differences still exist. However, I believe fixing those issues can be postponed, while the more important task is to complete support for doubles in GCC9. And don't forget to set MAROCCHINO's parameter OPTION_ORFPX64A32_ABI to "GCC9".
Potential bonus-track:
- The current implementation of TT/PIC interrupt processing doesn't follow the OpenRISC spec but follows CAPPUCCINO's approach. Thanks to the modifications made for FP exceptions, we could potentially take the next step and implement TT/PIC interrupt processing in accordance with the OpenRISC spec.
Hello,
This does answer my question. I think we need an update to the OpenRISC spec clarifying that results of FPU operations that cause exceptions should be written back before the exception is taken.
Also, I made one pull request to fix an issue with the rX + 2 logic. It's only for the higher registers. Also, for optimization purposes, we could do like is done in GCC and do register renaming instead of adjusting the store address.
For example:
- r0-r15 store to index 0-15
- r16-r30 (even) store to index 16-23
- r17-r31 (odd) store to index 24-31

Mapping (architectural register -> internal index):
16 18 20 22 24 26 28 30 -> 16 17 18 19 20 21 22 23
17 19 21 23 25 27 29 31 -> 24 25 26 27 28 29 30 31
I think we just need to change the fetch/RF-decode stage (where I made the patch) to rename from the instruction's register address to an internal RF address. Then you can use the previous RAM model.
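The renaming just described can be sketched as a small Python function (illustrative only; the real change would live in the fetch/RF-decode RTL):

```python
def rename(r: int) -> int:
    """Map an architectural GPR number to a hypothetical internal RF index.

    r0-r15 keep their index; even r16-r30 pack into 16-23;
    odd r17-r31 pack into 24-31 (the mapping shown above).
    """
    if r < 16:
        return r
    if r % 2 == 0:
        return 16 + (r - 16) // 2
    return 24 + (r - 17) // 2

# With this renaming, a GCC9 ABI pair {rX, rX+2} (X >= 16) lands on
# adjacent internal indices (e.g. r16/r18 -> 16/17), so the adjacent-pair
# access of the RAM-based RF works again without adjusting store addresses.
```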
Also, you mentioned you were seeing failures with the GCC9 toolchain. Can you clarify how to reproduce them and what you saw? I have been testing with a few custom-built C programs and assembly, and things are looking OK. I must not be hitting certain edge cases.
This does answer my question. I think we need an update to the OpenRISC spec clarifying that results of FPU operations that cause exceptions should be written back before the exception is taken.
Ah, understood. Also, the ORFPX64A32 description should be added. However, I can't yet imagine how to correctly describe the addresses of operands and destination while taking into account the "remapping" for GPRs 16 … 31.
Also, you mentioned you were seeing failures with the GCC9 toolchain. Can you clarify how to reproduce them and what you saw? I have been testing with a few custom-built C programs and assembly, and things are looking OK. I must not be hitting certain edge cases.
Yes, the double-precision version of Whetstone hung if compiled by GCC9 with the -O2 option. But that was before your pull request #6 ("fpu: Properly support the +2 reg logic"). With the modifications everything works, so I merged your PR. However, I see that CI reports timeouts in or1k-tests again. It's strange, because the first commit in the fp_exceptions branch passed all tests, and your modifications should not affect the tests at all.
Let me see if I can make a patch for remapping.
It's good news that the Whetstone issues are fixed. It means we are almost done. I'll look into the timeouts. None of the tests should use the reg +2 logic, so it's strange.
The test is passing now. The Icarus source code had failed to download; it's fixed now.
Let me see if I can make a patch for remapping.
Wait a minute, Stafford. Which patch are you going to make? I meant that your PR #6 fixes the issue with GCC9 ABI support in hardware, so I'm going to merge it into the master branch.
The only thing I'm not sure about is how it could be described correctly in the OpenRISC spec. Something like this (for lf.add.d)?
ORFPX64A32-Implementation: {(rD+1+(rD>15))[31:0],rD[31:0]} <- {(rA+1+(rA>15))[31:0],rA[31:0]} + {(rB+1+(rB>15))[31:0],rB[31:0]}
It looks really odd.
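The pair-selection rule in that notation reduces to a one-liner; a Python sketch of just the index arithmetic (illustrative, names are mine):

```python
def pair_reg(r: int) -> int:
    """Second (most-significant-word) register of a 64-bit operand pair.

    Per the formula above: r0-r15 pair with rX+1, while r16 and up
    pair with rX+2, i.e. rX + 1 + (rX > 15).
    """
    return r + 1 + (1 if r > 15 else 0)
```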
Or are you going to try to change GCC9 internals in a way that could restore the ability to use just {rX+2} in hardware?
Let me see if I can make a patch for remapping.
Wait a minute, Stafford. Which patch are you going to make? I meant that your PR #6 fixes the issue with GCC9 ABI support in hardware, so I'm going to merge it into the master branch.
Yes, I agree that what is there now is good, and I have no plan to change the GCC9 ABI.
But before, you mentioned:
Second. Initially, I implemented GPRs as a set of four RAM-blocks to provide {rX,rX+1} access to 64-bit operands for the ORFPX64A32 extension. However, to meet the GCC9 ABI ({rX,rX+2} access for 64-bit operands) I had to replace the RAM-blocks by a set of Flip-Flop based registers. MAROCCHINO had become much larger and only a 75 MHz CPU clock could be achieved now. The included 100 MHz CoreMark results were measured on the previous MAROCCHINO version (with RAM-based GPRs).
I was thinking that, using a register renaming technique, we can support both the GCC9 ABI and still use a 4-RAM-block solution running at 100 MHz. I just wanted to make a PoC patch to show how it can be achieved. An extra improvement on the current core.
The only thing I'm not sure about is how it could be described correctly in the OpenRISC spec. Something like this (for lf.add.d)? ORFPX64A32-Implementation: {(rD+1+(rD>15))[31:0],rD[31:0]} <- {(rA+1+(rA>15))[31:0],rA[31:0]} + {(rB+1+(rB>15))[31:0],rB[31:0]} It looks really odd.
I think we will need to show some examples, and split the above into multiple lines. Also, we should explain the "why".
Or are you going to try to change GCC9 internals in a way that could restore the ability to use just {rX+2} in hardware?
No plans.
But before, you mentioned:
Second. Initially, I implemented GPRs as a set of four RAM-blocks to provide {rX,rX+1} access to 64-bit operands for the ORFPX64A32 extension. However, to meet the GCC9 ABI ({rX,rX+2} access for 64-bit operands) I had to replace the RAM-blocks by a set of Flip-Flop based registers. MAROCCHINO had become much larger and only a 75 MHz CPU clock could be achieved now. The included 100 MHz CoreMark results were measured on the previous MAROCCHINO version (with RAM-based GPRs).
Oh! Stafford! The 4-RAM-block RF design was restored many days ago :). And 100 MHz is achievable again. In fact, we started the or1k_marocchino repo with the 4-RAM-block RF design. You just missed this.
https://github.com/openrisc/or1k_marocchino/blob/93a5a34c41ccc8abec138421f4fd3e70077159dc/rtl/verilog/or1k_marocchino_rf.v#L531
https://github.com/openrisc/or1k_marocchino/blob/93a5a34c41ccc8abec138421f4fd3e70077159dc/rtl/verilog/or1k_marocchino_rf.v#L581
https://github.com/openrisc/or1k_marocchino/blob/93a5a34c41ccc8abec138421f4fd3e70077159dc/rtl/verilog/or1k_marocchino_rf.v#L631
https://github.com/openrisc/or1k_marocchino/blob/93a5a34c41ccc8abec138421f4fd3e70077159dc/rtl/verilog/or1k_marocchino_rf.v#L681
It was done at the cost of one extra cpu_clock for write-back of D2 (the most significant word of a double result).
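The idea behind banked pair access can be sketched with a deliberately simplified Python model (a hypothetical two-bank illustration of my own, not the actual or1k_marocchino_rf.v design, which uses four RAM blocks; it only shows why {rX, rX+1} reads fit RAM banks and why the D2 write needs an extra cycle):

```python
# Hypothetical model: rX and rX+1 always fall into different even/odd
# banks, so an adjacent pair can be read in one cycle with one access
# per bank. A single write port per bank means a 64-bit result is
# committed in two steps (D1, then D2 one cpu_clock later).
class BankedRF:
    def __init__(self) -> None:
        self.even = [0] * 16   # r0, r2, ..., r30
        self.odd = [0] * 16    # r1, r3, ..., r31

    def read_pair(self, r: int) -> tuple:
        """One-cycle read of {rX, rX+1}: one access per bank."""
        lo = (self.even if r % 2 == 0 else self.odd)[r // 2]
        hi = (self.odd if r % 2 == 0 else self.even)[(r + 1) // 2]
        return lo, hi

    def write(self, r: int, value: int) -> None:
        """Single write per cycle: a double result takes two calls."""
        (self.even if r % 2 == 0 else self.odd)[r // 2] = value
```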
I was thinking that, using a register renaming technique, we can support both the GCC9 ABI and still use a 4-RAM-block solution running at 100 MHz. I just wanted to make a PoC patch to show how it can be achieved. An extra improvement on the current core.
If you want to propose an alternative solution, you are welcome, of course.
According to the spec, section 6.3 Exception Processing, for FPU exceptions the EPCR should be set to "Address of next not executed instruction". This is the same as Tick Timer, External Interrupt and Syscall.
It seems currently FPU exceptions are being set to "Address of instruction that caused exception". This is causing tests to fail.
Test: https://github.com/openrisc/binutils-gdb/blob/orfpx64a32/sim/testsuite/sim/or1k/fpu64a32.S