openrisc / mor1kx

mor1kx - an OpenRISC 1000 processor IP core

Status of FEATURE_FASTCONTEXTS in cappuccino #158

Open zeldin opened 2 months ago

zeldin commented 2 months ago

The cappuccino CPU has a parameter FEATURE_FASTCONTEXTS which is said to "Enable fast context switching of register sets". Setting this to ENABLED makes SR[CE] writable; however, SR[CE] does not seem to actually do anything. SR[CID] is not writable, and the top bits of wb_rfd_adr_expand, rfa_rdad, and rfb_rdad are all hard-coded to 0.

Is FEATURE_FASTCONTEXTS expected to be useful for anything now? Is there work planned in this area? Would you accept a PR which made SR[CID] writable, made SR[CE] increment it on exception, and used its bits to select the current GPR set?
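
To make the proposal concrete, here is a minimal behavioural sketch (Python, not RTL) of using the CID bits as the top bits of the register-file address. The function names are hypothetical, and the SR[CID] bit position (bits 31:28) is an assumption taken from my reading of the architecture manual; it should be checked against the manual and the mor1kx source before relying on it.

```python
GPRS_PER_SET = 32    # r0..r31, as seen in the execution trace later in this thread
SR_CID_SHIFT = 28    # ASSUMPTION: SR[CID] occupies bits 31:28
SR_CID_MASK = 0xF    # 4-bit field


def sr_cid(sr):
    """Extract the current context ID from an SR value (assumed layout)."""
    return (sr >> SR_CID_SHIFT) & SR_CID_MASK


def physical_gpr(cid, reg):
    """Expanded register-file address: CID bits above the 5-bit GPR number."""
    if not 0 <= reg < GPRS_PER_SET:
        raise ValueError("GPR number out of range")
    if not 0 <= cid <= SR_CID_MASK:
        raise ValueError("context ID out of range")
    return cid * GPRS_PER_SET + reg


# With CID hard-coded to 0 (the current behaviour described above),
# every access lands in the first 32 entries.
print(physical_gpr(0, 5))                      # 5
print(physical_gpr(sr_cid(0x20000001), 31))    # 95
```

In this model, the existing behaviour corresponds to always passing `cid=0`; the proposed change would pass the live SR[CID] value instead.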

stffrdhrn commented 1 month ago

Hi @zeldin, sorry I missed your message. I really struggle to get notifications from GitHub. I will try to figure this out.

We don't use fast context switches in any OSes right now, and traditionally context is saved to the stack to allow for a more general context switching mechanism. But I do think fast context switching would be an interesting experiment, and an implementation would be appreciated. As I said, I have not looked into it at all. The issue with fast context switching is that there will always be an upper bound on how many concurrent contexts we can support. This may be fine for a small RTOS but might be hard to implement with a general OS like Linux.

Would you be interested in writing some test software too to help show the benefits vs storing context to the stack? What are you thinking as a use case?

zeldin commented 1 month ago

My use case is not an OS, not even an RTOS, but bare metal embedded. I don't want multiple processes or anything, just efficient (both in space, as I want to be able to run from BRAM only, and in execution time) processing of interrupts and exceptions. Currently I use crt0.o from newlib as vector entrypoint, but it would be nice to be able to switch to something leaner. Having less dead space between the entrypoints would also be nice, but alas that is defined by the architecture. :disappointed:

What environmental requirements would you have on such test software? Is something which runs on a Nexys A7 board and outputs something on the debug UART ok? Or runnable in some Icarus setup?

stffrdhrn commented 1 month ago

If the change is in the mor1kx core, any platform you use would be fine as long as the software can be built with newlib. I use either LiteX or mor1kx-generic to run software, either in simulation or on an FPGA.

zeldin commented 1 month ago

Well, it would have to be built with -nostartfiles since replacing newlib's crt0.o with something more efficient is key here. :smile: I'll see if I can whip something up, but probably not this week.

stffrdhrn commented 1 month ago

That should be fine. The one thing needed from newlib is the set of board libraries that handle reading/writing the UART.

< shorne@antec ~/work/openrisc/embench-tester > ls ~/local/or1k-elf/or1k-elf/lib/ -l | grep or1k
-rw-r--r--. 1 shorne shorne     1258 Mar 21  2022 libboard-or1ksim.a
-rw-r--r--. 1 shorne shorne     1264 Mar 21  2022 libboard-or1ksim-uart.a
-rw-r--r--. 1 shorne shorne    84070 Mar 21  2022 libor1k.a

Though, this does remind me that we have some re-entrant code somewhere in newlib that uses shadow registers to temporarily store context. I could not find it at first glance. This might crop up later.

Good luck.

zeldin commented 1 month ago

@stffrdhrn So, I wanted to try using your mor1kx-generic, but it failed already on the first instruction of your test asm program, without me making any local changes at all:

vvp -n -M. -l icarus.log -melf_loader_vpi -mjtag_vpi  mor1kx-generic_1.1 -fst +elf_load=/tmp/openrisc/src/openrisc-asm +trace_enable=1 +trace_to_screen=1 +vcd=1
FST info: dumpfile testlog.vcd opened for output.
Program header 0: addr 0x00000000, size 0x000001A0
elf-loader: /tmp/openrisc/src/openrisc-asm was loaded
Loading         104 words
                   0 : Illegal Wishbone B3 cycle type (xxx)
S 00000100: 18800000 l.movhi r4,0x0000       r4     = 00000000  flag: 0
S 00000104: a8840110 l.ori   r4,r4,0x0110    r4     = 00000110  flag: 0
S 00000108: 44002000 l.jr    r4                             flag: 0
S 0000010c: 15000000 l.nop   0x0000                         flag: 0
S 00000110: 18000000 l.movhi r0,0x0000       r0     = 00000000  flag: 0
S 00000114: 9c200001 l.addi  r1,r0,0x0001    r1     = 00000001  flag: 0
S 00000118: 9c410002 l.addi  r2,r1,0x0002    r2     = 00000003  flag: 0
S 0000011c: 9c620004 l.addi  r3,r2,0x0004    r3     = 00000007  flag: 0
S 00000120: 9c830008 l.addi  r4,r3,0x0008    r4     = 0000000f  flag: 0
S 00000124: 9ca40010 l.addi  r5,r4,0x0010    r5     = 0000001f  flag: 0
S 00000128: 9cc50020 l.addi  r6,r5,0x0020    r6     = 0000003f  flag: 0
S 0000012c: 9ce60040 l.addi  r7,r6,0x0040    r7     = 0000007f  flag: 0
S 00000130: 9d070080 l.addi  r8,r7,0x0080    r8     = 000000ff  flag: 0
S 00000134: 9d280100 l.addi  r9,r8,0x0100    r9     = 000001ff  flag: 0
S 00000138: 9d490200 l.addi  r10,r9,0x0200   r10    = 000003ff  flag: 0
S 0000013c: 9d6a0400 l.addi  r11,r10,0x0400  r11    = 000007ff  flag: 0
S 00000140: 9d8b0800 l.addi  r12,r11,0x0800  r12    = 00000fff  flag: 0
S 00000144: 9dac1000 l.addi  r13,r12,0x1000  r13    = 00001fff  flag: 0
S 00000148: 9dcd2000 l.addi  r14,r13,0x2000  r14    = 00003fff  flag: 0
S 0000014c: 9dee4000 l.addi  r15,r14,0x4000  r15    = 00007fff  flag: 0
S 00000150: 9e0f8000 l.addi  r16,r15,0x8000  r16    = ffffffff  flag: 0
S 00000154: e3e00802 l.sub   r31,r0,r1       r31    = ffffffff  flag: 0
S 00000158: e3df1002 l.sub   r30,r31,r2      r30    = fffffffc  flag: 0
S 0000015c: e3be1802 l.sub   r29,r30,r3      r29    = fffffff5  flag: 0
S 00000160: e39d2002 l.sub   r28,r29,r4      r28    = ffffffe6  flag: 0
S 00000164: e37c2802 l.sub   r27,r28,r5      r27    = ffffffc7  flag: 0
S 00000168: e35b3002 l.sub   r26,r27,r6      r26    = ffffff88  flag: 0
S 0000016c: e33a3802 l.sub   r25,r26,r7      r25    = ffffff09  flag: 0
S 00000170: e3194002 l.sub   r24,r25,r8      r24    = fffffe0a  flag: 0
S 00000174: e2f84802 l.sub   r23,r24,r9      r23    = fffffc0b  flag: 0
S 00000178: e2d75002 l.sub   r22,r23,r10     r22    = fffff80c  flag: 0
S 0000017c: e2b65802 l.sub   r21,r22,r11     r21    = fffff00d  flag: 0
S 00000180: e2956002 l.sub   r20,r21,r12     r20    = ffffe00e  flag: 0
S 00000184: e2746802 l.sub   r19,r20,r13     r19    = ffffc00f  flag: 0
S 00000188: e2537002 l.sub   r18,r19,r14     r18    = ffff8010  flag: 0
S 0000018c: e2327802 l.sub   r17,r18,r15     r17    = ffff0011  flag: 0
S 00000190: e2118002 l.sub   r16,r17,r16     r16    = ffff0012  flag: 0
S 00000194: 18600000 l.movhi r3,0x0000       r3     = 00000000  flag: 0
S 00000198: 15000001 l.nop   0x0001                         flag: 0
exit(0x00000000);

Is this also a work in progress? (BTW, I opened an issue on that repo with some typos I found in the readme. I'm mentioning it here since you said you were having issues with GH notifications.)

zeldin commented 1 month ago

@stffrdhrn Hm, actually, never mind, it looks like I misinterpreted the "Illegal Wishbone B3 cycle type" as a fatal error. It does run correctly. Do consider fixing the typos in your README when you have the time though. :smile:

stffrdhrn commented 1 month ago

Thanks, yes, that "Illegal Wishbone B3 cycle type" message is a bit misleading. Everything does appear to work correctly according to the log.
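
The register values in the trace above can be cross-checked independently. The following is a minimal sketch that replays only the `l.addi` and `l.sub` chains from the log (not the `l.movhi`, `l.ori`, `l.jr`, or `l.nop` instructions), assuming 32-bit wraparound arithmetic and a sign-extended 16-bit immediate for `l.addi`. The helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm):
    """Sign-extend a 16-bit immediate, as l.addi is assumed to do."""
    return imm - 0x10000 if imm & 0x8000 else imm


def replay():
    r = [0] * 32
    # l.addi rK, r(K-1), 2**(K-1)   for K = 1..16
    for k in range(1, 17):
        r[k] = (r[k - 1] + sext16(1 << (k - 1))) & MASK32
    r16_after_addi = r[16]
    # l.sub r(32-K), prev, rK   for K = 1..16,
    # where prev is r0 for the first step and r(33-K) afterwards
    for k in range(1, 17):
        prev = r[0] if k == 1 else r[33 - k]
        r[32 - k] = (prev - r[k]) & MASK32
    return r, r16_after_addi


regs, r16_mid = replay()
print(f"r16 after l.addi: {r16_mid:08x}")   # ffffffff, as at 0x150 in the log
print(f"r16 after l.sub:  {regs[16]:08x}")  # ffff0012, as at 0x190 in the log
```

The replayed values agree with the log, which is consistent with the conclusion that the "Illegal Wishbone B3 cycle type" line did not affect execution.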

I'll look at the readme typos

zeldin commented 1 month ago

@stffrdhrn Working on the FASTCONTEXTS implementation, I found an inconsistency in the OpenRISC 1000 Architecture Manual:

Section 6.3 says that CID is incremented on exception. But section 6.4.2 says that exceptions switch to the main context (CID=0).

Either way works for me, but do you happen to know which way it's supposed to be?

stffrdhrn commented 1 month ago

This is interesting. I looked into it, and there are two different things here: the 4-bit SR[CID] field, which refers to the current context, and the 32-bit CXR register, which also holds details of the context.

The thing is, the CXR register doesn't seem to be an SPR, and I don't know if our assembler even supports this name. But also this CXR is supposed to be used for manual context switching. It's not used when automatic fast switching is enabled.

zeldin commented 1 month ago

> But also this CXR is supposed to be used for manual context switching. It's not used when automatic fast switching is enabled.

Yes, however automatic fast switching is supposed to write the previous value of CXR[CCID] (which, AFAICT, is identical to SR[CID]) into CXR[CCRS], so that you can find the previous context in case you need to look at its registers (in the case of a syscall, for example). This is clearly only needed if CID is set to 0 instead of being incremented; otherwise you can just decrement it to find the old value. But even then, can't you just use ESR[CID] to find out? It seems to me that CXR is completely redundant, which might be why it seems nobody has remembered to assign it an SPR number...
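
The two readings of the manual, and the ESR[CID] argument, can be compared with a small behavioural model. This is only a sketch under stated assumptions: SR is copied to ESR on exception entry and restored on return, the policy names `"increment"` and `"reset"` are labels invented here for sections 6.3 and 6.4.2 respectively, and wrapping at 16 contexts is a guess, since the thread does not say what happens on overflow.

```python
from dataclasses import dataclass

NUM_CONTEXTS = 16  # 4-bit SR[CID]; wrap-around behaviour is an ASSUMPTION


@dataclass
class Cpu:
    cid: int = 0       # models SR[CID]
    ce: bool = True    # models SR[CE]
    esr_cid: int = 0   # models the CID field of the saved ESR


def take_exception(cpu, policy):
    """Enter an exception under one of the two readings of the manual."""
    cpu.esr_cid = cpu.cid            # SR is saved to ESR before it changes
    if not cpu.ce:
        return                       # fast context switching disabled
    if policy == "increment":        # reading of section 6.3
        cpu.cid = (cpu.cid + 1) % NUM_CONTEXTS
    elif policy == "reset":          # reading of section 6.4.2
        cpu.cid = 0
    else:
        raise ValueError(f"unknown policy: {policy}")


def return_from_exception(cpu):
    """Restore SR from ESR, which also restores the interrupted context."""
    cpu.cid = cpu.esr_cid


for policy in ("increment", "reset"):
    cpu = Cpu(cid=2)
    take_exception(cpu, policy)
    print(policy, "handler CID:", cpu.cid, "previous CID from ESR:", cpu.esr_cid)
```

Under both policies the previous context ID is still available from the saved ESR in this model, which is the basis for the suggestion that CXR[CCRS] adds nothing.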

zeldin commented 1 month ago

Just wanted to add that I did write some test software. All of it, including main and exception stack, fits in 2 Kbyte of BRAM (the vector table at address 0x0000-0x1fff is purely combinational and not backed by BRAM or registers). But now I need to actually implement FCS for it to work properly (if you run it now there is some register contention between the tick timer handler and the main loop, giving incorrect output), and I can't do that without some clarity on which way CID is supposed to move on an exception when SR[CE] is set...
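
The register contention described here can be illustrated with a toy model. It assumes, hypothetically, that the handler uses a GPR without saving it and that under FCS the handler runs in context 1; the class and the register choice (`r3`) are invented for illustration and are not taken from the actual test software.

```python
class RegFile:
    """Register file with a configurable number of 32-entry GPR sets."""

    def __init__(self, contexts):
        self.sets = [[0] * 32 for _ in range(contexts)]

    def _bank(self, cid):
        # With a single set, every context ID aliases onto the same registers,
        # which models the CID bits being hard-coded to 0.
        return self.sets[cid % len(self.sets)]

    def write(self, cid, reg, value):
        self._bank(cid)[reg] = value

    def read(self, cid, reg):
        return self._bank(cid)[reg]


def main_loop_value_after_tick(contexts):
    rf = RegFile(contexts)
    rf.write(0, 3, 0x1234)   # main loop keeps a live value in r3 (context 0)
    rf.write(1, 3, 0xDEAD)   # tick handler uses r3 without saving it (context 1)
    return rf.read(0, 3)     # what the main loop sees after the handler returns


print(hex(main_loop_value_after_tick(1)))   # 0xdead: clobbered without FCS
print(hex(main_loop_value_after_tick(2)))   # 0x1234: preserved with a second set
```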