jamesbowman / swapforth

Swapforth is a cross-platform ANS Forth
BSD 3-Clause "New" or "Revised" License

j4a : do..loop not thread-safe #34

Open RGD2 opened 8 years ago

RGD2 commented 8 years ago

OK, this one's on me...

I get odd behaviour if another thread (say, $100 io! ) is running a DO ... LOOP and I run another bit of code, also with a do loop...

(this requires an iCE40-HX8K Breakout Board to be plugged in)

cd j1a
make j4a connect
make -C icestorm j4a
make[1]: Entering directory `/opt/swapforth/j1a/icestorm'
sudo iceprog j4a.bin
init..
cdone: high
reset..
cdone: low
flash ID: 0x20 0xBA 0x16 0x10 0x00 0x00 0x23 0x12 0x42 0x18 0x11 0x00 0x88 0x00 0x44 0x03 0x11 0x11 0xA1 0x14
file size: 135356
erase 64kB sector at 0x000000..
erase 64kB sector at 0x010000..
erase 64kB sector at 0x020000..
programming..
reading..
VERIFY OK
cdone: high
Bye.
make[1]: Leaving directory `/opt/swapforth/j1a/icestorm'
sudo python shell.py -h /dev/ttyUSB1 -p ../common/
Contacting... established
Loaded 207 words 
>:noname 1000 ms ; $100 io!
>: test 5 0 do i . loop ;
>test
 0 1 2 3 4    ok
>test
 0 5982 5983 5984 5985    ok
>test
0 1 2 998 5985    ok

It doesn't always do it - maybe 5% of the time or so - I exaggerated it a bit above.

I think this means the timing for the stack2pipe4 module isn't quite right... strange thing is, the system seems otherwise quite stable in actual applications.

I'll have a go at getting the verilator interactive build of the j4a going, and see if it does it too. (in which case it's just a logic bug.)

Otherwise this might be some form of odd timing or interference issue with the icestorm tools. Perhaps there are some constraints about routing nearby signals which might interfere under certain conditions we don't know about yet? I hope not... But, if so, then it might perhaps share a root cause with #25.

I have confirmed the above behaviour on both of the boards I have.

jamesbowman commented 8 years ago

Ahh, I actually think this might be my doing. The DO LOOP implementation uses a static variable, "rO" for its offset. With two concurrent DO LOOPs there will be a fight for "rO".
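To make the suspected mechanism concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not the code in nuc.fs: it assumes each loop keeps a private counter (as on a per-core return stack) and that `I` is computed as that counter plus the single shared offset cell. The names `Shared`, `do_loop` and `run_interleaved` are made up for illustration.

```python
class Shared:
    """State visible to every core: one static offset cell (stand-in for rO)."""
    def __init__(self):
        self.rO = 0


def do_loop(shared, start, limit, out):
    """Toy counted loop: private counter, but the offset lives in shared.rO."""
    shared.rO = start                  # DO: overwrite the shared cell
    count = 0                          # private, like a per-core return stack entry
    while count < limit - start:
        out.append(count + shared.rO)  # I: private counter + shared offset
        count += 1
        yield                          # a point where another core may run


def run_interleaved(*loops):
    """Round-robin the generators one step at a time until all have finished."""
    pending = list(loops)
    while pending:
        for g in list(pending):
            try:
                next(g)
            except StopIteration:
                pending.remove(g)


if __name__ == "__main__":
    shared, a, b = Shared(), [], []
    run_interleaved(do_loop(shared, 0, 5, a), do_loop(shared, 5982, 5990, b))
    print(a)  # [0, 5983, 5984, 5985, 5986]
    print(b)  # [5982, 5983, 5984, 5985, 5986, 5987, 5988, 5989]
```

In this model the loop that wrote the offset last runs correctly, while the other loop's `I` values jump by the difference between the two offsets. That resembles the `0 5982 5983 ...` output reported above, though the real interleaving on the j4a will differ.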


RGD2 commented 8 years ago

Oh, thank god! Not your fault at all, how could you have suspected that some crazy loon would later come and make your j1 multicore?

I saw >r and r> and made an assumption. I see now - it's in j1a/nuc.fs

Hmm... I might leave this issue open, until I get a better appreciation for how swapforth interacts with the j4a... (with regard to defining words, at least).

Off the top of your head, can you think of any other spots that might not be "thread safe" like this? (excepting anything that defines, of course - the idea is never to delegate anything that changes the dict to the other cores.)

RGD2 commented 8 years ago

I think I may just add an additional private context register to the j4a to resolve this (via the io space). I can then just rewrite the DO, LOOP, I and J words to use it. A bit of a kludge, but 'hanging off' the j1a's swapforth implementation is really helping me out a hell of a lot.

Until I get a better feel for swapforth, I'm loath to completely fork away from swapforth/j1a.
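As a rough illustration of why a private per-core register would help, here is a toy Python model. It is a sketch only: the `Core` class, its `loop_offset` field and the round-robin scheduler are hypothetical stand-ins, and it does not model the io space or the actual j4a hardware. The only point it makes is that once the loop offset is per-core state, interleaved loops can no longer disturb each other.

```python
class Core:
    """One hardware thread with its own (hypothetical) loop-offset register."""
    def __init__(self, core_id):
        self.core_id = core_id
        self.loop_offset = 0              # private to this core


def do_loop(core, start, limit, out):
    """Toy counted loop whose offset is stored in the core's private register."""
    core.loop_offset = start              # DO: write this core's register only
    count = 0                             # private counter
    while count < limit - start:
        out.append(count + core.loop_offset)  # I: counter + private offset
        count += 1
        yield                             # another core may run here


def run_interleaved(*loops):
    """Round-robin the generators one step at a time until all have finished."""
    pending = list(loops)
    while pending:
        for g in list(pending):
            try:
                next(g)
            except StopIteration:
                pending.remove(g)


if __name__ == "__main__":
    a, b = [], []
    run_interleaved(do_loop(Core(0), 0, 5, a), do_loop(Core(1), 5982, 5990, b))
    print(a)  # [0, 1, 2, 3, 4]
    print(b)  # [5982, 5983, 5984, 5985, 5986, 5987, 5988, 5989]
```

Nested loops on one core would still need the register saved and restored (or a small stack, as suggested later in the thread); this sketch covers only the cross-core case.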

RGD2 commented 8 years ago

I stumbled across the hive processor on opencores.org the other day (whilst looking for a 16x16 signed multiplier to poach... found one!) and it's got me thinking.

I wouldn't be unhappy with having a third, very shallow stack around, to solve this exact issue in a way which just extends the cpu context a little.

And it would also help integrate said 32 bit multiplier too, allowing places to source and send data to/from in 32 bit chunks, without necessarily going to a 32 bit internal datapath.

The hive has 8 stacks, which I think is excessive, since I can't see a good way to implement forth on it efficiently (tend to use just 2/8 or 3/8 of them) and it would take a couple clocks to properly update both data and return stacks... :(

It's a really neat, very well documented 'barrel cpu' design though (like j4a) ... hats off to Eric Wallin. If only he hadn't given up so soon on python (and if only he'd met verilator...) oh well. It's still very nice work.

RGD2 commented 7 years ago

FWIW, here's a workaround:

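\ Busy-wait delays built on BEGIN ... UNTIL, so they never execute DO ... LOOP.
: 750ns ( n -- ) dup if begin 1- dup 0= until then drop ; \ spin n times
: ms ( n -- ) dup if begin 1- dup 0= 1332 750ns until then drop ; \ n outer passes of 1332 spins each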