olofk / fusesoc

Package manager and build abstraction tool for FPGA/ASIC development
BSD 2-Clause "Simplified" License
1.16k stars, 242 forks

Support Project Mistral - Altera Cyclone V Bitstream #552

Open somhi opened 2 years ago

somhi commented 2 years ago

I need support for Project Mistral for my Chameleon96 and Terasic Sockit Altera Cyclone V boards. A blinky example built with Project Mistral is available here: https://github.com/kprasadvnsi/mistral-CV96-blinky

olofk commented 2 years ago

What do you say @ravenslofty. Is Mistral in a position where it's worth adding an Edalize backend?

Ravenslofty commented 2 years ago

nextpnr-mistral is still very experimental due to the lack of M10K support (we're working on it!)

However, I don't expect any of the commands to meaningfully change.

infphyny commented 2 years ago

Has anyone started working on adding Mistral as an Edalize backend? If not, I have time to work on this.

infphyny commented 2 years ago

I am able to generate a bitstream with FuseSoC using a Mistral backend for Edalize. It is very similar to the oxide backend. The --compress-rbf option is passed as one of the nextpnr_options. What is still missing is the optional call to mistral-cv and a test script. The test script will also be based on the test_oxide source code. Once the test script is added, I can make a pull request.

Link: Modified edalize
cv96 blinky

olofk commented 2 years ago

Mistral is merged now. Would be awesome if you could add support in LED to Believe for some board that can use Mistral. Hoping to add SERV support as well eventually once we have memories working

somhi commented 2 years ago

Yes, it would be great to add the Sockit or Chameleon96 with the Mistral tool to LED to Believe. I'll add it to the to-do list ;)
Thanks @infphyny :)

infphyny commented 2 years ago

Then I will add two QMTech Cyclone V dev boards. The board with the 5CEFA2F23 is officially supported by openFPGALoader.

Ravenslofty commented 2 years ago

I mean, truthfully the only officially supported board is the DE-10 Nano. By which I mean: it's the only one I have which I can test things on :P

infphyny commented 2 years ago

Good, I will try adding the DE-10 Nano first.

olofk commented 1 year ago

@somhi @infphyny @Ravenslofty Remind me, did we ever get all the way with this one?

Ravenslofty commented 1 year ago

nextpnr-mistral has had block RAM support for a long while now, but it was always a little flaky, so I've not exposed it on the Yosys side.

While the timings are very roughly correct, I've been waiting on Sarayan to rewrite the Mistral library to expose timing information.

Ravenslofty commented 1 year ago

It's been, heh, quite a while. But recently I've had the energy to rework some bits of Mistral, and thanks to some "bug fixes and performance improvements", nextpnr-mistral...should be capable of a corescore, thanks to initialised M10K support.

I wonder how it will do.

olofk commented 1 year ago

Thanks for the heads-up. So how do we test this? I would suggest starting by switching over an existing Servant target to use mistral. You have previously said that de10_nano is the best supported one. Is that still the case? If so, could you do an initial test? I could help out with the FuseSoC description and potentially even build a bitstream if it is just a matter of building with latest main branches of yosys and nextpnr

Ravenslofty commented 1 year ago

Yes, the DE10-Nano is still the best-supported board by virtue of me having one. I'd be happy to test a bitstream on the board if you send me one. Things should mostly just work with the latest Yosys and nextpnr git versions, though you will need to pass -nodsp to Yosys.
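
The two options mentioned so far in this thread (-nodsp for Yosys, --compress-rbf for nextpnr) could be collected into an Edalize-style tool options dictionary roughly as below. This is only a sketch: the key name `yosys_synth_options` is an assumption modelled on other nextpnr-based backends, `nextpnr_options` is the name used earlier in the thread, and the helper `mistral_tool_options` is hypothetical. The snippet only builds and prints the dictionary; it does not call Edalize.

```python
import json


def mistral_tool_options(compress_rbf=True, use_dsp=False):
    """Build a hypothetical tool_options entry for a Mistral flow.

    The key names are assumptions, not a confirmed Edalize API.
    """
    yosys_opts = []
    if not use_dsp:
        # The thread says -nodsp has to be passed to Yosys for now.
        yosys_opts.append("-nodsp")

    nextpnr_opts = []
    if compress_rbf:
        # The thread passes --compress-rbf through nextpnr_options.
        nextpnr_opts.append("--compress-rbf")

    return {
        "mistral": {
            "yosys_synth_options": yosys_opts,
            "nextpnr_options": nextpnr_opts,
        }
    }


if __name__ == "__main__":
    print(json.dumps(mistral_tool_options(), indent=2))
```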

infphyny commented 1 year ago

Very good news. I will try to add CoreScore for the DE10-Nano and the QMTech board (5CEFA5F23I7N).
I made a small RGB blinky with the PWM values stored in BRAM. For the QMTech board, nextpnr-mistral is able to generate the bitstream when I use 8 M10Ks. When I use 16 M10Ks or more, nextpnr-mistral crashes with a std::out_of_range error message. If CoreScore fails the same way, I will make a repro and debug it with gdb to find out which part of the nextpnr-mistral code raises the error. Thanks.

Ravenslofty commented 1 year ago

Can you get me a gdb backtrace of the error anyway?

infphyny commented 1 year ago

I will compile nextpnr-mistral with debug info and will give you the trace.

infphyny commented 1 year ago

The error seems to be at nextpnr/common/kernel/hashlib.h:597, reached from nextpnr_mistral::Arch::getBelPinsForCellPin

Thread 1 "nextpnr-mistral" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff583c406 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff582287c in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff5ca4f26 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff5cb6f2c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff5cb6f97 in std::terminate() ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff5cb71f8 in __cxa_throw ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00005555556ddbc6 in nextpnr_mistral::dict<nextpnr_mistral::IdString, nextpnr_mistral::ArchPinInfo, nextpnr_mistral::hash_ops<nextpnr_mistral::IdString> >::at (this=<optimized out>, key=...)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/kernel/hashlib.h:597
#10 0x0000555555745e1d in nextpnr_mistral::Arch::getBelPinsForCellPin (
    this=0x555566ddef50, pin=..., cell_info=<optimized out>)

infphyny commented 1 year ago

With the DE10-Nano I got the same error. Last time I forgot to press Enter to get the rest of the backtrace, so here is the complete one:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff583c406 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff582287c in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff5ca4f26 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff5cb6f2c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff5cb6f97 in std::terminate() ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff5cb71f8 in __cxa_throw ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00005555556ddbc6 in nextpnr_mistral::dict<nextpnr_mistral::IdString, nextpnr_mistral::ArchPinInfo, nextpnr_mistral::hash_ops<nextpnr_mistral::IdString> >::at (this=<optimized out>, key=...)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/kernel/hashlib.h:597
#10 0x0000555555745e1d in nextpnr_mistral::Arch::getBelPinsForCellPin (
    this=0x555566ddef50, pin=..., cell_info=<optimized out>)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/mistral/arch.h:446
#11 nextpnr_mistral::Context::predictArcDelay (this=0x555566ddef50, 
    net_info=0x55558c23f4b0, sink=...)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/kernel/context.cc:104
#12 0x00005555557782cf in nextpnr_mistral::TimingAnalyser::get_route_delays (
    this=0x7fffffffc800)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/kernel/timing.cc:145
#13 0x00005555559b7a65 in nextpnr_mistral::TimingAnalyser::run(bool) [clone .constprop.0] (this=0x7fffffffc800, update_route_delays=true)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/kernel/timing.cc:50
#14 0x00005555557bbdaf in nextpnr_mistral::HeAPPlacer::place (
    this=0x7fffffffc680)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/place/placer_heap.cc:281
#15 0x00005555557c7ef0 in nextpnr_mistral::placer_heap (ctx=0x555566ddef50, 
    cfg=...)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/place/placer_heap.cc:1812
#16 0x000055555580718a in nextpnr_mistral::Arch::place (this=0x555566ddef50)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/common/kernel/basectx.h:167
#17 0x000055555574112a in nextpnr_mistral::CommandHandler::executeMain (
    this=this@entry=0x7fffffffd1b0, 
    ctx=std::unique_ptr<nextpnr_mistral::Context> = {...})
    at /usr/include/c++/12/bits/unique_ptr.h:191
#18 0x0000555555741651 in nextpnr_mistral::CommandHandler::exec (
    this=0x7fffffffd1b0) at /usr/include/c++/12/bits/unique_ptr.h:189
#19 0x00005555557119d8 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/stche/Documents/Logiciel/FPGA/toolchain/YosysHQ/nextpnr/mistral/main.cc:100

Ravenslofty commented 1 year ago

Would it be possible to zip up and send me the build directory, or whatever edalize creates?

infphyny commented 1 year ago

Yes, I will put it on GitHub and give you the link. It is written in VHDL. I will try running with only one BRAM on the DE10-Nano board to see if it works.

Ravenslofty commented 1 year ago

I'm not asking for the source, I'm asking for the build directory.

infphyny commented 1 year ago

Great, it works on the DE10-Nano when using one BRAM. I don't use Edalize, only a plain makefile for now.

Ravenslofty commented 1 year ago

Well, I'm looking for the .json file generated by Yosys, the .qsf file and the nextpnr-mistral command line

infphyny commented 1 year ago

OK, I will give you a link to those files.

infphyny commented 1 year ago

issue.zip Tell me if something is missing

infphyny commented 1 year ago

Here is the complete project, just in case. issue.zip

Ravenslofty commented 1 year ago

I found and fixed the issue last night, and then promptly spent the day sleeping.

Anyway, the issue should be fixed, and it compiles on my end with latest nextpnr.

infphyny commented 1 year ago

Thank you, my example works now. I just have to learn how to use a PLL with Mistral to get a CoreScore.

Ravenslofty commented 1 year ago

Unfortunately PLLs aren't plumbed into nextpnr, since the vendor primitive is an utter mess and I have the irrational hope I can do better than it...

olofk commented 1 year ago

Ok, so if PLLs aren't supported, then I think it's fine to use the clock input directly. Might be a bit tougher for the router, depending on the input frequency though.

@infphyny, it is probably easiest to use a UART as the emitter, like on the icestick target. (Frankly, that is probably how we should do it for all targets eventually.)

infphyny commented 1 year ago

1 core works. 50 cores work. Build time is ~10 minutes. According to nextpnr-mistral, Fmax is ~35 MHz, but the design works at 50 MHz. Trying 256 cores now; it is still building without errors after ~45 min. The CoreScore with Quartus is 271.

Info: Device utilisation:
Info:           MISTRAL_COMB: 66648/83820    79%
Info:             MISTRAL_FF: 65330/167640    38%
Info:             MISTRAL_IO:     3/  472     0%
Info:         MISTRAL_CLKENA:     1/    2    50%
Info:    cyclonev_oscillator:     0/    1     0%
Info:   cyclonev_hps_interface_mpu_general_purpose:     0/    1     0%
Info:           MISTRAL_M10K:   256/  553    46%
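
For comparing runs with different core counts, the utilisation report above can be turned into numbers with a few lines of Python. This is a minimal sketch that assumes the log lines keep exactly the "Info: NAME: used/total pct%" layout shown here; the names `LINE_RE` and `parse_utilisation` are made up for illustration.

```python
import re

LOG = """\
Info: Device utilisation:
Info:           MISTRAL_COMB: 66648/83820    79%
Info:             MISTRAL_FF: 65330/167640    38%
Info:             MISTRAL_IO:     3/  472     0%
Info:         MISTRAL_CLKENA:     1/    2    50%
Info:           MISTRAL_M10K:   256/  553    46%
"""

# Matches lines such as "Info:   MISTRAL_COMB: 66648/83820    79%".
LINE_RE = re.compile(r"^Info:\s+(\w+):\s*(\d+)\s*/\s*(\d+)\s+(\d+)%$")


def parse_utilisation(log_text):
    """Return {resource: (used, available)} for each utilisation line."""
    usage = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, used, total, _pct = match.groups()
            usage[name] = (int(used), int(total))
    return usage


if __name__ == "__main__":
    for name, (used, total) in parse_utilisation(LOG).items():
        print(f"{name}: {used}/{total} ({100 * used / total:.1f}%)")
```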

I will make a pull request this weekend. I may not pick the highest possible score, so that a working bitstream can be built in 10 to 20 minutes.

olofk commented 1 year ago

Well done everyone! Having Mistral at this level with a CoreScore to prove it is fantastic. I just finished building the tools now, and if you share the de10_nano support, I can run a couple of builds on my side too, to see what CoreScore we can reasonably achieve

Ravenslofty commented 1 year ago

In general Cyclone V seems to be more challenging to route for nextpnr than, say, ECP5.

I think this comes down to each LAB having pretty major Tile Dispatch congestion. In slightly less technical terms: a LAB is made up of 10 ALMs, and each ALM has 8 inputs, but the LAB itself only has 46 inputs from global routing (through the Tile Dispatch muxes), and four of these inputs are reserved by nextpnr for FF control signals.
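
The numbers in that description can be put side by side as a rough input budget. The sketch below uses only the figures quoted above; treating every ALM input as a distinct signal arriving from outside the LAB is a worst-case assumption made here for illustration, not something the thread states.

```python
ALMS_PER_LAB = 10            # from the comment above
INPUTS_PER_ALM = 8           # from the comment above
LAB_GLOBAL_INPUTS = 46       # inputs through the Tile Dispatch muxes
RESERVED_FOR_FF_CONTROL = 4  # reserved by nextpnr for FF control signals


def lab_input_budget():
    """Return (worst-case demand, usable supply) of inputs for one LAB."""
    demand = ALMS_PER_LAB * INPUTS_PER_ALM
    supply = LAB_GLOBAL_INPUTS - RESERVED_FOR_FF_CONTROL
    return demand, supply


demand, supply = lab_input_budget()
print(f"worst-case ALM inputs per LAB: {demand}")
print(f"usable global inputs per LAB:  {supply}")
print(f"distinct inputs per ALM with all ALMs used: {supply / ALMS_PER_LAB:.1f}")
```

Under that assumption a fully packed LAB can ask for nearly twice as many inputs as the Tile Dispatch muxes can deliver, which is consistent with the congestion described above.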

(I find it very interesting that the 50-core SoC runs at 50MHz despite signing off at 35MHz (assuming you're using the third Fmax number that nextpnr-mistral prints); I have been assured that routing delays from the analogue simulator match Quartus, so, uh, hm.)
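
To make the gap concrete, the reported Fmax and the actual clock can be converted to periods. This is a small sketch using the approximate 35 MHz and 50 MHz figures from this thread; the function names are made up for illustration.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(fmax_mhz, clock_mhz):
    """Timing slack: positive when the clock is slower than the reported Fmax."""
    return period_ns(clock_mhz) - period_ns(fmax_mhz)


if __name__ == "__main__":
    print(f"period at 50 MHz: {period_ns(50):.2f} ns")
    print(f"period at 35 MHz: {period_ns(35):.2f} ns")
    print(f"slack when clocking at 50 MHz: {slack_ns(35, 50):.2f} ns")
```

A design signed off at roughly 35 MHz but clocked at 50 MHz is therefore running with a nominal timing violation of several nanoseconds.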

Ravenslofty commented 1 year ago

I think both Olof and I would rather you PR the config a bit sooner than the weekend; on my end at least I want to see if there are any obvious things that can be done to improve performance, given, say, a 100-core SoC.

infphyny commented 1 year ago

corescore.zip: the target is de10_nano_mistral. You need a serial-to-USB adapter to see the output on a console. I am off to work now. If something is missing, I will upload the missing files tonight. Thanks.

olofk commented 1 year ago

Thanks @infphyny! 50 cores finished without problems. Running 150, 260 and 300 now in parallel. Will keep you posted on the results

Ravenslofty commented 1 year ago

The default config can't actually place 100 cores, due to severe constraints on synchronous-clears (one SCLR per LAB); to get 100 cores to work I need to modify Yosys to not use the dedicated synchronous-clears...

olofk commented 1 year ago

Aha. So no need at this point to try with higher numbers then?

Ravenslofty commented 1 year ago

Can I still claim a corescore of 100 if I hacked Yosys? Or do I have to settle for 50?

(This is, for better or worse, just part of the maturing of a toolchain)

olofk commented 1 year ago

I want to have the number that is possible to achieve with upstream tooling, but let's revisit this when the toolchain improves. I think it's already fantastic that we can get a CoreScore at all using Mistral.

One thing that crossed my mind, out of curiosity: would it make any difference to swap either Yosys or nextpnr for the corresponding Quartus step instead?

Ravenslofty commented 1 year ago

sigh

So, the problem there is primarily memory blocks:

So if a design has memories, Yosys can't communicate the memory contents to Quartus for a Yosys->Quartus flow, and Quartus can't communicate the memory contents to Yosys for a Quartus->nextpnr flow.

olofk commented 1 year ago

Ah, I see. So that's where those pesky .mif files come in. Alright. Then I won't spend any effort supporting a mix of Quartus and FOSSi tooling at this point.

olofk commented 1 year ago

A couple more data points. 84 cores is the maximum I can achieve. 85 cores is stuck in routing: it has done 15000 iterations so far and doesn't look like it will converge. 150 cores is stuck in the placer. Attaching the 84-core version in case someone with a board wants to check that it works. The archive contains the whole Edalize-generated work dir, so you can also rebuild it with the makefile if you want. c84.tar.gz

infphyny commented 1 year ago

Tested corescore_0.rbf from the c84 folder. Corey counts 6 cores instead of 84. I will do more tests on my side. It's great that we can build a RISC-V SoC with Mistral.

infphyny commented 1 year ago

Just an update: Mistral computes Fmax correctly. I have implemented a simple clock divider, and Mistral is able to route the divided clock signal to a global route. The SoC now runs at 12.5 MHz, and I got 70 cores running without issue. I am now trying to get 84 cores, the maximum Olof has achieved.
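
For readers unfamiliar with the idea, a counter-based clock divider can be modelled in a few lines of Python. This is only a behavioural sketch, not the HDL used here: the 50 MHz input clock is taken from earlier in the thread, the divide-by-4 ratio is inferred from the 12.5 MHz figure rather than stated, and `divide_clock` is a hypothetical name.

```python
def divide_clock(input_edges, ratio):
    """Behavioural model of a counter-based clock divider.

    input_edges: number of input-clock rising edges to simulate.
    ratio: even division ratio (one output period = ratio input periods).
    Returns the output level after each input rising edge.
    """
    if ratio < 2 or ratio % 2:
        raise ValueError("ratio must be an even integer >= 2")
    half = ratio // 2
    count = 0
    out = 0
    levels = []
    for _ in range(input_edges):
        count += 1
        if count == half:
            out ^= 1  # toggle once per half output period
            count = 0
        levels.append(out)
    return levels


if __name__ == "__main__":
    INPUT_MHZ = 50.0  # assumed board clock
    RATIO = 4         # assumed ratio, inferred from 12.5 MHz
    print(divide_clock(8, RATIO))
    print(f"output clock: {INPUT_MHZ / RATIO} MHz")
```

The output toggles every two input edges, so one full output period spans four input periods.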