stnolting / neorv32

:desktop_computer: A small, customizable and extensible MCU-class 32-bit RISC-V soft-core CPU and microcontroller-like SoC written in platform-independent VHDL.
https://neorv32.org
BSD 3-Clause "New" or "Revised" License

ProASIC3 Starter Kit Port of NEORV32 #2

Closed salmansheikh closed 3 years ago

salmansheikh commented 4 years ago

I got the design programmed into a Microsemi A3PE-Starter-Kit board, but nothing was showing on the UART. Then I realized that when I ran Synplify in Libero SoC v11.9, it reported the design would only run at ~31 MHz, and SmartTime now shows a maximum clock of 26.076 MHz, which explains why nothing is working. One option is to add a PLL that takes the board's 40 MHz oscillator going into the FPGA, drops it to 25 MHz, and uses that as the clock. But what can I do to the design itself to speed it up? I know the 1.5M-gate ProASIC3 is much larger than a Lattice FPGA, though maybe not as fast as a Xilinx Arty running at 100 MHz. Could it be the RAM modules, or are there other parts I should run through the IP Catalog to generate components better optimized for area/timing on the ProASIC3?

[image: proasic3]

stnolting commented 4 years ago

I'm not really familiar with Microsemi. If your FPGA is a low-power architecture similar to the Lattice iCE40 family, then 25MHz might be quite "fast". I am using a Lattice iCE40 UltraPlus and currently the maximum frequency for the NEORV32 is somewhere around 24 MHz.

Anyway, you should check the synthesis results for the critical path. Maybe Libero has problems with mapping the register file or the internal memories.

salmansheikh commented 4 years ago

I will also try it on a Xilinx VC707 board I have. But if it works at >24 MHz, I am going to push to use it over a ColdFire V1 core we bought (at work) that is giving me issues. Might put a NEORV32 in space ;)

It is a low-power part, but the LVDS I/O can run up to 350 MHz and a 66 MHz, 64-bit PCI interface can be implemented, so I think it should go a little faster than the Lattice. I am using a board with an A3PE1500, which has 1.5M gates.

[image: proasic3e]

stnolting commented 4 years ago

So it is a low-power FPGA and from what I have seen it only provides 3-input LUTs - so you need more levels of logic for each combinatorial function. Also, there is no dedicated carry logic which will slow down large arithmetic circuits. What does the timing report say? Can you figure out where the critical path is?

salmansheikh commented 4 years ago

Okay, another question. My design reports that it achieves a 29 MHz clock rate. I have a 40 MHz oscillator on my dev board, and our final design is supposed to run with a 24 MHz system clock, so I used a PLL to take the 40 MHz oscillator down to 24 MHz and drive the neorv32 with that, which should be fine. Do I set CLOCK_FREQUENCY to 40 MHz or to the PLL output frequency? Only the PLL sees the 40 MHz; the logic gets the divided clock. I don't see CLOCK_FREQUENCY used anywhere except for the sysinfo_mem(0) variable.

stnolting commented 4 years ago

The CLOCK_FREQUENCY generic is used to pass the actual operating frequency of the processor setup (clk_i signal) to the software. An application can determine the actual clock speed via the SYSINFO's SYSINFO_CLK register.

For the hardware itself, the CLOCK_FREQUENCY generic is irrelevant. But the default bootloader uses it to configure the UART baud rate for the clock frequency that is actually used.

I'm using this approach to have a bootloader that works independently of the actual hardware setup (including the actual clock speed). If the clock speed were defined directly in the bootloader's source code, one would have to recompile it every time the system uses a clock speed different from my default setup.
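
For example, just to illustrate the numbers: with CLOCK_FREQUENCY = 24000000 and the default 19200 baud, the bootloader ends up with a UART bit period of roughly 24000000 / 19200 = 1250 clock cycles, computed at runtime from the generic instead of being hard-coded.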

salmansheikh commented 4 years ago

But I have two clocks: the 40 MHz dev-board clock coming into the FPGA and, because of the 3-input LUT issue, the design won't run over 29 MHz or so, so I added a PLL that takes the 40 MHz and outputs 24 MHz. Should CLOCK_FREQUENCY be 24000000? I assume so. I am still seeing nothing on the UART (the pins are right, default baud of 19200), and the FTDI UART is fine (I looped back RX and TX to verify that it isn't the issue).


stnolting commented 4 years ago

If the 24 MHz signal is connected to the processor's clk_i signal then CLOCK_FREQUENCY should be 24000000.

What configuration are you using for the processor (generics)? If the bootloader is enabled, you should see a blinking light when connecting an LED to pin 0 of the gpio_o port.
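
To make the wiring explicit, here is a minimal sketch of what I mean. Note that this is only a sketch: "pll_40_to_24" is a placeholder for whatever PLL core SmartGen/Libero generates for you, and the exact neorv32_top port names and widths (e.g. gpio_o width, uart0_* vs. uart_*) depend on the release you are using.

    library ieee;
    use ieee.std_logic_1164.all;

    library neorv32;
    use neorv32.neorv32_package.all;

    entity proasic3_neorv32_top is
      port (
        clk_40mhz_i : in  std_ulogic; -- 40 MHz board oscillator
        rstn_i      : in  std_ulogic; -- low-active reset
        uart_txd_o  : out std_ulogic;
        uart_rxd_i  : in  std_ulogic;
        led_o       : out std_ulogic  -- bootloader status LED
      );
    end entity;

    architecture rtl of proasic3_neorv32_top is
      signal clk_24mhz  : std_ulogic;
      signal con_gpio_o : std_ulogic_vector(63 downto 0); -- width depends on the neorv32 release
    begin

      -- vendor PLL (placeholder entity name): 40 MHz in, 24 MHz out
      pll_inst: entity work.pll_40_to_24
        port map (
          clk_in_i  => clk_40mhz_i,
          clk_out_o => clk_24mhz
        );

      -- NEORV32 processor: CLOCK_FREQUENCY has to match the frequency at clk_i,
      -- which is the 24 MHz PLL output here, not the 40 MHz oscillator
      neorv32_inst: neorv32_top
        generic map (
          CLOCK_FREQUENCY => 24000000
        )
        port map (
          clk_i       => clk_24mhz,
          rstn_i      => rstn_i,
          gpio_o      => con_gpio_o,
          uart0_txd_o => uart_txd_o, -- "uart_txd_o" in older releases
          uart0_rxd_i => uart_rxd_i  -- "uart_rxd_i" in older releases
        );

      -- the default bootloader toggles gpio_o(0) as a status "heartbeat"
      led_o <= con_gpio_o(0);

    end architecture;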

salmansheikh commented 4 years ago

I think I disconnected the GPIO. I hacked the top-level file (not the template) and used this:

I probably should use the template design. Let me try that instead and see if the LED blinks.


stnolting commented 4 years ago

[image: grafik]

I think there is something missing in your last post...?! 🤔

salmansheikh commented 4 years ago

I see the attachment in my Gmail. Perhaps it got blocked. I will send it via GitHub.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

salmansheikh commented 2 years ago

I finally got around to porting the neorv32 to the ProASIC board (Microsemi A3PE-STARTER-KIT REV A). I got the bootloader up, but the neorv32_exe.bin files I am uploading fail. I am generating them in WSL Ubuntu 20.04, copying them to my Windows host, and using Tera Term VT to send the files, but I get an ERROR_0 error. It shouldn't matter that the tools are on Linux and I am uploading the binaries from Windows, right?

[image: neorv32_proasic3] [image: neorv32_pro]

stnolting commented 2 years ago

Great to hear! :+1:

It shouldn't matter that the tools are in linux and I am uploading the binaries in windows, right?

Right, that should not be a problem. I use the same setup without problems.

but get an ERROR_0

Seems like there is a problem with the executable itself. Which program have you compiled?

By the way: I highly encourage you to update to the most recent version of the processor. Version 1.4.3.3 is more than a year old and still had a lot of bugs. :wink:

salmansheikh commented 2 years ago

Okay, I will update. It's probably as old as the last time I posted. I have the new version downloaded in Linux WSL, but I compiled this one on Windows because Libero SoC v11.9, which I use to build it, works on Windows; I never got it working with Ubuntu Linux (I think it officially supports RedHat/Fedora). I will download the newest files onto Windows and recompile. It was the hello_world program, and after that I tried the hardware_info program. Both failed. Let me download the newest stuff. Then I will eventually work on the memory interfaces (I have to learn how to use the Wishbone bus) for the daughter cards (I have two of them): the first one with 2MB of SRAM and the top one with 2MB MRAM and 32KB EEPROM. Finally, I have two custom IP blocks (microsequencer and NAND flash) that I want to add to the system. The board you saw was actually using a ColdFire V1 processor that we had gotten many years ago as purchased IP, but we have to use CodeWarrior 6.3 and WinXP in a VM to run the software because of the legacy toolchain. I want to prove we can do the same with a RISC-V for future projects...


stnolting commented 2 years ago

okay, i will update.

:+1: Get in touch if you have any compatibility problems.

Then I will eventually work on the memory interfaces (have to learn how to use wishbone bus) for the daughter cards (I have two of them) the first one with 2MB of SRAM and the top one with 2MB MRAM and 32KB EEPROM

If (some of) these memories have a serial interface, you can use the processor's :books: SPI module to connect them. Furthermore, the latest version of the processor also contains an :books: execute-in-place (XIP) module (via SPI) that allows using a serial flash for direct code execution.
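
As a rough sketch, the SPI hookup is just a few additional lines in the neorv32_top port map (extending the kind of instantiation shown earlier in this thread; port names as in recent neorv32_top releases, the mem_spi_* board signals are made up for the example, and the SPI unit has to be enabled via its IO_* generic):

    -- excerpt of the neorv32_top port map: SPI connection to a serial memory
    spi_sck_o => mem_spi_sck_o,  -- serial clock to the memory
    spi_sdo_o => mem_spi_mosi_o, -- controller data out -> memory data in
    spi_sdi_i => mem_spi_miso_i, -- memory data out -> controller data in
    spi_csn_o => mem_spi_csn_o,  -- low-active chip selects (vector); bit 0 to the memory's CS pin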

Finally, I have 2 custom IP blocks (microsequencer and NAND flash) that I want to add to the system

You could use the processor's :books: custom functions subsystem (CFS) for that. Basically, this subsystem is a blank tightly-coupled module that can be used to implement custom co-processors and interfaces.

I want to prove we can do the same with a RISC-V for future projects...

Sounds like an interesting project! :+1:

salmansheikh commented 2 years ago

I started the minimal synthesis last night and found that it took 3 hours 24 minutes and used ~3M core cells, even though the device only has about 35K cells. Something obviously went wrong with Synplify Pro and its interpretation of the design, which only has GPIO, UART and PWM enabled.

Before I downloaded the latest version, Synplify kept complaining about the sda_data_io and sda_clk_io signals even though I wasn't using them; it kept saying they can't be constants. I ended up having to create unconnected internal signals in the top level to make those errors go away. I suspect the tool is somehow not optimizing away many unconnected inputs and is keeping some other logic at the lower levels... Synplify Pro is pretty recent, but it must have some issue with the coding style. I like it, but I need to figure out what is causing this.
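
Roughly, the workaround looks like this (a fragment only; the TWI port names here follow the neorv32_top entity, and the names in my wrapper differ slightly): the unused bidirectional ports get mapped to dummy signals instead of constants, which Synplify rejects for inout ports.

    -- in the wrapper architecture: dummy signals, never driven or routed to pads
    signal twi_sda_dummy : std_logic;
    signal twi_scl_dummy : std_logic;

    -- in the neorv32_top port map:
    twi_sda_io => twi_sda_dummy,
    twi_scl_io => twi_scl_dummy,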

[image: InkedneoRV32_LI.jpg]

[image: minimal_my_foot.png]

[image: gates_look.png]


stnolting commented 2 years ago

I started the minimal synthesis last night and found it took 3 hrs 24 minutes and 3M core cells even though the device only has 35K cells.

I have seen similar behavior from Intel Quartus when synthesizing a design that uses more memory than is available in the FPGA (the tool tries to build the memories from LUTs+FFs when the block RAM resources are exhausted). So what sizes did you configure for IMEM and DMEM?

Before I downloaded the latest version, the synplify tool kept giving issues with the sda_data_io and sda_clk_io even though I wasn't using them. Kept saying can't be constants.

So this issue is resolved now that you have updated to the latest version?

[image: InkedneoRV32_LI.jpg] [image: minimal_my_foot.png] [image: gates_look.png]

I think you cannot attach files when responding via email 🤔

salmansheikh commented 2 years ago

Not sure if the sda_data/clk_io signal issue was resolved in the latest version. I think I copied the changes that instantiate those signals in the top level from the old version into the new one. I will try removing them to see if it's still an issue. These inputs show up in the Synplify rtl_gates graphic below, for example a 64-bit gpio_i going into the design. If they were truly optimized away, they shouldn't show up.

I used the default 64K for both memories. I don't think it's the memory; only 36 of the 60 block RAMs are used. Lots of core cells, though.

Target Part: A3PE1500_PQFP208_STD
Report for cell neorv32_ProcessorTop_MinimalBoot.neorv32_processortop_minimalboot_rtl

Core Cell usage:
            cell  count    area  count*area
            AND2    739     1.0       739.0
           AND2A     17     1.0        17.0
            AND3    163     1.0       163.0
           AND3A      1     1.0         1.0
             AO1    874     1.0       874.0
            AO13     34     1.0        34.0
            AO18     15     1.0        15.0
            AO1A    163     1.0       163.0
            AO1B      6     1.0         6.0
            AO1C     64     1.0        64.0
            AO1D     12     1.0        12.0
            AOI1     88     1.0        88.0
           AOI1A     24     1.0        24.0
           AOI1B     46     1.0        46.0
             AX1     64     1.0        64.0
            AX1A      1     1.0         1.0
            AX1B     12     1.0        12.0
            AX1C     20     1.0        20.0
            AX1D      2     1.0         2.0
            AX1E     29     1.0        29.0
            AXO3      1     1.0         1.0
            BUFF    283     1.0       283.0
          CLKINT      5     0.0         0.0
             GND     25     0.0         0.0
             INV      3     1.0         3.0
            MAJ3     25     1.0        25.0
            MIN3     23     1.0        23.0
             MX2 938492     1.0    938492.0
            MX2A    481     1.0       481.0
            MX2B     60     1.0        60.0
            MX2C    493     1.0       493.0
            NOR2    398     1.0       398.0
           NOR2A   5682     1.0      5682.0
           NOR2B  44740     1.0     44740.0
            NOR3     83     1.0        83.0
           NOR3A    622     1.0       622.0
           NOR3B  15037     1.0     15037.0
           NOR3C  65562     1.0     65562.0
             OA1    280     1.0       280.0
            OA1A    100     1.0       100.0
            OA1B     81     1.0        81.0
            OA1C     23     1.0        23.0
            OAI1     14     1.0        14.0
             OR2    627     1.0       627.0
            OR2A    661     1.0       661.0
            OR2B    998     1.0       998.0
             OR3   1322     1.0      1322.0
            OR3A     36     1.0        36.0
            OR3B    155     1.0       155.0
            OR3C    316     1.0       316.0
             PLL      1     0.0         0.0
             VCC     25     0.0         0.0
             XA1     33     1.0        33.0
            XA1A     17     1.0        17.0
            XA1B      2     1.0         2.0
            XA1C      2     1.0         2.0
            XAI1      3     1.0         3.0
           XAI1A      1     1.0         1.0
           XNOR2    289     1.0       289.0
           XNOR3     34     1.0        34.0
             XO1     14     1.0        14.0
            XO1A      3     1.0         3.0
            XOR2    332     1.0       332.0
            XOR3     43     1.0        43.0

          DFN1 29615      1.0    29615.0
        DFN1C0    98      1.0       98.0
        DFN1C1    73      1.0       73.0
        DFN1E0  6813      1.0     6813.0
      DFN1E0C0    41      1.0       41.0
        DFN1E1 919115      1.0   919115.0
      DFN1E1C0    56      1.0       56.0
      DFN1E1P0    26      1.0       26.0
        DFN1P0    12      1.0       12.0
        RAM4K9     4      0.0        0.0
     RAM512X18    32      0.0        0.0
               -----          ----------
         TOTAL 2035686           2035594.0

IO Cell usage:
            cell  count
           INBUF      3
          OUTBUF      8

         TOTAL    11

Core Cells : 2035594 of 38400 (5301%)
IO Cells   : 11

RAM/ROM Usage Summary
Block Rams : 36 of 60 (60%)

[image: minimal_my_foot] [image: gates_look] [image: InkedneoRV32_LI]

stnolting commented 2 years ago

I did 64K default for both memories. I don't think its the memory. Only 36 of 60 of the Block RAMS. Lots of core IO cells.

According to the A3PE1500 datasheet, the FPGA contains 270 kbit of RAM, which is about 33 kByte, so two 64 kB memories won't fit. Can you try a smaller memory configuration? For example IMEM = 16 kB and DMEM = 4 kB.
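
In the instantiation this is just two generics, something along these lines (generic names as in the current neorv32_top entity, so double-check them against the release you are using):

    -- excerpt of the neorv32_top generic map: shrink the internal memories so
    -- they fit into the A3PE1500's ~33 kB of block RAM
    generic map (
      CLOCK_FREQUENCY   => 24000000,
      MEM_INT_IMEM_SIZE => 16*1024, -- 16 kB instruction memory (IMEM)
      MEM_INT_DMEM_SIZE =>  4*1024  --  4 kB data memory (DMEM)
    )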