Open fredrequin opened 3 years ago
using FPGA specific BRAM is good for completed project is fine, but it's not good for generic core supposed to be used in wide variety of FPGAs. I suggest to replace it with inferred BRAM.
Inferring true dual ported BRAM with byte enables is nearly impossible with Quartus II. That's why I had to use vendor specific BRAM. The generic code is still there and used by Verilator simulation.
it's possible, especially in SV:
module dpram #(ADDRWIDTH=9)
(
input clock,
input [ADDRWIDTH-2:0] address_a,
input [15:0] data_a,
input wren_a,
output reg [15:0] q_a,
input [ADDRWIDTH-1:0] address_b,
input [7:0] data_b,
input wren_b,
output reg [7:0] q_b
);
logic [1:0][7:0] ram[0:(1<<(ADDRWIDTH-1))-1];
always@(posedge clock) begin
if(wren_a) begin
ram[address_a] <= data_a;
q_a <= data_a;
end
else begin
q_a <= ram[address_a];
end
end
always@(posedge clock) begin
if(wren_b) begin
ram[address_b[ADDRWIDTH-1:1]][address_b[0]] <= data_b;
q_b <= data_b;
end
else begin
q_b <= ram[address_b[ADDRWIDTH-1:1]][address_b[0]];
end
end
endmodule
it's not exactly with byte enable, but pretty much similar and easy to modify. Quartus also provides templates (edit->insert template) with similar module.
Been there, done that. With clock enable / byte enable, it does not work : Info: RAM logic "fx68k_soc:U_fx68k_soc|fx68k:U_fx68k|excUnit:excUnit|fx68kRegs:U_fx68kRegs|dpram:U_dpram_B|ram" is uninferred due to asynchronous read logic Info: RAM logic "fx68k_soc:U_fx68k_soc|fx68k:U_fx68k|excUnit:excUnit|fx68kRegs:U_fx68kRegs|dpram:U_dpram_W|ram" is uninferred due to asynchronous read logic Info: RAM logic "fx68k_soc:U_fx68k_soc|fx68k:U_fx68k|excUnit:excUnit|fx68kRegs:U_fx68kRegs|dpram:U_dpram_L|ram" is uninferred due to asynchronous read logic I can even trigger a crash in Quartus II :-) Internal Error: Sub-system: OPT, File: /quartus/synth/opt/opt_ram_resource_aware_st.cpp, Line: 8236 this->m_read_a_during_write_b_behavior == CDB_RAM_READ_OLD_DATA Stack Trace: 0xeee6d: opt_sea_of_mux + 0x4a7cd (SYNTH_OPT) 0xedc84: opt_sea_of_mux + 0x495e4 (SYNTH_OPT) 0xecbad: opt_sea_of_mux + 0x4850d (SYNTH_OPT) 0xec623: opt_sea_of_mux + 0x47f83 (SYNTH_OPT) 0x1ebc06: SMP_ROOT::operator= + 0x3b16 (SYNTH_OPT) 0x1ee9e4: SMP_ROOT::operator= + 0x68f4 (SYNTH_OPT) 0x1ea7d7: SMP_ROOT::operator= + 0x26e7 (SYNTH_OPT) 0x36ba: RTL_ROOT::RTL_ROOT + 0x26ba (SYNTH_OPT) 0x4050: RTL_ROOT::process_sgate_netlist + 0x4e0 (SYNTH_OPT) 0x1369fa: SGN_HSSI_CONVERT::operator= + 0x67c2a (synth_sgn) 0x137e73: SGN_HSSI_CONVERT::operator= + 0x690a3 (synth_sgn) 0xa8403: sgn_qic_helper + 0x9c843 (synth_sgn) 0xab635: sgn_qic_helper + 0x9fa75 (synth_sgn) 0xac040: sgn_qic_helper + 0xa0480 (synth_sgn) 0xae52: sgn_full + 0xc2 (synth_sgn)
0xfd5b: qexe_get_tcl_sub_option + 0xefb (comp_qexe)
0x133a7: qexe_process_cmdline_arguments + 0x5b7 (comp_qexe)
0x134a1: qexe_standard_main + 0x71 (comp_qexe)
0x1a18: msg_exe_fini + 0xf8 (CCL_MSG)
0x19bc: msg_exe_fini + 0x9c (CCL_MSG)
0xa290: operator new[] + 0x130 (ccl_mem)
0x344d4: msg_exe_main + 0x74 (CCL_MSG)
0x13d1: BaseThreadInitThunk + 0x21 (KERNEL32)
0x154f3: RtlUserThreadStart + 0x33 (ntdll)
End-trace
Quartus II 64-Bit Version 9.1 Build 222 10/21/2009 SJ Full Version
Thank you so much for improving this core once more. I have used it in several arcade games already and I really appreciate its quality. About this update:
I agree with @sorgelig. Abandoning generic code for vendor-specific instances seriously limits the reusability of this. Your commit messages suggest that you also fixed 1 or 2 bugs -apart from performance improvements.
I would like to be able to keep those bug fixes without the vendor-specific RAM.
Could you have the vendor-specific code under an ifdef macro statement?
@jotego the pull request hasn't been approved yet. I'd wait.
The commit messages may not be referring to bugs in the original code.
Some changes, such as increasing the address bus width might not be in the spirit of fx68k? I don't think a 32-bit address capable variant of the 68000 was ever available.
Hi everybody,
@jotego @a1exh As far as I can see the bugs are not in the original code.
@fredrequin I need to review the changes before accepting the pull request. Please bear with me for the delay.
But I agree with @sorgelig that vendor specific code should be avoided. If you want, put all the vendor specific stuff in a separate file and ifdef the reference in the main code. The default must be to use generic code, though.
@fredrequin would be good to see the performance and size before and after to judge the difference
Hello all, yes the bugs were my fault while changing the ALU or doing some code clean-up. To me, extending the address bus does not impact the performance or size, if you do not connect the upper address bits, you are back to 16 MB addressing. BTW, there was a very rare 68012 chip with 2GB addressing (Ok, it was 68010 based, so slightly different from a 68000). Performance wise, I was not able to reach 100 MHz with the original core. Now, I can reach 130 MHz. For the size, I went from 5180 LEs to 3170 LEs. Most of the size gain comes from the register file in BRAM. The way the original core was written prevented the use of block RAM for the registers. I am able to read the registers on T3, so I can put a block RAM before the buses MUXes on T4. Registers writes are still done on T4. Given the limited pipelining of the 68000, I assume that there is no concurrent read / write ? If there is some, a bypass mechanism can be added in the register file. The speed gain comes mostly from the ALU : by changing the OP_xxx order so the cases get simplified and by using the same Adder than the J68 core. As many of you suggested, I can put some ifdef for vendor specific code.
I have pushed a new change with generic ROM/RAM and Altera ROM/RAM. The core is even smaller (I have put the microLatch and nanoLatch inside the M4Ks) : 2985 LEs ! This size reduction is a little bit suspicious, I was expecting ~100 LEs less and not 200 LEs.
@fredrequin On exactly which FPGA model and with which Quartus version you got those size and performance results?
I have used Quartus II 64-bit Full v9.1 and the target is a Stratix EP1S40F780C5 (Nios Dev. Board, Stratix Professional Edition).
What is the verdict about this? Could we rescue the Verilator fixes at least?
I have tried compiling with Fredrequin's softcore, and the games hang up. I haven't looked into why they hang up. I assume there is some bug in the modifications done.
No more LINT warnings under Verilator. New core size is less than 3200 LEs. Fmax on a Stratix 1S40 is 130 - 135 MHz