lkolbly / ripstop

Apache License 2.0

Macros #9

Open lkolbly opened 2 years ago

lkolbly commented 2 years ago

It would be nice if there were some amount of code-generation capability - unfortunately, Ripstop is not well suited for many of the operations that one would want to do with macros (by definition, they're things the language isn't great at...). Rust, for example, has its proc macros be Rust functions which take tokens and return tokens. A Ripstop module that "takes tokens" would be a disaster.

Here I will list some use cases, and some possible resolutions.

Use cases

Use case: Fixed-point multiplication

For example, multiplication is expensive - but if you're multiplying by a constant, you can frequently optimize it into a handful of shifts and adds. To multiply by 3.58 (approximated here as 3.578125), you could write:

// Input & output are 16.16 format
module mult_358(bits<32> input) -> (bits<32> output) {
   output[t] = (input[t] << 1) + input[t] + (input[t] >> 1) + (input[t] >> 4) + (input[t] >> 6);
}

It's inconvenient and error-prone to make the user derive the above formula by hand. So, ideally, the user would write:

output[t] = input[t] * 3.58;

We don't have a multiplication operator or floating-point numbers, and this is too niche to build into the language, so we want some sort of syntax that lets users write their own algorithms to generate these things. To steal syntax from Rust macros, we want something like:

output[t] = fxp_const_multiply!(input[t], 3.58);
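
As a rough illustration of what such a macro would have to compute, here is a minimal Python sketch. The names `shift_add_terms` and `fxp_const_multiply` and the `frac_bits` parameter are hypothetical and not part of Ripstop; the sketch only shows the shift-and-add decomposition, one shift per set bit of the rounded constant.

```python
def shift_add_terms(constant: float, frac_bits: int = 6) -> list[int]:
    """Return shift amounts (positive = left, negative = right) whose
    powers of two sum to `constant` rounded to `frac_bits` fractional bits."""
    if constant <= 0:
        raise ValueError("this sketch only handles positive constants")
    scaled = round(constant * (1 << frac_bits))
    shifts = []
    bit = 0
    while scaled:
        if scaled & 1:
            shifts.append(bit - frac_bits)
        scaled >>= 1
        bit += 1
    return sorted(shifts, reverse=True)


def fxp_const_multiply(signal: str, constant: float, frac_bits: int = 6) -> str:
    """Render the shift-and-add expression as Ripstop-style source text."""
    parts = []
    for shift in shift_add_terms(constant, frac_bits):
        if shift > 0:
            parts.append(f"({signal} << {shift})")
        elif shift == 0:
            parts.append(signal)
        else:
            parts.append(f"({signal} >> {-shift})")
    return " + ".join(parts)


# With 6 fractional bits, 3.58 rounds to 229/64 = 3.578125.
print(fxp_const_multiply("input[t]", 3.58))
```

With `frac_bits=6` this reproduces the hand-written formula in `mult_358` above. A real macro would also need to decide how many fractional bits to keep, and might use subtraction to reduce the number of terms; neither is attempted here.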

Use case: Embed git hash

Frequently it's useful to embed some sort of build version indicator into the FPGA, for example so that code interacting with it can verify that a version is correct. We want something like:

module version() -> (bits<160> githash) {
   githash[t] = githash!(); // Macro expands to a constant like 160'habcd1234
}
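
A minimal Python sketch of what `githash!()` might do at compile time, assuming the repository uses SHA-1 object names (40 hex digits, i.e. 160 bits) and that `git` is on the PATH. The function names are hypothetical; the `160'h...` literal form is taken from the comment above.

```python
import re
import subprocess


def format_githash(sha: str) -> str:
    """Format a 40-digit SHA-1 hex string as a 160-bit constant literal."""
    sha = sha.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError(f"expected 40 hex digits, got {sha!r}")
    return f"160'h{sha}"


def githash(repo_dir: str = ".") -> str:
    """Ask git for the current commit hash and format it as a literal."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return format_githash(result.stdout)
```

Note that this makes the build depend on state outside the source files (the checked-out commit), which any macro mechanism would have to allow.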

Use case: Embed file

Similarly to the git hash case, a user may want to embed a file. For example, to make a look-up table for some operation you might write:

module curve_lookup(bits<5> x) -> (bits<16> y) {
   if x[t] == 0 {
      y[t] = 0;
   } else if x[t] == 1 {
      y[t] = 2;
   } else if x[t] == 2 {
      y[t] = 4;
   } else if x[t] == 3 {
      y[t] = 17;
   } else if x[t] == 4 {
      ...
   }
}

but this is laborious to write. However, if the curve is in a CSV file like this:

0
2
4
17
...

then this code is much nicer:

module curve_lookup(bits<5> x) -> (bits<16> y) {
   // csv_lookup knows that `x` is of type `bits<5>`, and so knows that it can take on values
   // 0-31. If the CSV does not have exactly 32 rows, it throws a compile error.
   y[t] = csv_lookup!("mydataset.csv", x);
}
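
The expansion described in the comment above could be sketched in Python roughly as follows. `csv_lookup` here is a hypothetical stand-in for the macro: it takes the CSV lines and the input/output widths as plain arguments, since how a macro would learn the type of `x` is exactly the open question.

```python
import csv


def csv_lookup(lines, x="x", y="y", x_bits=5, y_bits=16):
    """Expand one-column CSV rows into an if/else-if chain of Ripstop-style text."""
    values = [int(row[0].strip()) for row in csv.reader(lines) if row]
    expected = 1 << x_bits
    if len(values) != expected:
        raise ValueError(f"expected {expected} rows, found {len(values)}")
    for value in values:
        if not 0 <= value < (1 << y_bits):
            raise ValueError(f"{value} does not fit in bits<{y_bits}>")
    out = []
    for index, value in enumerate(values):
        keyword = "if" if index == 0 else "} else if"
        out.append(f"{keyword} {x}[t] == {index} {{")
        out.append(f"   {y}[t] = {value};")
    out.append("}")
    return "\n".join(out)


# A 2-bit input needs exactly four rows.
print(csv_lookup(["0", "2", "4", "17"], x_bits=2))
```

In real use the lines would come from `open("mydataset.csv", newline="")`; passing them in directly keeps the sketch independent of the file system.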

Generate AXI slave

Depending on how the language turns out, working with AXI might be a bit of a pain (you have to write a state machine, store data, et cetera). But with macros, maybe you could generate an AXI slave with predetermined registers:

module my_axi_peripheral([... AXI signals ...]) -> ([... AXI signals ...]) {
   // Instantiates the appropriate state machine and all registers
   axi_slave_32!(
      0x0 => addr,
      0x4 => data,
      0x8 => go,
   );
   // We can use addr[t], data[t], and go[t] here...
}

This has its tricky bits. For example, the code afterwards may want to modify the go register: it's common for MCU peripherals to have registers that are writable both over the bus and by the peripheral (e.g. status registers). Also, if we can solve this problem via more in-language design, that would be better. For example:

module my_axi_peripheral([... AXI signals ...]) -> ([... AXI signals ...]) {
   instantiate axi_slave_32 as axi;
   bits<32> addr_reg, data_reg, go_reg;
   match axi.reg_write() {
      Some((0x0, data)) => {
         addr_reg[t] = data;
      }
      Some((0x4, data)) => {
         data_reg[t] = data;
      }
      Some((0x8, data)) => {
         go_reg[t] = data;
      }
   }
}
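
For comparison, the register map given to `axi_slave_32!` earlier could be expanded into match arms of the shape shown above. This is a minimal Python sketch; `axi_register_arms` is a hypothetical name, and the 4-byte alignment check is an assumption based on the 32-bit registers in the example.

```python
def axi_register_arms(registers: dict[int, str], word_bytes: int = 4) -> str:
    """Render one write-handling match arm per (offset, name) register entry."""
    names = list(registers.values())
    if len(set(names)) != len(names):
        raise ValueError("register names must be unique")
    arms = []
    for offset, name in sorted(registers.items()):
        if offset < 0 or offset % word_bytes:
            raise ValueError(f"offset {offset:#x} is not {word_bytes}-byte aligned")
        arms.append(f"Some(({offset:#x}, data)) => {{")
        arms.append(f"   {name}_reg[t] = data;")
        arms.append("}")
    return "\n".join(arms)


print(axi_register_arms({0x0: "addr", 0x4: "data", 0x8: "go"}))
```

This only covers bus writes; it does not address the harder case noted above, where the peripheral itself also needs to write a register such as go.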

Regex matcher

Regular expressions compile into state machines, and for some things it would be very simple to create a streaming state machine. For example, the following regex:

"field1":\s*"(.*?)"

could be encoded as a state machine like:

module matcher(bits<8> input, bit input_valid) -> (bits<8> output, bit output_valid) {
    bits<32> state;
    if input_valid[t] {
        if state[t-1] == 0 {
            if input[t] == '"' {
                state[t] = 1;
            }
        } else if state[t-1] == 1 {
            if input[t] == 'f' {
                state[t] = 2;
            } else {
                state[t] = 0;
            }
        } else if state[t-1] == 2 {
            if input[t] == 'i' {
                state[t] = 3;
            } else {
                state[t] = 0;
            }
        ... continued for a while ...
        } else if state[t-1] == 9 {
            if input[t] == ' ' {
                state[t] = 9;
            } else if input[t] == '"' {
                state[t] = 10;
            } else {
                state[t] = 0;
            }
        } else if state[t-1] == 10 {
            if input[t] == '"' {
                state[t] = 0;
                output_valid[t] = 0;
            } else {
                output[t] = input[t];
                output_valid[t] = 1;
            }
        }
    } else {
        output_valid[t] = 0;
    }
}

Of course, encoding it by hand is tedious (and, therefore, error-prone). A macro would be able to take a (subset of) regex and turn it into modules at the global scope, like this:

make_regex_matcher!(
    matcher, // module name
    "\"field1\":\\s*\"(.*?)\""
);
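
To make the state numbering concrete, here is a small Python model of the table such a macro might derive for this one pattern shape (a literal prefix, optional spaces, then a quoted capture). It is not a general regex compiler, and `make_matcher` and `run` are hypothetical names. Like the hand-written module, it treats only ' ' as whitespace and falls back to state 0 on any mismatch.

```python
def make_matcher(prefix: str):
    """Build a step function: literal prefix, then spaces, then a quoted capture."""
    skip_ws = len(prefix)   # state that consumes the optional spaces
    capture = skip_ws + 1   # state that emits the captured characters

    def step(state: int, ch: str):
        """Return (next_state, emitted_character_or_None)."""
        if state < skip_ws:
            # Still matching the literal prefix, one character per state.
            return (state + 1, None) if ch == prefix[state] else (0, None)
        if state == skip_ws:
            if ch == " ":
                return skip_ws, None
            return (capture, None) if ch == '"' else (0, None)
        # Capture state: a closing quote ends the match.
        if ch == '"':
            return 0, None
        return capture, ch

    return step


def run(step, text: str) -> str:
    """Feed a character stream through the matcher and collect its output."""
    state, out = 0, []
    for ch in text:
        state, emitted = step(state, ch)
        if emitted is not None:
            out.append(emitted)
    return "".join(out)


matcher = make_matcher('"field1":')
print(run(matcher, '{"field1":  "hello", "x": 1}'))
```

For the prefix `"field1":` (nine characters), the space-skipping state is 9 and the capture state is 10, matching the numbering in the hand-written module.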

Instantiate special vendor IP

At least Xilinx's tools will let you use their special hard blocks by instantiating primitives with incantations in Verilog. For example, this is the incantation to instantiate a DSP48E2 block:

DSP48E2 #(
// Feature Control Attributes: Data Path Selection
.AMULTSEL("A"), // Selects A input to multiplier (A, AD)
.A_INPUT("DIRECT"), // Selects A input source, "DIRECT" (A port) or "CASCADE" (ACIN port)
.BMULTSEL("B"), // Selects B input to multiplier (B, AD)
.B_INPUT("DIRECT"), // Selects B input source, "DIRECT" (B port) or "CASCADE" (BCIN port)
.PREADDINSEL("A"), // Selects input to preadder (A, B)
.RND(48'h000000000000), // Rounding Constant
.USE_MULT("MULTIPLY"), // Select multiplier usage (MULTIPLY, DYNAMIC, NONE)
.USE_SIMD("ONE48"), // SIMD selection (ONE48, FOUR12, TWO24)
.USE_WIDEXOR("FALSE"), // Use the Wide XOR function (FALSE, TRUE)
.XORSIMD("XOR24_48_96"), // Mode of operation for the Wide XOR (XOR24_48_96, XOR12)
// Pattern Detector Attributes: Pattern Detection Configuration
.AUTORESET_PATDET("NO_RESET"), // NO_RESET, RESET_MATCH, RESET_NOT_MATCH
.AUTORESET_PRIORITY("RESET"), // Priority of AUTORESET vs.CEP (RESET, CEP).
.MASK(48'h3fffffffffff), // 48-bit mask value for pattern detect (1=ignore)
.PATTERN(48'h000000000000), // 48-bit pattern match for pattern detect
.SEL_MASK("MASK"), // MASK, C, ROUNDING_MODE1, ROUNDING_MODE2
.SEL_PATTERN("PATTERN"), // Select pattern value (PATTERN, C)
.USE_PATTERN_DETECT("NO_PATDET"), // Enable pattern detect (NO_PATDET, PATDET)
// Programmable Inversion Attributes: Specifies built-in programmable inversion on specific pins
.IS_ALUMODE_INVERTED(4'b0000), // Optional inversion for ALUMODE
.IS_CARRYIN_INVERTED(1'b0), // Optional inversion for CARRYIN
.IS_CLK_INVERTED(1'b0), // Optional inversion for CLK
.IS_INMODE_INVERTED(5'b00000), // Optional inversion for INMODE
.IS_OPMODE_INVERTED(9'b000000000), // Optional inversion for OPMODE
.IS_RSTALLCARRYIN_INVERTED(1'b0), // Optional inversion for RSTALLCARRYIN
.IS_RSTALUMODE_INVERTED(1'b0), // Optional inversion for RSTALUMODE
.IS_RSTA_INVERTED(1'b0), // Optional inversion for RSTA
.IS_RSTB_INVERTED(1'b0), // Optional inversion for RSTB
.IS_RSTCTRL_INVERTED(1'b0), // Optional inversion for RSTCTRL
.IS_RSTC_INVERTED(1'b0), // Optional inversion for RSTC
.IS_RSTD_INVERTED(1'b0), // Optional inversion for RSTD
.IS_RSTINMODE_INVERTED(1'b0), // Optional inversion for RSTINMODE
.IS_RSTM_INVERTED(1'b0), // Optional inversion for RSTM
.IS_RSTP_INVERTED(1'b0), // Optional inversion for RSTP
// Register Control Attributes: Pipeline Register Configuration
.ACASCREG(1), // Number of pipeline stages between A/ACIN and ACOUT (1-2)
.ADREG(1), // Pipeline stages for pre-adder (1-0)
.ALUMODEREG(1), // Pipeline stages for ALUMODE (1-0)
.AREG(1), // Pipeline stages for A (1-2)
.BCASCREG(1), // Number of pipeline stages between B/BCIN and BCOUT (1-2)
.BREG(1), // Pipeline stages for B (1-2)
.CARRYINREG(1), // Pipeline stages for CARRYIN (1-0)
.CARRYINSELREG(1), // Pipeline stages for CARRYINSEL (1-0)
.CREG(1), // Pipeline stages for C (1-0)
.DREG(1), // Pipeline stages for D (1-0)
.INMODEREG(1), // Pipeline stages for INMODE (1-0)
.MREG(1), // Multiplier pipeline stages (1-0)
.OPMODEREG(1), // Pipeline stages for OPMODE (1-0)
.PREG(1) // Number of pipeline stages for P (1-0)
)
DSP48E2_inst (
// Cascade: 30-bit (each) output: Cascade Ports
.ACOUT(ACOUT), // 30-bit output: A port cascade
.BCOUT(BCOUT), // 18-bit output: B cascade
.CARRYCASCOUT(CARRYCASCOUT), // 1-bit output: Cascade carry
.MULTSIGNOUT(MULTSIGNOUT), // 1-bit output: Multiplier sign cascade
.PCOUT(PCOUT), // 48-bit output: Cascade output
// Control: 1-bit (each) output: Control Inputs/Status Bits
.OVERFLOW(OVERFLOW), // 1-bit output: Overflow in add/acc
.PATTERNBDETECT(PATTERNBDETECT), // 1-bit output: Pattern bar detect
.PATTERNDETECT(PATTERNDETECT), // 1-bit output: Pattern detect
.UNDERFLOW(UNDERFLOW), // 1-bit output: Underflow in add/acc
// Data: 4-bit (each) output: Data Ports
.CARRYOUT(CARRYOUT), // 4-bit output: Carry
.P(P), // 48-bit output: Primary data
.XOROUT(XOROUT), // 8-bit output: XOR data
// Cascade: 30-bit (each) input: Cascade Ports
.ACIN(ACIN), // 30-bit input: A cascade data
.BCIN(BCIN), // 18-bit input: B cascade
.CARRYCASCIN(CARRYCASCIN), // 1-bit input: Cascade carry
.MULTSIGNIN(MULTSIGNIN), // 1-bit input: Multiplier sign cascade
.PCIN(PCIN), // 48-bit input: P cascade
// Control: 4-bit (each) input: Control Inputs/Status Bits
.ALUMODE(ALUMODE), // 4-bit input: ALU control
.CARRYINSEL(CARRYINSEL), // 3-bit input: Carry select
.CLK(CLK), // 1-bit input: Clock
.INMODE(INMODE), // 5-bit input: INMODE control
.OPMODE(OPMODE), // 9-bit input: Operation mode
.RSTINMODE(RSTINMODE), // 1-bit input: Reset for INMODEREG
// Data: 30-bit (each) input: Data Ports
.A(A), // 30-bit input: A data
.B(B), // 18-bit input: B data
.C(C), // 48-bit input: C data
.CARRYIN(CARRYIN), // 1-bit input: Carry-in
.D(D), // 27-bit input: D data
// Reset/Clock Enable: 1-bit (each) input: Reset/Clock Enable Inputs
.CEA1(CEA1), // 1-bit input: Clock enable for 1st stage AREG
.CEA2(CEA2), // 1-bit input: Clock enable for 2nd stage AREG
.CEAD(CEAD), // 1-bit input: Clock enable for ADREG
.CEALUMODE(CEALUMODE), // 1-bit input: Clock enable for ALUMODE
.CEB1(CEB1), // 1-bit input: Clock enable for 1st stage BREG
.CEB2(CEB2), // 1-bit input: Clock enable for 2nd stage BREG
.CEC(CEC), // 1-bit input: Clock enable for CREG
.CECARRYIN(CECARRYIN), // 1-bit input: Clock enable for CARRYINREG
.CECTRL(CECTRL), // 1-bit input: Clock enable for OPMODEREG and CARRYINSELREG
.CED(CED), // 1-bit input: Clock enable for DREG
.CEINMODE(CEINMODE), // 1-bit input: Clock enable for INMODEREG
.CEM(CEM), // 1-bit input: Clock enable for MREG
.CEP(CEP), // 1-bit input: Clock enable for PREG
.RSTA(RSTA), // 1-bit input: Reset for AREG
.RSTALLCARRYIN(RSTALLCARRYIN), // 1-bit input: Reset for CARRYINREG
.RSTALUMODE(RSTALUMODE), // 1-bit input: Reset for ALUMODEREG
.RSTB(RSTB), // 1-bit input: Reset for BREG
.RSTC(RSTC), // 1-bit input: Reset for CREG
.RSTCTRL(RSTCTRL), // 1-bit input: Reset for OPMODEREG and CARRYINSELREG
.RSTD(RSTD), // 1-bit input: Reset for DREG and ADREG
.RSTM(RSTM), // 1-bit input: Reset for MREG
.RSTP(RSTP) // 1-bit input: Reset for PREG
);

Very long! And not fun to type out. It'd be much nicer to write something like this:

xilinx::dsp48e2!("dsp48e2_name", "a*b+c", ... some options ...);

module my_module() -> () {
    instantiate dsp48e2_name as my_dsp;
    my_dsp.a[t] = ...;
}

Much nicer to type. Notice the use of namespacing, to indicate that this macro lives in a "xilinx" package.
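
One piece of such a macro is purely mechanical: rendering the Verilog instantiation from a set of attribute and port choices. A minimal Python sketch follows, with `render_instance` as a hypothetical helper; the attribute and port names in the usage are taken from the template above. Deriving those choices from an expression like "a*b+c" is the hard part and is not attempted here.

```python
def render_instance(module: str, instance: str,
                    params: dict[str, str], ports: dict[str, str]) -> str:
    """Render a Verilog module instantiation with named parameters and ports."""
    param_lines = ",\n".join(f"   .{name}({value})" for name, value in params.items())
    port_lines = ",\n".join(f"   .{name}({signal})" for name, signal in ports.items())
    return f"{module} #(\n{param_lines}\n)\n{instance} (\n{port_lines}\n);"


print(render_instance(
    "DSP48E2",
    "DSP48E2_inst",
    params={"USE_MULT": '"MULTIPLY"', "AREG": "1"},
    ports={"CLK": "CLK", "A": "A"},
))
```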

See https://docs.xilinx.com/v/u/2014.1-English/ug974-vivado-ultrascale-libraries for other examples for UltraScale devices.

RAM instantiation

Similarly to the vendor block instantiation above, RAM can be instantiated the same way (look at e.g. the RAMB36E2 template in the document above). However, RAM has the additional property that it can refer to files on disk for initial values:

xilinx::ramb36e2!("ram_name", init="my_init_file.txt");

Options

Python pros:

Python cons:

Lua pros:

Lua cons:

Tcl pros:

Tcl cons:

WASM pros:

WASM cons: