jamesbowman / swapforth

Swapforth is a cross-platform ANS Forth
BSD 3-Clause "New" or "Revised" License

How to add a new peripheral to the code #40

Closed bmentink closed 6 years ago

bmentink commented 7 years ago

I have written a PWM module that I would like to add to the build, but I am having trouble interfacing to it when I add it to the top level.

I have added the following to the "top" module:

// ########### PWM ###############################################

  wire pwm_wr = io_wr_ & io_addr_[8];

  // Instantiate PWM module 
  pwm #(.CTR_LEN(8)) pwm_1 (
    .rst(1'b1),
    .clk(clk),
    .wr(pwm_wr),
    .compare(8'd80),
    //.compare(dout_[7:0]), this doesn't route
    .pwm(PWM)
  );
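For reference, the pwm module itself isn't shown in the thread. Below is a hypothetical sketch that matches the port list used above. Two assumptions are baked in: rst is taken as active-low (it is tied to 1'b1 above, so an active-high reset would hold the block in reset forever), and compare is latched on wr. Latching matters once compare is driven from dout_, because the bus only carries the written value during the IO write cycle.

```verilog
// Hypothetical PWM sketch: port names follow the instantiation above;
// the behaviour (active-low rst, compare latched on wr) is assumed.
module pwm #(
  parameter CTR_LEN = 8
) (
  input                    rst,      // active-low reset (assumption)
  input                    clk,
  input                    wr,       // IO write strobe: latch compare
  input      [CTR_LEN-1:0] compare,  // duty-cycle threshold from the IO bus
  output reg               pwm
);
  reg [CTR_LEN-1:0] ctr;        // free-running counter sets the PWM period
  reg [CTR_LEN-1:0] compare_q;  // holds the duty value between IO writes

  always @(posedge clk) begin
    if (!rst) begin
      ctr       <= {CTR_LEN{1'b0}};
      compare_q <= {CTR_LEN{1'b0}};
      pwm       <= 1'b0;
    end else begin
      if (wr)
        compare_q <= compare;
      ctr <= ctr + 1'b1;
      // high for compare_q clocks out of every 2**CTR_LEN
      pwm <= (ctr < compare_q);
    end
  end
endmodule
```

With compare registered like this, the value written from Forth stays in effect until the next write to the PWM's IO address.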

I am trying to write to the "compare" input from Forth. The line ".compare(8'd80)" works fine, but when I replace it with the commented line, it synthesises fine but won't route. I am using the uart module as an example. What have I done wrong? Or have I just run out of room on the FPGA (on the 1k device)?

EDIT: It seems it was a space issue; it routed fine on the 8k device. Looks like I have to buy new hardware ;)

bmentink commented 7 years ago

On the topic of peripherals: is there a way to do interrupts on SwapForth?

I would like a peripheral I design in Verilog to interrupt SwapForth when data is ready. It would be great not to have to poll.

RGD2 commented 7 years ago

You'd have to change swapforth.fs to include space for a fixed-location interrupt vector table, and you'd have to change the j1.v CPU core to include an interrupt unit, which would basically cause a call of the appropriate fixed word in the interrupt vector table. I guess you could implement that table partially in IO space, as I did for the j4a, but you'd still need to add the interrupt unit with its own 12-bit address to load when triggered (so a write to that 12-bit port on the j1aint would make the j1 jump to that address on the next cycle, as if execute had been called).

The downside is that interrupt code can then see / easily break the stacks, and timing regularity goes right out the window.

The kind of jobs an interrupt system is often (ab?)used for (specifically multitasking) is mostly why the j4a design exists (although it requires an hx8k chip). I find the j4a/j1a very consistent timing-wise precisely because they can't do interrupts. This means they're not terribly time- or power-efficient, but it does make it extremely straightforward to program them to handle three or four simple tasks timed with loops, as the timing stays rock steady. This is an important feature for me.

For anything that wants a low-latency response (the other thing interrupts are really intended for), I build an FSM in Verilog and attach it to the J core as an IO device.

I'm actually partway through rewriting a self-running peripheral (a unidirectional SPI slave port, written to by an external master with an external asynchronous clock), which I would like to be FIFO-buffered, so that polling becomes OK as long as the data is consumed quickly enough not to overflow.
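A FIFO in front of a polled peripheral could look roughly like the sketch below: the peripheral pushes words as they arrive, a Forth IO read pops them, and a sticky overflow flag records whether polling fell behind. Module and port names are hypothetical, and it assumes the data has already been brought into the J1's clock domain (a FIFO written directly from the external SPI clock would need a dual-clock design instead).

```verilog
// Hypothetical single-clock FIFO; depth is 2**AW words.
module sync_fifo #(
  parameter DW = 16,  // word width
  parameter AW = 3    // address bits: depth = 8
) (
  input               clk,
  input               resetq,    // active-low reset
  input               push,      // producer: write din this cycle
  input      [DW-1:0] din,
  input               pop,       // consumer, e.g. an IO read strobe
  output     [DW-1:0] dout,      // head of the FIFO, valid when !empty
  output              empty,
  output              full,
  output reg          overflow   // sticky: a push was dropped while full
);
  reg [DW-1:0] mem [0:(1<<AW)-1];
  reg [AW:0]   wptr, rptr;       // extra MSB distinguishes full from empty

  assign empty = (wptr == rptr);
  assign full  = (wptr[AW] != rptr[AW]) && (wptr[AW-1:0] == rptr[AW-1:0]);
  assign dout  = mem[rptr[AW-1:0]];

  always @(posedge clk) begin
    if (!resetq) begin
      wptr     <= 0;
      rptr     <= 0;
      overflow <= 1'b0;
    end else begin
      if (push && !full) begin
        mem[wptr[AW-1:0]] <= din;
        wptr <= wptr + 1'b1;
      end else if (push && full)
        overflow <= 1'b1;
      if (pop && !empty)
        rptr <= rptr + 1'b1;
    end
  end
endmodule
```

The asynchronous read of mem keeps the sketch short; yosys will map it to logic rather than block RAM, which is fine for a handful of entries.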

At the moment, I just allow the data to be continually overwritten by fresher data (which is OK, since it's a signal from a pressure sensor). This is carried between the clock domains in the usual sample/freeze/double-register way.
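The sample/freeze/double-register crossing can be sketched like this (a generic toggle-synchronizer pattern, not Remy's actual code; names are hypothetical). The source domain freezes a word into a holding register and flips a toggle bit on the same clock; the destination runs the toggle through two flip-flops and copies the frozen word when it sees the toggle change. It assumes new words arrive no faster than about once every four destination clocks, so the frozen word is never changing while it is being copied.

```verilog
// Hypothetical "sample / freeze / double-register" word crossing.
module word_cdc #(parameter W = 16) (
  // source domain (e.g. the external SPI clock)
  input              src_clk,
  input              src_load,  // pulse: sample src_data
  input      [W-1:0] src_data,
  // destination domain (the J1 clock)
  input              dst_clk,
  output reg [W-1:0] dst_data,  // last word handed over
  output reg         dst_new    // one-cycle pulse when dst_data updates
);
  reg [W-1:0] frozen  = {W{1'b0}};  // held stable while it crosses
  reg         src_tog = 1'b0;       // flips once per new word

  always @(posedge src_clk)
    if (src_load) begin
      frozen  <= src_data;
      src_tog <= ~src_tog;
    end

  // sync[1:0]: two synchronizer flops; sync[2]: previous value for edge detect
  reg [2:0] sync = 3'b000;
  initial begin
    dst_data = {W{1'b0}};
    dst_new  = 1'b0;
  end

  always @(posedge dst_clk) begin
    sync    <= {sync[1:0], src_tog};
    dst_new <= sync[2] ^ sync[1];
    if (sync[2] ^ sync[1])
      dst_data <= frozen;  // frozen has been stable for at least two dst clocks
  end
endmodule
```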

But back to you: What does your PWM module look like?

Did you try compiling it to fit an 8k?

There's a j1a8k variant in there which would be an easy one to try modifying just to check that, since it's just the j1a ported to the hx8k breakout board, with the Makefile lines and .pcf changed accordingly. If it compiles ok, you'll be able to see how big it ends up.

You might find you can reduce the .DEPTH of the return and data stacks a bit to free up space in the hx1k chip, so as to squeeze it in.

-- Remy

bmentink commented 7 years ago

Hi Remy,

Thanks for that reply. I think I will just poll for now.

But back to you: What does your PWM module look like? Did you try compiling it to fit an 8k?

Yes, and it all works just fine. I can adjust the duty cycle and frequency from Forth. Awesome. (I have the 8k breakout board now ;)

On another issue (the reason for my question about interrupts): I have an ADC Verilog module I have been working on that uses the LVDS inputs. The module seems to work fine, but I am having trouble reading the data out with SwapForth, i.e. adding the right code to the top level. I added the following commented line to this section (my ADC module returns lvds_data):

assign io_din =
    (io_addr_[ 0] ? {8'd0, pmod_in}                                     : 16'd0) |
    (io_addr_[ 1] ? {8'd0, pmod_dir}                                    : 16'd0) |
    (io_addr_[ 2] ? {8'd0, LEDS}                                        : 16'd0) |
    (io_addr_[ 3] ? {13'd0, PIOS}                                       : 16'd0) |
    (io_addr_[ 4] ? {8'd0, hdr1_in}                                     : 16'd0) |
    (io_addr_[ 5] ? {8'd0, hdr1_dir}                                    : 16'd0) |
    (io_addr_[ 6] ? {8'd0, hdr2_in}                                     : 16'd0) |
    (io_addr_[ 7] ? {8'd0, hdr2_dir}                                    : 16'd0) |
    //(io_addr_[ 9] ? {8'd0, lvds_data}                                   : 16'd0) |
    (io_addr_[12] ? {8'd0, uart0_data}                                  : 16'd0) |
    (io_addr_[13] ? {11'd0, random, 1'b0, PIOS_01, uart0_valid, !uart0_busy} : 16'd0);

However, when I synthesise that code I get this error:

make -C icestorm j1a8k
make[1]: Entering directory '/home/bmentink/Builds/swapforth/swapforth/j1a/icestorm'
yosys  -q -p "synth_ice40 -top top -abc2 -blif j1a8k.blif" j1a8k.v uart.v ../verilog/j1.v ../verilog/stack2.v ../verilog/pwm.v ../verilog/adc.v
ERROR: Conflicting init values for signal 1'x (\adc_1.analog_ [30] = 1'0, \_uart0._rx.hh [2] = 1'1).

It doesn't like the line I inserted to read lvds_data out on the Forth side. Am I doing the right thing? lvds_data is defined as: wire [7:0] lvds_data;

bmentink commented 7 years ago

I found my issue. It wasn't that line at all: in my ADC module I had assigned a 32-bit value to analog_, which was an 8-bit value. All good now; I can read my ADC value from Forth.

bmentink commented 7 years ago

@RGD2

I am finding that reading voltages from 1 V to 2 V works great, but I cannot measure under 1 V, and anything over 2 V reads as 3V3.

I thought the Lattice LVDS inputs had a common-mode range extending nearly to ground. Do you know?

My pin designation is:

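module inpin_lvds(
  input clk,
  input pin,     // true (positive) pin of the LVDS pair
  output rd);    // comparator result, registered on clk

  // PIN_TYPE 6'b0000_00: no output driver (0000), registered input (00)
  // clocked by INPUT_CLK; IO_STANDARD selects the differential input buffer
  SB_IO #(.PIN_TYPE(6'b0000_00), .IO_STANDARD("SB_LVDS_INPUT"))  _sio (
        .PACKAGE_PIN(pin),
        .INPUT_CLK(clk),
        .D_IN_0(rd));
endmodule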

and I am using it like this:

inpin_lvds _lvds(.clk(clk), .pin(ADC), .rd(input_lvds));

where ADC is pin C2 and input_lvds is the signal I am using in my ADC module.

Bernie

RGD2 commented 7 years ago

I don't know - I've not tried to use any of the inputs for analog. Interesting.

You could use an op amp to shift your voltages into that range, or even just a transistor.

Or, by that point, you're probably better off using a comparator with LVDS output and just connecting the logic output of that to the ice40 inputs.

That ought to make it possible to build a reasonably accurate sigma-delta converter, using an LVDS output from the FPGA for the feedback. Keeping all the logic connections differential, length-matched, and close by their complements ought to help reduce the interference to the analog input.
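As a rough illustration of the digital half of such a sigma-delta converter (the usual comparator-plus-RC arrangement, not a design taken from this thread): the comparator result is registered each clock and driven straight back out as the 1-bit feedback that charges the external RC, and the ones are counted over a window of 2**N clocks. Names are hypothetical; it assumes the comparator's + input is the analog signal, its - input is the RC-filtered feedback, and cmp_in has already been registered (the SB_IO in inpin_lvds above does that).

```verilog
// Hypothetical first-order sigma-delta ADC, digital side only.
module sd_adc #(
  parameter N = 10  // a window of 2**N clocks gives an N-bit result
) (
  input              clk,
  input              resetq,       // active-low reset
  input              cmp_in,       // registered comparator: 1 when Vin > Vfb
  output reg         fb_out,       // 1-bit feedback into the external RC
  output reg [N-1:0] sample,       // ones counted over the last window
  output reg         sample_valid  // one-cycle strobe when sample updates
);
  reg [N-1:0] phase;  // position within the current window
  reg [N:0]   ones;   // ones seen so far in this window

  wire [N:0] total = ones + cmp_in;  // count including this clock's bit

  always @(posedge clk) begin
    if (!resetq) begin
      fb_out       <= 1'b0;
      phase        <= {N{1'b0}};
      ones         <= {(N+1){1'b0}};
      sample       <= {N{1'b0}};
      sample_valid <= 1'b0;
    end else begin
      fb_out       <= cmp_in;  // feedback follows the comparator decision
      phase        <= phase + 1'b1;
      sample_valid <= 1'b0;
      if (&phase) begin        // last clock of the window
        // an all-ones window would need N+1 bits, so saturate
        sample       <= total[N] ? {N{1'b1}} : total[N-1:0];
        sample_valid <= 1'b1;
        ones         <= {(N+1){1'b0}};
      end else begin
        ones <= total;
      end
    end
  end
endmodule
```

Counting over a fixed window is the simplest (boxcar) decimator; it gives one N-bit result every 2**N clocks.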

How do you have it working?

On Saturday, 22 October 2016, bmentink notifications@github.com wrote:

@RGD2 https://github.com/RGD2

I am finding that reading the voltages from 1..3v works great, but I cannot measure under 1v. I thought the Lattice LVDS inputs had common mode nearly to ground ... do you know?

Bernie


-- Remy

RGD2 commented 7 years ago

That's weird... I think perhaps it's worth opening a ticket on yosys about it? It seems like a strange behaviour to have, and perhaps hints at a yosys bug.

It might also be some odd verilog behavior - like assuming 32 bit vectors if no information otherwise is available. But it really seems like a deeper bug which could in turn be causing other odd issues...

On Saturday, 22 October 2016, bmentink notifications@github.com wrote:

I found my issue. It wasn't that line at all. In my ADC module I had assigned a 32-bit value to analog_, which was an 8-bit value; the synthesis did not pick that up, but the place and route did ...


-- Remy

bmentink commented 7 years ago

Hi Remy,

Yes, from my research it seems different FPGAs have differing common-mode ranges on their LVDS input pins. It seems the iCE40 FPGAs are not that great in that respect; other families include zero, and some go rail to rail.

I have decided to use a standard comparator and use standard FPGA in/out pins ...

How do you have it working?

What do you mean exactly?

RGD2 commented 7 years ago

You're measuring analog voltages on a logic input pin, so presumably you're using some form of either slope conversion or sigma-delta - what does your circuit look like? Are you using feedback or just comparing against a triangle wave?

On Wednesday, 26 October 2016, bmentink notifications@github.com wrote:

Hi Remy,

Yes, from my research it seems different FPGA's have differing common mode ranges on the LVDS input pins. It seems these FPGA's are not that great in that respect ... other families include zero, and some go rail to rail ...

I have decided to use a standard comparator and use standard FPGA in/out pins ...

How do you have it working? What do you mean exactly?


-- Remy

bmentink commented 7 years ago

A picture is worth a thousand words ;) [image: adc_schematic]

I have written all the Verilog for the FPGA components. That part all works great, and I can read the analog value back in Forth land (10 bits, but it can be any width). The RC filter is the opposite of the external RC, so it makes the decimation linear.
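The thread doesn't show the FPGA-side RC filter, but one common way to build a "digital RC" is a one-pole IIR low-pass that moves a fraction 1/2**K of the way toward full scale or zero on each bitstream clock, which is the discrete form of an RC charging curve. A sketch with hypothetical names and parameters, not Bernie's actual filter:

```verilog
// Hypothetical one-pole IIR low-pass ("digital RC") for a 1-bit stream.
// y tracks the ones density of bit_in; time constant is about 2**K clocks.
module digital_rc #(
  parameter W = 16,  // accumulator width: full scale is 2**W - 1
  parameter K = 6    // shift: larger K = slower, smoother response
) (
  input              clk,
  input              resetq,  // active-low reset
  input              bit_in,
  output reg [W-1:0] y
);
  localparam [W-1:0] FULL = {W{1'b1}};

  always @(posedge clk) begin
    if (!resetq)
      y <= {W{1'b0}};
    else if (bit_in)
      y <= y + ((FULL - y) >> K);  // charge: move 1/2**K of the way up
    else
      y <= y - (y >> K);           // discharge: move 1/2**K of the way down
  end
endmodule
```

Larger K lowers the cutoff and reduces ripple at the cost of a slower step response; truncation in the shift leaves y up to 2**K - 1 counts short of the rails.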

RGD2 commented 7 years ago

Nice diagram, thanks!

What sort of performance are you getting out of it, in terms of ENOB, sample rate and step response?

On Wednesday, 26 October 2016, bmentink notifications@github.com wrote:

A picture is worth a thousand words ;) [image: adc_schematic] https://cloud.githubusercontent.com/assets/10123911/19706904/fce695ec-9b70-11e6-8152-8a1825a29c23.png


-- Remy

bmentink commented 7 years ago

Hi Remy,

  1. Don't know what ENOB is... if you mean effective number of bits, then 10, but it can be whatever you set.
  2. Sample rate: the bit rate with a steady DC input is 10 Mbit/s, so at 10 bits resolution that would make the sample rate 1 MS/s, I guess.
  3. Step response is excellent. I am measuring the peak of a drum hit on a piezo element. The voltage on the cap faithfully follows the input voltage, and that is what is being decimated. However, the fastest pulse rise time under those conditions is only 50 µs. It follows that OK; I haven't tried anything faster yet.
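A note on the sample-rate arithmetic, assuming a plain count-the-ones (boxcar) decimator: each N-bit result needs 2^N bitstream clocks, not N, so

$$ f_s = \frac{f_{\text{bit}}}{2^N} = \frac{10\ \text{Mbit/s}}{2^{10}} \approx 9.8\ \text{kS/s} $$

Dividing by 10 instead gives the 1 MS/s figure above, which would only hold if each output bit were resolved independently (as in a successive-approximation ADC). Other decimation filters trade rate against resolution differently.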

Question for you: I notice you have a fork of James's code. What functionality have you added? I am guessing it is work on the j4a...

RGD2 commented 7 years ago

On Thursday, 27 October 2016, bmentink notifications@github.com wrote:

Hi Remy,

Don't know what ENOB is ....

The "effective number of bits", related to the SNR - roughly speaking, the top N bits of the word that aren't noisy.
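For reference, ENOB is normally derived from the measured SINAD (signal to noise and distortion ratio, in dB) of a near full-scale sine input, by inverting the ideal N-bit quantizer relation SNR = 6.02 N + 1.76 dB:

$$ \mathrm{ENOB} = \frac{\mathrm{SINAD}_{\mathrm{dB}} - 1.76}{6.02} $$

For example, a measured SINAD of 56 dB gives (56 - 1.76)/6.02 ≈ 9.0 effective bits, so a 10-bit word would carry roughly one bit of noise.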

Sample rate: Well the bit rate with steady DC on the input is 10Mb, so at 10 bits resolution that would make the sample rate 1Mb I guess ....

Step response is excellent. I am measuring the peak of a drum hit on a piezo element. The voltage on the cap faithfully follows the input voltage .... and that is what is being decimated ... however, fastest pulse rise time under those conditions is only 50us .. it follows that ok, haven't tried anything faster yet ..

Cool - but what I would be looking for is the shape on the edge of a square wave, specifically whether/how much it rings, and how long it takes to settle to the new value.

It's related to the "temporal error" if you like.

A lot of ADCs have really quite terrible step responses there, courtesy of trying to keep too much passband flatness and bandwidth. (They tend to both ring and pre-ring, courtesy of a steep brick-wall FIR filter, which has an impulse response like a sinc function: all ringing, with about 20% overshoot both ways, but very sharp frequency-domain roll-off.)

Turns out if you optimise too hard in the frequency domain, you tend to kill time domain performance. TANSTAAFL.

I like to think of the settling time (or, equivalently impulse response width), as the "time error" in a number of samples. This view of the whole A to D converter system includes the effect of filters in both analog and digital domains.
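As a worked example of that settling-time view for a single-pole (RC-like) response: the time to settle within half an LSB after a full-scale step of an N-bit converter follows from the exponential:

$$ e^{-t_s/\tau} \le 2^{-(N+1)} \quad\Longrightarrow\quad t_s \ge (N+1)\,\tau\ln 2 $$

For N = 10 that is 11 × 0.693 τ ≈ 7.6 τ. This covers a single pole only; any digital decimation filter widens the overall impulse response further.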

It isn't a Dirac delta unless no filtering is used, and in that case, aliasing tends to inject spurs from all sorts of VHF sources, resulting in a noisy signal very sensitive to interference.

The best you can usually expect is something that looks like a truncated Gaussian. If you plot it "zoomed out" enough, it does resemble a Dirac delta with enough samples per pixel (think: like a retina display, and for the same reason: decimating too early results in crappy edge accuracy.)

Question for you. I notice you have a fork of James's code, what functionality have you added in your code .. I am guessing it is work on the j4a ...

Yep, mostly peripherals and top + ucf work I'm using in a lab control application. I intend to split out a few of the more useful changes into some pull requests for James... when I have time to.

One of the more useful/interesting things currently in that is a facility to allow presetting a mask for writes to the pin port. This allows multiple threads to set/clear specific bits in the io port (and direction port) without stepping on each other's changes. It also works for reads, and defaults itself to all ones so as to be optional if not used. It's in j4a.v on my j4a-pmod branch.
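The masked-write idea can be sketched roughly as below. This is not the code in j4a.v; names are hypothetical and it shows a single mask register that resets to all ones. On the j4a, where hardware threads interleave, presumably each thread would need its own mask so that one thread can't change the mask under another.

```verilog
// Hypothetical masked pin-port register, loosely after the idea described above.
module masked_port #(parameter W = 8) (
  input              clk,
  input              resetq,     // active-low reset
  input              mask_wr,    // IO strobe: load a new mask
  input              port_wr,    // IO strobe: write to the pin port
  input      [W-1:0] dout,       // CPU write data
  output     [W-1:0] masked_rd,  // port value as seen through the mask
  output reg [W-1:0] port
);
  reg [W-1:0] mask;

  // only bits set in mask are visible; the rest read as 0
  assign masked_rd = port & mask;

  always @(posedge clk) begin
    if (!resetq) begin
      mask <= {W{1'b1}};  // all ones: plain, unmasked writes by default
      port <= {W{1'b0}};
    end else begin
      if (mask_wr)
        mask <= dout;
      if (port_wr)
        port <= (port & ~mask) | (dout & mask);  // bits outside mask keep their value
    end
  end
endmodule
```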

-- Remy


bmentink commented 7 years ago

One of the more useful/interesting things currently in that is a facility to allow presetting a mask for writes to the pin port. This allows multiple threads to set/clear specific bits in the io port (and direction port) without stepping on each other's changes. It also works for reads, and defaults itself to all ones so as to be optional if not used. It's in j4a.v on my j4a-pmod branch.

Shame I can't use the j4a.v code, as I need the peripheral address bits ;)

I think a priority should be to have "proper" address decoding; you can't add many peripherals with the current one-hot scheme.

RGD2 commented 7 years ago

I am looking into this, as IO space is getting tight in my application...

It ought to be possible. mem_addr should be able to select din in time, much the same way that the ALU decoding happens. I suspect that there may only be time to decode 4 or so address bits though... and the io strobes already depend on some decoding of the instruction (insn[6:4]==4 or 5) before they're even available.
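A minimal sketch of decoding a small binary field of the IO address into one-hot read/write strobes plus a read-data mux, along the lines of the 4-bit decode suggested above. Names and the 16 x 16-bit layout are hypothetical, and whether the extra mux level meets timing on the j1a would need checking against the place-and-route timing report.

```verilog
// Hypothetical decoder: a 4-bit binary field selects one of 16 peripherals.
module io_decode (
  input  [3:0]   io_addr_lo,  // low address bits used as a binary field
  input          io_rd,
  input          io_wr,
  input  [255:0] rd_data,     // 16 peripherals x 16 bits, peripheral n at [16n +: 16]
  output [15:0]  rd_strobe,   // one-hot read strobe for peripheral n
  output [15:0]  wr_strobe,   // one-hot write strobe for peripheral n
  output [15:0]  io_din       // data returned to the CPU
);
  wire [15:0] sel = 16'd1 << io_addr_lo;

  assign rd_strobe = io_rd ? sel : 16'd0;
  assign wr_strobe = io_wr ? sel : 16'd0;
  assign io_din    = rd_data[io_addr_lo * 16 +: 16];
endmodule
```

Compared with the current scheme, where io_din is an OR of one-hot terms, this gives 16 slots for 4 address bits instead of 4.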

OTOH, there is space to add perhaps three more strobes alongside the io_wr and io_rd that exist... these could allow more io space.

I'm also thinking about putting DMA in, since the RAM is actually very very underclocked. According to the documentation, I estimate it could reach 293 MHz in the 2048x2 mode that it's in, so it should be possible to derive a faster clock to allow something besides the J core to access it -- at least for the j4a which can tolerate a couple extra cycles of latency for memory access.

The J1 design might have worked without modification on the ice40 if the EBRs were run with a 2x clock advantage over the j1.

At the moment, one of the key differences between the j1a and the original J1 is that 'call' is abused to act as a load. There's an additional bit in the PC that overrides instruction decoding on the next cycle as a 'push whole undecoded instruction to stack and return'. This is what makes memory reads take two cycles, but it allows the j1a to get away without needing the embedded SRAM to do two different reads in one cycle, as apparently those on the Xilinx FPGAs can.

The 13th bit in an ALU instruction word (any starting with 011_) seems to be... unused.

But, any changes to the j1.v core architecture will mean really forking the j1a -- since the nuc.fs/swapforth.fs code also have to change in lockstep.

Mecrisp commented 7 years ago

Yes, interrupts are possible: I implemented a simple timer interrupt. There is no IRQ blocking logic on entry so far, but I already have an atomic "eint exit". See my heavily modified variant of Swapforth called "Mecrisp-Ice", or just ask for more. May I kindly ask for the ADC sources? I would love to include them in my package. Matthias

jamesbowman commented 6 years ago

Is everyone OK to close this issue?

Mecrisp commented 6 years ago

I am fine with closing this issue. Matthias