ujamjar / hardcaml

[Deprecated see github.com/janestreet/hardcaml] Register Transfer Level Hardware Design in OCaml
https://github.com/janestreet/hardcaml
ISC License

Explicit clock/reset binding in simulation #22

Closed xguerin closed 1 year ago

xguerin commented 7 years ago

I will need to use two different clock domains in my design. Is there a way to explicitly instantiate/bind two different clock sources in the simulator?

andrewray commented 7 years ago

Nah, sorry, the simulators don't support multiple clocks.

It's doable with a bit of initial circuit analysis, and a revised API.

Back end stuff like vcd generation would also need an update.


xguerin commented 7 years ago

Ok, thanks. I'll look into that when time permits.

andrewray commented 7 years ago

I think there is a quite simple, but inefficient, way of implementing this.

The simulator is really 2 steps - the more complex one updates all the combinatorial nodes (sim_cycle_comb) after which newly calculated register inputs are loaded (sim_cycle_seq).

To support multiple clocks, we would need to split the registers updated by sim_cycle_seq into groups according to the clock domain each one belongs to (assuming no silly stuff like clock muxing/gating, I would just use the uid or name of the clock input).

We could then add a function like val sim_cycle_seq_domain : string -> simulator -> unit where the string (or uid or signal or whatever) indicates the clock to update.

You could then cycle a specific clock domain by running sim_cycle_comb then sim_cycle_seq_domain (and possibly a further sim_cycle_comb afterwards, but that's a different discussion!).

The inefficiency comes from the fact that sim_cycle_comb will perform a lot of redundant calculation on stuff that is not relevant to the actual clock domain being cycled. To optimise this would require assigning a (list of) clock domains to each combinatorial node.
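A minimal sketch of that scheme, on a toy register model rather than HardCaml's real simulator. The reg and simulator records and the cycle_domain helper are assumptions made up for illustration; only the names sim_cycle_comb and sim_cycle_seq_domain come from the description above.

```ocaml
(* Toy model, not HardCaml's real simulator: each register records the
   name of its clock, and only the selected domain is loaded. *)
type reg = {
  clock : string;          (* clock domain, identified by name *)
  mutable d : int;         (* next value, computed by the combinational step *)
  mutable q : int;         (* current register output *)
}

type simulator = {
  regs : reg list;
  comb : reg list -> unit; (* recomputes every [d] from the current [q]s *)
}

(* Recomputes ALL register inputs, including those of other domains:
   this is the redundant work mentioned above. *)
let sim_cycle_comb sim = sim.comb sim.regs

(* Load only the registers clocked by [domain]. *)
let sim_cycle_seq_domain domain sim =
  List.iter (fun r -> if r.clock = domain then r.q <- r.d) sim.regs

(* Cycle one clock domain: settle the logic, then clock that domain. *)
let cycle_domain domain sim =
  sim_cycle_comb sim;
  sim_cycle_seq_domain domain sim

(* Two free-running counters, one per clock domain. *)
let a = { clock = "clock_a"; d = 0; q = 0 }
let b = { clock = "clock_b"; d = 0; q = 0 }
let sim = { regs = [ a; b ]; comb = List.iter (fun r -> r.d <- r.q + 1) }

let () =
  cycle_domain "clock_a" sim;
  cycle_domain "clock_a" sim;
  cycle_domain "clock_b" sim;
  Printf.printf "a=%d b=%d\n" a.q b.q
```

Each call to cycle_domain still evaluates the combinational logic of both counters, which is where per-node domain tagging would help.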

andrewray commented 7 years ago

I am going to add support to the simulator for this in the near future.

What are your thoughts on how the testbench should look? I find the current "very imperative" method is fine for driving simple testbenches, but a pain when you have 2 or 3 concurrent "things" happening (ie driving input values while draining a FIFO). Multiple clocks is gonna make this a real pain.

I wonder if we could build an API on top of something like Oleg Kiselyov's delimited continuations (specifically in yield form)? Or perhaps you have some other idea?

xguerin commented 7 years ago

Here are the things I gathered so far concerning the test bench:

The imperative style is adequate

It blends seamlessly with unit testing frameworks and it is very simple to put together and to understand.

I wonder if we could build an API on top of something like Oleg Kiselyov's delimited continuations (specifically in yield form)?

I would need to dive into these delimited continuations to give you an educated answer.

I like to consider the simulator as a tool to validate the correctness of a given module or system. What I often find myself doing to validate said correctness is matching the outputs of said components for a given input at various sequential time points.

In these conditions, it may be interesting to provide a terse way to declare expected outputs at given time points and let the simulator do the rest. Even more interesting if these time points and outputs could be automatically parsed from JSON or SCONS.

Maybe:

module Test = struct
  type 'a event = int * 'a [@@deriving yojson]
  type 'a events = 'a event list [@@deriving yojson]
end

...

let () =
  let some_events = Test.of_yjson "some_file.json" in
  let _, sim, i, o, _ = Sim.make ... in
  Cyclesim.Api.expect sim i o some_events

Some details are lacking (like the input generation, the clock and reset, etc...).
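To make the idea concrete without the JSON parsing, here is a rough sketch of what such an expect function could do. The sim record and the counter are hypothetical stand-ins, not Cyclesim types, and the event list is written inline rather than loaded from a file.

```ocaml
(* An event pairs a cycle number with the output value expected there. *)
type 'a event = int * 'a

(* Hypothetical stand-in for a simulator: [cycle] advances one clock
   and [output] reads the current output value. *)
type 'a sim = { cycle : unit -> unit; output : unit -> 'a }

(* Run until the last event and return (cycle, expected, actual) for
   every mismatch. Events must be sorted by cycle number. *)
let expect sim (events : 'a event list) =
  let rec go now events mismatches =
    match events with
    | [] -> List.rev mismatches
    | (at, _) :: _ when at < now -> invalid_arg "expect: events must be sorted"
    | (at, expected) :: rest when at = now ->
      let actual = sim.output () in
      let mismatches =
        if actual = expected then mismatches
        else (at, expected, actual) :: mismatches
      in
      go now rest mismatches
    | _ ->
      sim.cycle ();
      go (now + 1) events mismatches
  in
  go 0 events []

(* A counter that increments by 3 every cycle. *)
let counter () =
  let q = ref 0 in
  { cycle = (fun () -> q := !q + 3); output = (fun () -> !q) }

(* The last event is deliberately wrong: the counter reads 12 at cycle 4. *)
let failures = expect (counter ()) [ (0, 0); (2, 6); (4, 13) ]

let () =
  List.iter
    (fun (at, e, a) -> Printf.printf "cycle %d: expected %d, got %d\n" at e a)
    failures
```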

Clock

Clock (and reset) should ideally be explicitly declared outside the simulator. First and foremost to allow explicit "system" signals to be driven by an actual clock (and not left de-asserted as is the case today -- only the inferred clock/reset are driven), but also to allow fine-grain tuning of the clock and reset behavior.

Multiple clocks

This would be very useful but it's non-trivial to implement:

  1. The simulator would need to support delta-cycles
  2. HardCaml would need to have some clock transfer facilities such that multiple clocks could actually be used

Clock transfer facilities could either be reimplemented entirely (possible, but tough, especially when it comes to handling timing) or mapped to an existing construct of the target platform (like dc_fifo in Altera's LPM).

I am very interested in these issues and would love to collaborate deeper on these (especially since most of my designs use multiple clocks and several clock transfer facilities).

(Optional) Testing-only source/sink entities

Driving inputs and gathering outputs could be simply addressed by using test-only source/sink components that one could "plug" into the tested interface and that would be driven by the simulation. Of course, that implies that both source and sink are clocked, but I don't think that's an issue.

I believe sources can already be implemented with the current framework using an explicit clock and reset combo as an input interface and driving signals as output interface, and a separate module that would use the source component and the actual interface to test. And sinks may not be that useful.

andrewray commented 7 years ago

The imperative style is adequate

I partly agree. Certainly, it's very simple to write quick per-module tests, but I find dynamic testbenches ie where the input to be driven is dependent on the output read, even for simple stuff like a start/done protocol, to be a pain.

Nonetheless, I want to keep these quick one-shot testbenches around whatever more complex solution we come up with.

I wonder if we could build an API on top of something like Oleg Kiselyov's delimited continuations (specifically in yield form)?

I would need to dive into these delimited continuations to give you an educated answer.

More specifically, I am thinking about how they can implement python style generators. Roughly speaking we would write multiple generators which would set inputs/read outputs and yield which would correspond to the clock update. Each generator would still look much like the current imperative API with Cyclesim.Api.cycle replaced with Cyclesim.Api.yield.
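As a rough illustration of the shape this could take, the sketch below replaces delimited continuations with explicit continuation closures, so it runs with no extra library. With Delimcc the nested closures would disappear and each generator would read like straight-line imperative code. The task type, the run loop and the toy adder are all assumptions for illustration, not Cyclesim.Api.

```ocaml
(* A task either finishes or yields, handing back the code to run
   after the next clock edge. *)
type task = Done | Yield of (unit -> task)

(* Toy design: a registered adder with two inputs and one output. *)
let a = ref 0 and b = ref 0 and sum = ref 0
let cycle () = sum := !a + !b

(* Run every task up to its next yield, clock the design once, and
   repeat until all tasks are done. Returns the number of cycles. *)
let run tasks =
  let step acc t = match t () with Done -> acc | Yield k -> k :: acc in
  let rec loop n tasks =
    match List.rev (List.fold_left step [] tasks) with
    | [] -> n
    | pending ->
      cycle ();
      loop (n + 1) pending
  in
  loop 0 tasks

let log = ref []

(* One driver per input, each setting a new value per cycle. *)
let drive_a () =
  a := 1;
  Yield (fun () ->
    a := 10;
    Yield (fun () -> Done))

let drive_b () =
  b := 2;
  Yield (fun () ->
    b := 20;
    Yield (fun () -> Done))

(* The monitor reads the output after each clock edge. *)
let monitor () =
  Yield (fun () ->
    log := !sum :: !log;
    Yield (fun () ->
      log := !sum :: !log;
      Done))

let cycles = run [ drive_a; drive_b; monitor ]
let () = List.iter (Printf.printf "sum=%d\n") (List.rev !log)
```

The three generators never refer to each other; the yield points are the only synchronisation between them.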

A good look at cocotb may be useful for inspiration here.

In these conditions, it may be interesting to provide a terse way to declare expected outputs at given time points and let the simulator do the rest. Even more interesting if these time points and outputs could be automatically parsed from JSON or SCONS.

Maybe:

module Test = struct
  type 'a event = int * 'a [@@deriving yojson]
  type 'a events = 'a event list [@@deriving yojson]
end

...

let () =
  let some_events = Test.of_yjson "some_file.json" in
  let _, sim, i, o, _ = Sim.make ... in
  Cyclesim.Api.expect sim i o some_events

Some details are lacking (like the input generation, the clock and reset, etc...).

Ok. This doesn't seem particularly complex. That said I am more interested in dynamic testbenches which react to their output.

but also to allow fine-grain tuning of the clock and reset behavior.

Now that might be a point of contention! Apart from multiple different clocks/resets, which is fine, I am not sure that adding any other logic on these paths is a good idea.

Multiple clocks

This would be very useful but it's non-trivial to implement:

The simulator would need to support delta-cycles

No, I don't think so. At least tools like Verilator and Chisel's C++ simulator backend can do it without delta cycles.

That said, an event-driven simulator backend that did support delta cycles is not conceptually all that tough to do - except maybe the way it starts up and making that match Verilog and/or VHDL.

HardCaml would need to have some clock transfer facilities such that multiple clocks could actually be used

Clock transfer facilities could either be reimplemented entirely (possible, but tough, especially when it comes to handling timing) or mapped to an existing construct of the target platform (like dc_fifo in Altera's LPM).

I am very interested in these issues and would love to collaborate deeper on these (especially since most of my designs use multiple clocks and several clock transfer facilities).

Cool. I think stage one of this should be embedding clock domains into the type system so it can avoid issues automatically. I am not sure the way the current API is structured is particularly well suited to this.

Then we would need some specific clock domain crossing related logic (synchronizers, multi-clock RAMs, and FIFOs). Perhaps we can be careful to ensure the components can map to hardcaml or vendor modules like dcfifo.

xguerin commented 7 years ago

More specifically, I am thinking about how they can implement python style generators. Roughly speaking we would write multiple generators which would set inputs/read outputs and yield which would correspond to the clock update. Each generator would still look much like the current imperative API with Cyclesim.Api.cycle replaced with Cyclesim.Api.yield.

Reading more about this construct, it actually sounds like a good idea. It would certainly address the multiple input sources problem.

However, whose responsibility would it be to advance the clock(s)? And what about asynchronous clocks? Explicitly driving source/sink components with clocks and letting the simulator drive the clocks (by simply advancing time) would address that issue.

Now that might be a point of contention! Apart from multiple different clocks/resets, which is fine, I am not sure that adding any other logic on these paths is a good idea.

By fine-grain tuning I meant period, delay and, in the case of multiple clocks, phase shift. I don't think that would add any extra logic to the design, just explicitly shape the clock signal.

Ok. This doesn't seem particularly complex. That said I am more interested in dynamic testbenches which react to their output.

As in asynchronous feedback loops ? That sounds challenging to deterministically achieve without delta cycles ;)

No, I don't think so. At least tools like Verilator and Chisel's C++ simulator backend can do it without delta cycles.

It depends on the targeted accuracy level. If we introduce support for multiple clocks with phase shifts (e.g. to simulate an RX/TX MAC with unrelated clocks of the same frequency) delta cycles would be useful to simulate clock domain transfers.

except maybe the way it starts up and making that match verilog and/or vhdl.

Yes, that might be challenging as Verilog and VHDL do not process delta cycles the same way (delta cycles being inherent to VHDL to begin with).

andrewray commented 7 years ago

However, whose responsibility would it be to advance the clock(s)? And what about asynchronous clocks? Explicitly driving source/sink components with clocks and letting the simulator drive the clocks (by simply advancing time) would address that issue.

By fine-grain tuning I meant period, delay and, in the case of multiple clocks, phase shift. I don't think that would add any extra logic to the design, just explicitly shape the clock signal.

The generators would be "passed" to a new run function which would also receive a specification of how the clocks should be set up ie

Cyclesim.Api.run 
  ~generators:[...] 
  ~clocks:[
      "clock_a", clock_a_period, clock_a_phase;
      "clock_b", clock_b_period, clock_b_phase;
    ]
  ~stop:(fun () -> true/false)

(It may be that the generators should be dynamic ie able to appear and disappear during simulation, so they might use a different api than passing to the run function)

The run function would advance time, control the various steps of simulation, and execute the generators.

The stop function would simply return false to end the simulation - it could easily be set up to react to some simulation event (ie a done signal asserting) or just count cycles.
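A small sketch of the time-advancing part of such a run function, under some assumptions: a clock is a (name, period, phase) triple with rising edges at phase, phase + period and so on, and a hypothetical on_edge callback stands in for "run the generators and the sequential update of that domain". Following the description above, the loop keeps going while stop returns true and ends once it returns false.

```ocaml
(* A clock is (name, period, phase): edges at phase, phase + period, ... *)
type clock = string * int * int

(* Visit clock edges in time order. Clocks with an edge at the same
   time are processed in list order. *)
let run ~(clocks : clock list) ~on_edge ~stop =
  if clocks = [] then invalid_arg "run: no clocks";
  (* next.(i) is the time of the next edge of clock i *)
  let next = Array.of_list (List.map (fun (_, _, phase) -> phase) clocks) in
  let rec loop () =
    if stop () then begin
      (* jump straight to the earliest pending edge *)
      let now = Array.fold_left min max_int next in
      List.iteri
        (fun i (name, period, _) ->
          if next.(i) = now then begin
            on_edge name now;
            next.(i) <- now + period
          end)
        clocks;
      loop ()
    end
  in
  loop ()

(* Record the edges seen; end the run once at least six were recorded. *)
let edges = ref []

let () =
  run
    ~clocks:[ ("clock_a", 10, 0); ("clock_b", 15, 0) ]
    ~on_edge:(fun name now -> edges := (name, now) :: !edges)
    ~stop:(fun () -> List.length !edges < 6)

let () =
  List.iter (fun (name, t) -> Printf.printf "%4d %s\n" t name) (List.rev !edges)
```

At times 0 and 30 both clocks have an edge, and clock_a is handled first only because it comes first in the list.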

Ok. This doesn't seem particularly complex. That said I am more interested in dynamic testbenches which react to their output.

As in asynchronous feedback loops? That sounds challenging to deterministically achieve without delta cycles ;)

Nothing quite so crazy! I just mean I often have a testbench which reacts to the outputs ie I want to run a core 10 times with different parameters. So I toggle the start signal and wait until done goes high. Or I want to use a FIFO so I have to monitor the full/empty signals in the testbench to control what I do.
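For example, a start/done testbench in the current imperative style might look roughly like this. The core here is a made-up toy (done goes high n cycles after start is pulsed with parameter n), not a HardCaml design, and cycle stands in for the simulator's cycle function.

```ocaml
(* Toy core: after [start] is pulsed with parameter [n], [done_] goes
   high [n] cycles later (immediately when n = 0). *)
let start = ref false and param = ref 0
let done_ = ref false and remaining = ref 0

let cycle () =
  if !start then begin
    remaining := !param;
    done_ := (!param = 0)
  end
  else if !remaining > 0 then begin
    decr remaining;
    if !remaining = 0 then done_ := true
  end

(* Pulse start, then poll done. Returns the number of cycles waited. *)
let run_once n =
  param := n;
  start := true;
  cycle ();
  start := false;
  let cycles = ref 0 in
  while not !done_ do
    cycle ();
    incr cycles
  done;
  !cycles

(* Run the core three times with different parameters. *)
let results = List.map run_once [ 3; 1; 5 ]
let () = List.iter (Printf.printf "took %d cycles\n") results
```

The while loop is the part that gets awkward once a second activity, such as draining a FIFO, has to happen during the same cycles.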

Picking out the right values at the right time is actually what the next output from the cyclesim stuff is all about.

No, I don't think so. At least tools like Verilator and Chisel's C++ simulator backend can do it without delta cycles.

It depends on the targeted accuracy level. If we introduce support for multiple clocks with phase shifts (e.g. to simulate an RX/TX MAC with unrelated clocks of the same frequency) delta cycles would be useful to simulate clock domain transfers.

I just don't think this will be a problem. Delta cycles don't really model clock domain crossing anyway - doing it accurately would need a full register model with setup and hold times and a metastability model, and I really don't want to go there.

I guess the one place where there might be an issue is when different clock domains have a clock edge at the exact same time. Exactly what happens here will depend on which clock gets processed first (presumably as specified by the list given to the run function). This isn't so different to VHDL to be fair, even with delta cycles.

andrewray commented 7 years ago

Here's a little experiment in using Delimcc to implement generators, and how it can be used with the current simulator.

https://gist.github.com/andrewray/44f5f62d172df8cde5b056e23e91e961

(inspiration taken from here)

Two possible extensions are passing the clock to yield on (for multiple clock domains) and I am wondering about allowing the generators to return a list of new generators which start at the next cycle. In this example, it would seem natural to spawn the add/mul drivers from the enable driver.

andrewray commented 7 years ago

A further extension

https://gist.github.com/andrewray/d989a0c7ac9b239e9090b4bdb9e0080d

This allows new tasks to be spawned. Also, the generators are composable, as shown by the delay and seq functions.
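The gist itself is not reproduced here. As a rough idea of what composable generators with spawning can look like, here is a sketch using plain closures instead of Delimcc. The gen/step types, act, say and the scheduler are assumptions for illustration; delay and seq are named after the functions mentioned above, but their signatures are guesses.

```ocaml
type gen = unit -> step
and step =
  | Done
  | Yield of gen       (* wait for the next cycle, then continue *)
  | Spawn of gen * gen (* start a new generator next cycle, then continue *)

(* Wait [n] cycles. *)
let rec delay n : gen = fun () -> if n <= 0 then Done else Yield (delay (n - 1))

(* Run [g1] to completion, then [g2]. *)
let rec seq (g1 : gen) (g2 : gen) : gen = fun () ->
  match g1 () with
  | Done -> g2 ()
  | Yield k -> Yield (seq k g2)
  | Spawn (child, k) -> Spawn (child, seq k g2)

(* Perform a side effect without consuming a cycle. *)
let act f : gen = fun () -> f (); Done

(* Advance every generator to its next yield, call [cycle], repeat. *)
let run ~cycle gens =
  let rec advance g acc =
    match g () with
    | Done -> acc
    | Yield k -> k :: acc
    | Spawn (child, k) -> advance k (child :: acc)
  in
  let rec loop n gens =
    match List.rev (List.fold_left (fun acc g -> advance g acc) [] gens) with
    | [] -> n
    | pending ->
      cycle ();
      loop (n + 1) pending
  in
  loop 0 gens

let now = ref 0
let log = ref []
let say msg = act (fun () -> log := (!now, msg) :: !log)

let child = seq (say "child start") (seq (delay 1) (say "child end"))

let parent =
  seq (delay 2)
    (seq (fun () -> Spawn (child, fun () -> Done)) (say "parent end"))

let cycles = run ~cycle:(fun () -> incr now) [ parent ]
let () = List.iter (fun (t, m) -> Printf.printf "%d: %s\n" t m) (List.rev !log)
```

Because seq threads Spawn through unchanged, a generator built from combinators can still start new tasks from anywhere inside it.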

xguerin commented 7 years ago

Thanks for that. From the look of it the second extension with composable generators definitely looks like a step in the right direction. I'll need to spend some time on the literature to get a full grasp of the underlying theory :)

xguerin commented 7 years ago

On a side note, S.reset would probably eventually be obsolete: when resets are explicitly exported, S.reset does not drive them.

andrewray commented 7 years ago

I think I can see how this scheme could be extended to multiple clock domains by associating a generator (and any new generators it yields) with a specific clock, then running the appropriate generators when the clock edges occur. That should be a fairly small extension of the current code.

Being fully compatible with the current simulator is a nice property, and it makes sense as a way to extend the current testbench style with some useful features.

There is another scheme which might be better still and that looks much more like vhdl/verilog. At the moment yield is really a way of synchronising all the generators together before running a Sim.cycle. A more flexible scheme would allow yielding on an event - something more like

@(event);
a=1;
b=2;
@(event);
...

The events available in the cycle accurate simulator are a bit different to an event driven one, so the concept may or may not make sense.

It is a good bit more complex to implement, and perhaps both schemes have their advantages.

xguerin commented 7 years ago

I read the paper on delimcc. It's pretty cool stuff.

That should be a fairly small extension of the current code.

Yes. I like that approach very much. It would make it trivially clear which generators belong to which clock.

The events available in the cycle accurate simulator are a bit different to an event driven one, so the concept may or may not make sense.

I believe in the case of the CA simulator events could be defined as a boolean equation of input signal values. Inputs could be accessed by generators by simply passing the input interface to the gen operator.
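A tiny sketch of that idea, with an event written as a predicate over the input interface. The inputs record, the Wait constructor and resume are hypothetical names, and the list at the end plays the role of whatever drives the inputs each cycle.

```ocaml
(* Hypothetical input interface of the design under test. *)
type inputs = { mutable start : bool; mutable data : int }

(* A generator step: finish, or sleep until [event] holds on the inputs. *)
type step = Done | Wait of (inputs -> bool) * (unit -> step)

(* Called once per cycle: resume the generator if its event is true. *)
let resume i = function
  | Wait (event, k) when event i -> k ()
  | s -> s

let i = { start = false; data = 0 }
let seen = ref []

(* Sleep until start is high and data exceeds 3, then record data. *)
let watcher =
  Wait
    ( (fun inp -> inp.start && inp.data > 3),
      fun () ->
        seen := i.data :: !seen;
        Done )

let () =
  let state = ref watcher in
  List.iter
    (fun (s, d) ->
      i.start <- s;
      i.data <- d;
      state := resume i !state)
    [ (false, 5); (true, 2); (true, 4); (true, 9) ]

let () = List.iter (Printf.printf "woke with data=%d\n") !seen
```

The watcher fires on the third input set, the first one where both conditions hold, and stays finished afterwards.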