westerndigitalcorporation / pyvcd

Python package for writing Value Change Dump (VCD) files.
http://pyvcd.readthedocs.org/
MIT License
106 stars 39 forks source link

Feature request: decoupling identifiers from names #15

Closed whitequark closed 4 years ago

whitequark commented 4 years ago

I'm using pyvcd in nMigen to great effect--thanks!

One unusual thing nMigen does is create a lot of signals with exactly the same value; this is an artifact of how it treats hierarchy. In VCD it is possible to have multiple $vars with the same ident.

Would it possible to provide this functionality in pyvcd?

jpgrayson commented 4 years ago

Hi @whitequark. Glad pyvcd is useful to you! And nMidgen looks like a very cool and very ambitious project.

PyVCD used to support user-provided identifiers, but I removed that capability recently in the 0.2.0 release (d46db597a72bc5f321ca43f84b31e4c7c306f334).

I reviewed IEEE-1364-2005 to determine whether the spec allows multiple variables to map to the same identifier. Sure enough, this is explicitly allowed (see section 18.2.3.7).

So I believe that what you want is valid and that it would be good if PyVCD supported it.

How to get this feature?

The reason I removed the optional ident argument to VCDWriter.register_var() was to eliminate the possibility of unintentional identifier conflicts. I still see that as a worthy goal/feature. However, that closed the door to intentionally shared identifiers.

A possible way to solve this might be to add a register_alias() method to complement the existing register_var() method.

class VCDWriter:
    def register_alias(scope: ScopeInput, name: str, ref_var: Variable) -> 'Variable':
        ...

The idea being that an "alias" variable could be created with the same ident as ref_var. This approach would still allow PyVCD to manage/generate identifiers while also allowing explicit identifier sharing. And this approach would also ensure that aliased variables have the same var_type, size, and init value as the variable they reference, which are important invariants to maintain. change() could be called on either the original reference variable or any of its aliases.

Would this work for your use case?

If so, the implementation seems straightforward. I would get to it eventually, or I'd be happy for a PR.

whitequark commented 4 years ago

Would this work for your use case?

It would work, but I would prefer to also regain the ability to provide my own identifiers. The reason is that VCD writing is currently consuming more than 50% of CPU time during nMigen simulations, and being able to convert unique signal indexes (that I already have as a part of the simulator) directly into VCD identifiers when writing would help with that.

If something like that is out of scope I would probably be able to implement my own VCD writer instead. That said, alias support would still be welcome, since it'd let me provide an easy improvement to my users, even though it would not provide all of the potential benefit.

jpgrayson commented 4 years ago

being able to convert unique signal indexes (that I already have as a part of the simulator) directly into VCD identifiers when writing would help with [VCD writing overhead]

I'm not following this argument. It's not clear to me how being able to map nMigen's signal indexes into VCD identifiers would help with writing. Assuming you still use VCDWriter.change(), the amount of time it takes to write a value change to the VCD file has little to do with the particular identifier associated with the variable whose value changes. The main thing that can be done to reduce the VCD writing overhead is to minimize the number of value changes (i.e. the number of change() calls).

I took a quick look at how nMidgen is using PyVCD. Two things stand out:

  1. It looks like care is being taken to organize aliased signals and associated VCD vars such that change() is only called on one of set of aliased vars. I think I see how the proposed VCDWriter.register_alias() could fit in here: the first variable associated with a signal would be registered with register_var() and any subsequent variables would be registered with register_alias() such that all variables associated with that signal share the same VCD identifier.

  2. The _VCDWaveformWriter.update() method calls change() for every alias of a signal. The big win for PyVCD supporting alias vars would be that nMidgen could instead just call change() for just one of the set of aliased vars, and since they would share the same identifier, that one change() call would be sufficient to capture the change for all the aliases in the VCD file. Depending on how many aliases are typical for a signal, this could greatly reduce both the CPU overhead of VCD writing and the resulting VCD file size.

Please help me if I'm misunderstanding your goals or if my assessment of the nMidgen code is incorrect.

whitequark commented 4 years ago

I took a quick look at how nMidgen is using PyVCD.

Thanks for taking the time to do it! Your analysis is correct. The idea behind the argument you've (rightly) found unconvincing was that a SignalDict lookup can be expensive, but now that I think about it, the cost of pyvcd accepting indexes directly, and the cost of indexing into an array of pyvcd variables inside nMigen is exactly the same, so there is nothing here that needs an API change in pyvcd.

jpgrayson commented 4 years ago

Great, it sounds like we are on the same page. When I get some time, I will look into implementing VCDWriter.register_alias().

jpgrayson commented 4 years ago

@whitequark I have added VCDWriter.register_alias(). It would be much appreciated if you could take a look and perhaps try using it in nMidgen. If it's suitable, I will cut a PyVCD release with the feature.

whitequark commented 4 years ago

Seems to work well with the following patch:

```diff diff --git a/nmigen/sim/pysim.py b/nmigen/sim/pysim.py index 9dde345..166e2a1 100644 --- a/nmigen/sim/pysim.py +++ b/nmigen/sim/pysim.py @@ -111,23 +111,25 @@ class _VCDWaveformWriter(_WaveformWriter): var_name_suffix = var_name else: var_name_suffix = "{}${}".format(var_name, suffix) - vcd_var = self.vcd_writer.register_var( - scope=var_scope, name=var_name_suffix, - var_type=var_type, size=var_size, init=var_init) + if signal not in self.vcd_vars: + vcd_var = self.vcd_writer.register_var( + scope=var_scope, name=var_name_suffix, + var_type=var_type, size=var_size, init=var_init) + self.vcd_vars[signal] = vcd_var + else: + self.vcd_writer.register_alias( + scope=var_scope, name=var_name_suffix, + var=self.vcd_vars[signal]) break except KeyError: suffix = (suffix or 0) + 1 - if signal not in self.vcd_vars: - self.vcd_vars[signal] = set() - self.vcd_vars[signal].add(vcd_var) - if signal not in self.gtkw_names: self.gtkw_names[signal] = (*var_scope, var_name_suffix) def update(self, timestamp, signal, value): - vcd_vars = self.vcd_vars.get(signal) - if vcd_vars is None: + vcd_var = self.vcd_vars.get(signal) + if vcd_var is None: return vcd_timestamp = self.timestamp_to_vcd(timestamp) @@ -135,8 +137,7 @@ class _VCDWaveformWriter(_WaveformWriter): var_value = self.decode_to_vcd(signal, value) else: var_value = value - for vcd_var in vcd_vars: - self.vcd_writer.change(vcd_var, vcd_timestamp, var_value) + self.vcd_writer.change(vcd_var, vcd_timestamp, var_value) def close(self, timestamp): if self.vcd_writer is not None: ```

On a benchmark I'm using (Minerva SoC) this cut the VCD file size in half and slightly improved simulation speed.

jpgrayson commented 4 years ago

The above patch looks good/as-expected. Thanks for taking the time to integrate.

I will release PyVCD 0.2.2 with this change shortly.

Closing.

whitequark commented 4 years ago

Great, thank you for the amazingly fast response!