Support multiline signal declaration (signal_016)

jeremiah-c-leary / vhdl-style-guide

Style guide enforcement for VHDL

GNU General Public License v3.0

189 stars 38 forks source link

Support multiline signal declaration (signal_016) #539

Closed staerz closed 2 years ago

staerz commented 3 years ago

Is your feature request related to a problem? Please describe. The rule signal_016 requires a signal declaration to be on 1 single line. In case of simple signals (std_logic or alike) this is very reasonable, but for signals that are structures (records), this is not always meaningful and finally conflicts with length_001.

The problem is hence self-consistency.

Describe the solution you'd like The solution would be to extend signal_016 to support multi-line signal declaration, following paranthesis-parsing (like for functions, signal assignments or alike.

Describe alternatives you've considered There is no alternative other than disabling the rule.

Additional context Here is an example of a signal declaration of a more complex type:

  signal avst_packet_o  : t_avst_packet(
    data(G_AVST_DATA3_W - 1 downto 0),
    empty(AVST_EMPTY_W - 1 downto 0),
    error(0 downto 0)
  );

with the underlying type declaration (which poses no problems):

  type t_avst_packet is record
    data  : std_logic_vector;
    valid : std_logic;
    sop   : std_logic;
    eop   : std_logic;
    empty : std_logic_vector;
    error : std_logic_vector;
  end record;

Note that actual data length is intentionally not defined in the type declaration (VHDL-2008) to give flexibility.

jeremiah-c-leary commented 3 years ago

Hey @staerz,

Would the rules for concurrent_009 and concurrent_011 apply here?

Such that signal_016 becomes equivalent to concurrent_011 and then add a new rule to handle the alignment?

staerz commented 3 years ago

Hi @jeremiah-c-leary,

maybe it is not 100% correct to say that these rule would apply there, but I think it is very similar as the concept is the same.

I mean this extended rule for signal declarations is for the case that we have a more complex data type, so it also goes a little along type-012 etc.

Most importantly, this and #549 aught to have the same meaning: the one for signals, the other for variables. And in both cases, I think declaration and assignment should be treated, and at best, very similarly (and most consistently).

jeremiah-c-leary commented 3 years ago

It really seems like signal_016 should be depricated.

I see your point about a type_012 equivalent.

Let me come up with some test cases we can review before I start changing the rule.

staerz commented 3 years ago

Hi @jeremiah-c-leary,

you'd find some examples in the fpga_if.vhd provided here. You'd have to change the variable assignment into a signal assignment.

If you want, I can try to identify other examples in our code base.

jeremiah-c-leary commented 2 years ago

Hey @staerz and @sibeov,

I pushed an update for this issue to the issue-539 branch. I still have some refactoring to do, but I was hoping you would take some time and check out the updates and see if they meet your needs.

Thanks,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

alright, I tested it and it looks good, although I found sth. which I think we haven't considered yet: Upon declaration of a signal, if it is a vector type, one may have a range indication which I find more appropriate to have on the line of the signal declaration itself:

  signal felix_packets : t_avst_packets(minimum(3, G_TILES) - 1 downto 0)
  (
    data(G_FELIX_TXD_W - 1 downto 0),
    empty(G_FELIX_TXE_W - 1 downto 0),
    error(G_FELIX_TXX_W - 1 downto 0)
  );

Instead, what VSG currently (with this patch) want me to do is this:

  signal felix_packets : t_avst_packets
  (
    minimum(3, G_TILES) - 1 downto 0
  )
  (
    data(G_FELIX_TXD_W - 1 downto 0),
    empty(G_FELIX_TXE_W - 1 downto 0),
    error(G_FELIX_TXX_W - 1 downto 0)
  );

Note that I use this configuration:

    "signal_016":{
      "first_paren_new_line":"yes",
      "last_paren_new_line":"yes",
      "open_paren_new_line":"yes",
      "close_paren_new_line":"yes",
      "new_line_after_comma":"ignore_positional",
      "assign_on_single_line":"ignore"
    },

Note that this is perfectly fine for signals where the size of the outer structure is already defined, i.e. has no outer dimension:

  signal xgbe_packet_lolli : t_avst_packet
  (
    data(G_XGBE_TXD_W - 1 downto 0),
    empty(G_XGBE_TXE_W - 1 downto 0),
    error(G_XGBE_TXX_W - 1 downto 0)
  );

Now note that here we just have the cases of: No dimension (VSG handles properly) and 1 dimension (VSG doesn't yet handle it fully), but in general one might have any number of dimensions of that structure. I think now the question is how we would want to handle that. The general syntax for the dimensions would be additional pairs of parenthesis ("(...)"), one for each dimension, before coming to the inner structure of the record (for which VSG is now properly prepared).

I could image 3 possible configurations:

Have the indication of (all) the dimension(s) in a single line, i.e. the line of the "signal" declaration, or
following the current rule of 1 dimension per line, or
(all) the dimension(s) in a new line, i.e. the line after the "signal" declaration, but before the indication of the internal structure.

Maybe that should call in for yet another configuration parameter, what do you think?

Note that this comment actually also applies to constants (see config of constant_016), so I think if we add it here, we add it there as well (and by design would also apply to variables).

In general I think we're a very good step ahead, but thanks to my code base (or due to it, you never know :wink:), we still find room for improvement.

sibeov commented 2 years ago

@jeremiah-c-leary I will take a look as soon as time allows for it! Thanks for the update and notification. This answer is based on not fully reading @staerz s answer. 😅

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I could image 3 possible configurations:

Have the indication of (all) the dimension(s) in a single line, i.e. the line of the "signal" declaration, or

following the current rule of 1 dimension per line, or

(all) the dimension(s) in a new line, i.e. the line after the "signal" declaration, but before the indication of the internal structure.

Maybe that should call in for yet another configuration parameter, what do you think?

What if we had the following two configuration options: first_dimension_new_line and dimension_new_line. The following examples are ignoring alignment as that is handled by another rule. However, I believe the indenting would just follow the existing rules.

If first_dimension_new_line = no and dimension_new_line = no and first_paren_new_line = no the following would be enforced:

  signal felix_packets : t_avst_packets(first_dimension)(second_dimension)(inner_structure);

If first_dimension_new_line = yes and dimension_new_line = no and first_paren_new_line = no the following would be enforced:

  signal felix_packets : t_avst_packets
(first_dimension)(second_dimension)(inner_structure);

If first_dimension_new_line = no and dimension_new_line = yes and first_paren_new_line = no the following would be enforced:

  signal felix_packets : t_avst_packets(first_dimension)
(second_dimension)(inner_structure);

If first_dimension_new_line = no and dimension_new_line = no and first_paren_new_line = yes the following would be enforced:

  signal felix_packets : t_avst_packets(first_dimension)(second_dimension)
(inner_structure);

If first_dimension_new_line = yes and dimension_new_line = no and first_paren_new_line = yes the following would be enforced:

  signal felix_packets : t_avst_packets
(first_dimension)(second_dimension)
(inner_structure);

If first_dimension_new_line = yes and dimension_new_line = yes and first_paren_new_line = yes the following would be enforced:

  signal felix_packets : t_avst_packets
(first_dimension)
(second_dimension)
(inner_structure);

I believe we can address all three of your scenarios with those two configuration options.

Note that this comment actually also applies to constants (see config of constant_016), so I think if we add it here, we add it there as well (and by design would also apply to variables).

I agree, once we get this working it should apply to constant and variable declarations also.

In general I think we're a very good step ahead, but thanks to my code base (or due to it, you never know 😉), we still find room for improvement.

Yes, your code base has given me the opportunity to improve my code base. :grin:

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

alright, that looks good to me, but please allow me to lay down my understanding of your proposal:

first_paren_new_line affects the first parenthesis of the inner structure, i.e. the first opening parenthesis belonging to the last closing one and indicates if that (sort of the) inner structure should or should not be brought to the new line
first_dimension_new_line indicates if the first dimension should or should not be brought to the new line
dimension_new_line indicates if any following dimension(s) should or should not be brought to the new line

That sounds like a good start although I would propose calling the last option other_dimensions_new_line.

Further, I'm wondering how these parameters would affect the parenthesis of the dimensions:

      "first_paren_new_line":"yes",
      "last_paren_new_line":"yes",
      "open_paren_new_line":"yes",
      "close_paren_new_line":"yes",

I understand that by first_dimension_new_line and dimension_new_line(or other_dimensions_new_line) we would force the next opening parenthesis (which belongs to the next element) to the new line, but would we also want a configuration to allow this

  signal felix_packets : t_avst_packets
(first_dimension)
(second_dimension)
(inner_structure);

and that:

  signal felix_packets : t_avst_packets
(
  first_dimension
)
(
  second_dimension
)
(inner_structure);

i.e. to separate the actual indication of the dimension from the parenthesis as it is possible with the first, last, open, close-paren_new_line settings. Not that I'd be in favour of that or I'd ever use it, but just to be complete.

And another thing, just for consistency, we should make sure that things are self-consistent, i.e. in the following code, note that dimensions actually appear on 2 different levels:

  signal felix_packets : t_avst_packets
  (
    minimum(3, G_TILES) - 1 downto 0
  )
  (
    data(G_FELIX_TXD_W - 1 downto 0),
    empty(G_FELIX_TXE_W - 1 downto 0),
    error(G_FELIX_TXX_W - 1 downto 0)
  );

(as reported VSG would want it currently) should (under the premises that we'd want the parenthesis of the dimensions stand alone on single lines) actually be formatted as

  signal felix_packets : t_avst_packets
  (
    minimum(3, G_TILES) - 1 downto 0
  )
  (
    data
    (
      G_FELIX_TXD_W - 1 downto 0
    ),
    empty
    (
      G_FELIX_TXE_W - 1 downto 0
    ),
    error
    (
      G_FELIX_TXX_W - 1 downto 0
    )
  );

as there's a dimension indication within the internal structure!

So I think VSG should be strong enough to apply the rules from the outside to the inside.

Maybe in that respect / from that angle we should review the first, last, open, close-paren_new_line settings - I'll have to study the doc again, but I'd say what counts for opening counts for closing, so it seems a bit redundant here, but I may be off a bit now (by all this dimension thinking...).

So looks like a hard nut to crack, but I'm convinced we'll get there.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

alright, that looks good to me, but please allow me to lay down my understanding of your proposal:

first_paren_new_line affects the first parenthesis of the inner structure, i.e. the first opening parenthesis belonging to the last closing one and indicates if that (sort of the) inner structure should or should not be brought to the new line

first_dimension_new_line indicates if the first dimension should or should not be brought to the new line

dimension_new_line indicates if any following dimension(s) should or should not be brought to the new line

Your understanding is matching my proposal.

That sounds like a good start although I would propose calling the last option other_dimensions_new_line.

I originally had successive_dimensions_new_line, but I like your suggestion.

Further, I'm wondering how these parameters would affect the parenthesis of the dimensions:

I was thinking those parameters would only affect the inner structure. If we wanted to be consistent with the inner structure then I would propose the following parameters:

  "first_dimension_new_line":"no",
  "other_dimension_new_line":"no",
  "dimension_open_paren_new_line":"no",
  "dimension_close_paren_new_line":"no"

And another thing, just for consistency, we should make sure that things are self-consistent, i.e. in the following code, note that dimensions actually appear on 2 different levels:

Are data, empty and error dimensions also?

Just to clarify something. In this example...

  signal felix_packets : t_avst_packets(minimum(3, G_TILES) - 1 downto 0)
  (
    data(G_FELIX_TXD_W - 1 downto 0),
    empty(G_FELIX_TXE_W - 1 downto 0),
    error(G_FELIX_TXX_W - 1 downto 0)
  );

is this the underlying type for t_avst_packets?

  type t_avst_packet is record
    data  : std_logic_vector;
    valid : std_logic;
    sop   : std_logic;
    eop   : std_logic;
    empty : std_logic_vector;
    error : std_logic_vector;
  end record;

I am having trouble understanding how the dimensioning works with t_avst_packets being a record.

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

I was thinking those parameters would only affect the inner structure.

OK, that's good for me, I just wanted to clarify if I fully understood your proposal.

Why would we need dimension_open_paren_new_line and dimension_close_paren_new_line? Would these set if after the opening parenthesis and before the closing parenthesis there's a newline?

I would say that this is too much, but since that setting exists for the inner structure I guess you're right, it should exist as well. One argument against it could be that it should possibly taking the setting from the inner structure, but on the other hand some people (maybe even me) would like to configure dimensions and structure differently (why ever) ... I think I'd have to have a look how it'd look like. (The question is: Would one setting for all be enough and fully consistent or would that look bad?)

Yeah, to complete the example, we have:

  type t_avst_packet is record
    data  : std_logic_vector;
    valid : std_logic;
    sop   : std_logic;
    eop   : std_logic;
    empty : std_logic_vector;
    error : std_logic_vector;
  end record;

  type t_avst_packets is array (natural range <>) of t_avst_packet;

as type definitions (and similar ones) and then as usable code

  signal felix_packets : t_avst_packets(minimum(3, G_TILES) - 1 downto 0)
  (
    data(G_FELIX_TXD_W - 1 downto 0),
    empty(G_FELIX_TXE_W - 1 downto 0),
    error(G_FELIX_TXX_W - 1 downto 0)
  );

such that:

minimum(3, G_TILES) - 1 downto 0 is the dimension of the record array
G_FELIX_TXD_W - 1 downto 0 is the dimension of data
G_FELIX_TXE_W - 1 downto 0 is the dimension of empty
G_FELIX_TXX_W - 1 downto 0 is the dimension of error

Note that this is only possible since VHDL-2008. In previous versions the type t_avst_packet would require to have the dimensions for all its elements defined at type declaration.

So if you consider the range of a std_logic_vector a dimension, then yes, that is a dimension. But note that this is only an example and the record could contain any further substructure (a record could contain another record) and with VHDL-2008, dimensions when ever occurring for any of these structured elements would have to be given upon signal declaration (so they could appear at any level) [if they are not yet defined in the type directly].

I hope that gives you the full picture. Now all we have to do is to put this together :rofl:

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I am thinking it might be helpful to rename the elements to something like this:

signal felix_packets : t_avst_packets(first_outer_dimension)
  (
    (first_inner_dimension),
    (second_inner_dimension),
    (third_inner_dimension)
  );

Where the *_outer_dimension refers to range portion of t_avst_packets...

 type t_avst_packets is array (natural range <>) of t_avst_packet;

...and *_inner_dimension refers to the elements of the t_avst_packet record itself.

  type t_avst_packet is record
    data  : std_logic_vector;
    valid : std_logic;
    sop   : std_logic;
    eop   : std_logic;
    empty : std_logic_vector;
    error : std_logic_vector;
  end record;

So all of the existing parameters:

      "first_paren_new_line": "no",
      "last_paren_new_line": "no",
      "open_paren_new_line": "no",
      "close_paren_new_line": "no",
      "new_line_after_comma": "no",

could be renamed as:

      "inner_dimension_first_paren_new_line": "no",
      "inner_dimension_last_paren_new_line": "no",
      "inner_dimension_open_paren_new_line": "no",
      "inner_dimension_close_paren_new_line": "no",
      "inner_dimension_new_line_after_comma": "no",

...and the parameters for the outer dimensions would be:

 "first_outer_dimension_new_line":"no",
  "other_outer_dimension_new_line":"no",
  "outer_dimension_open_paren_new_line":"no",
  "outer_dimension_close_paren_new_line":"no"

That would allow you to adjust the formatting based on the type of dimension.

Why would we need dimension_open_paren_new_line and dimension_close_paren_new_line? Would these set if after the opening parenthesis and before the closing parenthesis there's a newline?

I am not sure why someone would want to as it looks excessively noisy to me. However, if we wanted to be consistent then we would implement those rules.

The outer_dimension_open_paren_new_line would insert a carriage return after the open parenthesis of a dimension: From this:

signal felix_packets : t_avst_packets(first_outer_dimension)
  (
    (first_inner_dimension),
    (second_inner_dimension),
    (third_inner_dimension)
  );

...to this...

signal felix_packets : t_avst_packets(
  first_outer_dimension)
  (
    (first_inner_dimension),
    (second_inner_dimension),
    (third_inner_dimension)
  );

The outer_dimension_open_paren_new_line would insert a carriage return before the close parenthesis of an outer dimension: From this:

signal felix_packets : t_avst_packets(first_outer_dimension)
  (
    (first_inner_dimension),
    (second_inner_dimension),
    (third_inner_dimension)
  );

...to this...

signal felix_packets : t_avst_packets(first_outer_dimension
  )
  (
    (first_inner_dimension),
    (second_inner_dimension),
    (third_inner_dimension)
  );

But note that this is only an example and the record could contain any further substructure (a record could contain another record) and with VHDL-2008, dimensions when ever occurring for any of these structured elements would have to be given upon signal declaration (so they could appear at any level) [if they are not yet defined in the type directly].

I had not thought of nested records with respect to this rule. Would you happen to have any examples?

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

let's take a step back: Given that structures can be nested in VHDL (like other languages) to any depth in any arbitrary way that is most suitable for the needs of the targeted development, let's not fall into terms of "inner" and "outer" as that gives the impression that only 2 levels would exist.

Instead, what we clearly see in terms of structure is record types that define a container with certain elements of any type and types (which include records) that can have a dimension, i.e. arrays (of records), std_logic_vectors, unsigned, and such.

If I'm not overlooking anything, these are the only 2 categories. So let's find a solution for that such that it fits what ever structural combination that you might find all those elements come together. I mean the VHDL reference is clear about that, so I think should VSG be able to follow that.

If you agree to that above then I think we should put back things on the table (i.e. existing configurations for structures) and see how we can come up with some consistent scheme (and consistent naming for all these configuration parameters) that will cover all these possibilities.

And no, out of my code base I don't have an example of a nested record, I'd have to go steel an example from a collaborator's project, but I guess you can easily imagine one.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I apologize, I was distracted with fixing some errors.

Instead, what we clearly see in terms of structure is record types that define a container with certain elements of any type and types (which include records) that can have a dimension, i.e. arrays (of records), std_logic_vectors, unsigned, and such.

I believe this example matches:

  type t_a is record
     a : std_logic_vector;
     b : std_logic;
  end record;

  type t_a_array is array (natural range <>) of t_a;

  type t_b is record
     x : std_logic;
     y : std_logic_vector;
     z : t_a_array;
  end record

  type t_b_array is array (natural range <>) of t_b;

such that when we create the signal and constrain the std_logic_vectors we could have something like this:

  signal my_sig : t_b_array(3 dowto 0)
  (
     y(7 downto 0),
     z(2 downto 0)
        (
            a(31 downto 0)
         )
  );

Where my_sig is an array of records and each record in the array includes an array of records.

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

no need to apologise, we do things as we can.

Yes, that is a very simple example which summarises our problem to solve. And yes, I would in general set the parenthesis like you.

One aspect we shouldn't forget about is nested arrays for which also two (or more) range indications could follow, and also true multi dimensional arrays, please see this post.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I pushed an intermediate commit to the issue-539 branch. I created a new option array_first_paren_new_line which will insert/remove returns between the array identifier and the open parenthesis in the declaration. It looks like it is working for nested arrays also.

The behavior of first_paren_new_line is changed slightly. It now applies to the first open parenthesis after every array entry.

My test examples use rule_016_test_input.nested_arrays.vhd as the source file. My tests that fix the file based on parameters are using the files of the variation rule_016_test_input.nested_arrays.fixed_array_first_paren_line_line_[yes|no]_first_paren_new_line_[yes|no].vhd and rule_016_test_input.nested_arrays.fixed_array_first_paren_new_line_[true|false].vhd.

I still need to work through the variations with the existing options, however I would appreciate some feedback on the source against the fixed files. Some of the question I have are: 1) Am I heading in the right direction 2) What changes to the source file would you make to improve the robustness of the solution. 3) Are there any missing options

Thanks,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

thanks for coming back on this.

I had a look and would first of all propose to change the name of the configuration parameters.

Given the structure that we have identified, I find it highly confusing to refer to sth. "first_paren" when this first parenthesis might be the one of a range indication or the one of a record construct. I would say that both would/could have different implications of the intended formatting.

One thing during my tests that I realised: Using Sublime as editor, it actually switches syntax highlighting dependent on where the parenthesis is:

  signal felix_packets : t_avst_packets(minimum(3, G_TILES) - 1 downto 0)
  (
    data(G_FELIX_TXD_W - 1 downto 0),
    empty(G_FELIX_TXE_W - 1 downto 0),
    error(G_FELIX_TXX_W - 1 downto 0)
  );

makes the t_avst_packets highlighted while as soon as I separate the following parenthesis to a new line, it becomes regular text.

Now that might just be a flaw of the VHDL highlighting plugin that I use, but it might also give us some good indication of what other people's practice is :wink:

So maybe we could bring in the terms "range", "structure" or "elements" into the configuration names. Here "range" would correspond to what we called "dimension" earlier. It could just be shorter and maybe it would be more self-explanatory since an unconstrained type definition (which then obliges the indication upon signal declaration in the first place) would have natural range <> in these parenthesis.

While in the type definition we have the keyword "record" to enclose the elements, maybe then the term "structure" or "elements" would be a more obvious correlation. One could, of course, go as far as to name the configuration sth. "recordopen..." but I'm not sure if we don't shoot ourselves in the legs with that term: Would there be other structures out in VHDL that are not records but still require a pair of parenthesis and would fall within the scope here? (If the answer is a very solid "no, impossible", then I wouldn't be opposed to just call it "record_something")

Now looking into how you organise the test files, I would suggest to shorten the naming as well: Instead of writing out the full name of the config each time, I think it would be more advisable to just have the actual setting in the name (so "yes" or "no"), but not the name of the setting itself. This does though require you to be rigorous on the order but I think that's easily achieved. So one file could just be called "rule_016_test_input.nested_arrays.fixed_array_yes_yes[_yes...].vhd (with as many "_yes" as configuration parameters there are. Might be 4 or 5 if I don't overlook anything). That would also allow you to really cover everything as your tests would basically be a power of 2 and easily loopable (over all parameters yes/no setting).

I think once we both are clear on the terminology (which is related to the overall structure and goal here), I think we can then go ahead. The next thing could be to just re-discuss which are settings that are needed. It might be that currently there is more than needed, but let's first agree on the structure/terminology and I'm convinced that this would help us to see that (just renaming can sometimes be very eye-opening).

And then we discuss about the implementation :wink:

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I mapped the signal_declaration out based on the LRM to see what is involved: According to the LRM, we are applying constraints to the type. So I think it would make sense to include constraint in the option paramenters. Interestingly any range constraint is actually an array_constraint.

For example in the following code: everything in red boxes would be an array_constriant. Everything in blue boxes are the open and close parenthesis in the record_constraint. Everything in green boxes are commas in the record_constraint. Everything in yellow boxes are record_element_simple_names:

So we could have a set of parameters like this:

rule:
  signal_016:
    array_constraint_single_line: 'yes'
    array_constraint_open_paren_new_line: 'no'
    record_constraint_open_paren_new_line: 'yes'
    record_constraint_close_paren_new_line: 'yes'
    record_constraint_comma_new_line: 'no'
    record_constraint_element_new_line: 'yes'

I believe those options would enforce the following format in the code above:

What do you think?

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

thanks for digging into the LRM, that is really useful!

Note that this example considers a bare record as the outermost structure, but one might also have an array of records. That wouldn't change much w. r. t. the definition of array_constraint and record_constraint, but just to mention in order not to forget it.

So if I stick to your chosen example, we talk about

signal sig8 : record_type_3(1 downto 0)(3 downto 0)
(
  ...
);

So there is potential array_constraints there.

With that regard, I fully subscibe to your analysis of identified elements and possible points of configuration.

Now let's go 1 by 1:

1) array_constraint_single_line: If set to 'yes', would you ensure that all pairs of parentesis that indicate all (potential) dimensions of the array are in 1 single line? I.e.

signal sig8 : record_type_3(1 downto 0)(3 downto 0)

Given that there's a second configuration for array constraints, I wonder what the 4 possible combinations (for this I'd say generally representing example) would look like.

I understand that we talk about the formating of the "red" part only at that moment.

2) array_constraint_open_paren_new_line: I could imagine you have in mind that an option 'yes' could lead to formating like this I.e.

signal sig8 : record_type_3
  (1 downto 0)
  (3 downto 0)

but I wonder in how far that wouldn't contradict array_constraint_single_line: 'yes' as now it's not all in 1 line. Having the 4 combinations laid out would help.

(at that point I'm tempted to say that one configuration option should be enough (which makes either my example 1) or the example 2) the rule. I can barely imagine that anybody would want to strip off the parenthesis from the range and spend extra lines on that. But let me know what you have in mind, I might miss something.)

3) record_constraint_open_paren_new_line: 'yes' would put the opening parenthesis into the new line, while 'no' would put it into the previous line together with the element name? In case of a range indication of the element, it would (in case of 'no') be behind the last closing parenthesis of the range indication, correct? [so this option sets or doesn't set the line break in front of the opening parenthesis?]

Fully OK with that!

4) record_constraint_close_paren_new_line: 'yes', similar to 3), except we control the line break in front of the closing parenthesis.

That's alright.

5) record_constraint_comma_new_line: 'no'. Not sure to fully understand this. I would think it decides if after a comma a line break is inserted, but your example has 'no' but line breaks.

6) record_constraint_element_new_line: 'yes' would insert a new line after the opening parenthesis of a record structure. (elements that follow other elements in a record are already covered by config 5) as they would be separated by comma)

The following picture tries to summarise my understanding of the record constraint settings: signal_016

where I assume all record constrain parameters on, i.e. inserting a new line:

record_constraint_open_paren_new_line: green diamond
record_constraint_close_paren_new_line: red penta-star
record_constraint_comma_new_line: purple hexa-star
record_constraint_element_new_line: orange triangle

I had hoped we could reduce the number of configurations for the record constraints, but by exercising this example I convinced myself of the fact that this is the exact correct amount.

Now for completeness I'd like to better understand your two options on array constraints.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I like how you showed all the options on a single diagram. Let me update it with all the options:

option	symbol	options	actions
record_constraint_open_paren_new_line	green diamond	yes, no, ignore	insert, remove, allow carriage return before open paren
record_constraint_close_paren_new_line	red penta-star	yes, no, ignore	insert, remove, allow carriage return before close paren
record_constraint_comma_new_line	purple hexa-star	yes, no, ignore	insert, remove, allow carriage return before comma
record_constraint_element_new_line	orange triange	yes, no, ignore	insert, remove, allow carriage return before element
array_constraint_single_line	red box	yes, ignore	combine array_constraint into single line, allow array_constraint to span lines
array_constraint_new_line	blue hexgon	yes, no, ignore	insert, remove, allow carriage return before open paren

record_constraint_comma_new_line: 'no'. Not sure to fully understand this. I would think it decides if after a comma a line break is inserted, but your example has 'no' but line breaks.

All of the *_new_line options will insert, remove or allow carriage returns before their respective item. So for the record_constraint_comma_new_line option, the comma can be moved to a new line or moved to the previous line.

record_constraint_element_new_line: 'yes' would insert a new line after the opening parenthesis of a record structure. (elements that follow other elements in a record are already covered by config 5) as they would be separated by comma)

As with the _comma_ option, the record_constraint_element_new_line would handle carriage returns before elements

The array_constraint_single_line option would enforce array_constraints are on a single line. For example, if array_constraint_single_line is yes then

element2(4
 downto
 0
)(
7
 downto
 0
)

would be fixed to:

element2(4 downto 0)(7 downto 0)

If both array_constraint_single_line and array_constraint_new_line (maybe it should be array_contraint_open_paren?) are set to yes then

element2(4
 downto
 0
)(
7
 downto
 0
)

would be fixed to:

element2
(4 downto 0)
(7 downto 0)

If array_constraint_single_line is ignore and array_constraint_new_line is yes then

element2(4
 downto
 0
)(
7
 downto
 0
)

would be fixed to:

element2
(4
 downto
 0
)
(
7
 downto
 0
)

I had hoped we could reduce the number of configurations for the record constraints, but by exercising this example I convinced myself of the fact that this is the exact correct amount.

I agree. I do not believe you should have less than those 6 rules. Annotating the code snippet with those symbols really makes it obvious. We will have to keep that in mind for future issues with multiple options.

If we agree on the option names and their functions, I will first update/create a configuration page and have you review it before I start implementing. I want to say it will not be that hard, but I know better.

Regards,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

almost all good for the record constraint configuration options.

Only for record_constraint_comma_new_line I have the question if you really foresee to make this control the new line to be inserted before the comma. I remind you that the current configuration is called new_line_after_comma, so after the comma.

I cannot imagine any consistent linting where the comma would be put to a new line as a stand-alone character. Just like punctuation in semantics, I would always make that the last character of the sentence (with no space before it).

Now as I type that reply I think maybe there could be a check that there really is no space (in the spots of the purple hexa-star of your picture), but that's it.

Also I noticed that in your picture you consistently place the markers before the items to which the rule corresponds. I had tried to always put them where the actual line break character is (within the precision of a very quick and dirty graphics attempt). Both seem valid.

In my picture though, given that I interpret the rule on the comma as after, I think it makes the application of the following element rule redundant. I write that as I try to avoid to have redundant rules which could possibly result in self-contradicting configuration options.

(So "the record_constraint_element_new_line would handle carriage returns before elements" except after comma (as record_constraint_comma_new_line already treats that))

I would encourage you to consider that comma configuration...

For the array constraints: Do you honestly see any merit in spreading the range indication over multiple lines? I would always argue that (from your example), the "4 downto 0" and "7 downto 0" would always be together. What is a matter of style is where and how the parenthesis go.

Here are the options that I could imagine useful:

element(
  4 downto 0
)(
  7 downto 0
)

element(
  4 downto 0
)
(
  7 downto 0
)

element(4 downto 0)(7 downto 0)

element
  (4 downto 0)
  (7 downto 0)

The latter actually introduces an inconsistency on the parenthesis, but who knows, somebody might still like that style.

So I would actually prefer just one configuration which itself than has possible values that each refer to one of the styles above, plus one "ignore".

We though have to be careful: the actual start and end of the range might come from functions that take arguments and hence introduce another set of parenthesis. I would say depending on which style (of the above) one uses, it might become quickly inconsistent with setting parenthesis for functions. In order to prevent that, I think VSG could give the recommendation to not use the functions in the signal declaration on ranges, but introduce local constants instead and then use the constant in the range indication.

With those 4 "meaningful" options, I would just call the configuration array_constraint_format and give some indicative names for the possible values. These could be:

separated_parenthesis
extra_separated_parenthesis
all_in_one_line
one_line_per_dimension
ignore

I think that would simplify things a lot. And if anybody comes around with yet another idea of typesetting, then we could just add another supported format and that's it.

"We will have to keep that in mind for future issues with multiple options.": It might actually be a good idea to bring a furbished version of that into the documentation of that configuration options then. Just like it helps us to identify and structure the problem, so will it help user to get a catch of what they actually configure here.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

Only for record_constraint_comma_new_line I have the question if you really foresee to make this control the new line to be inserted before the comma. I remind you that the current configuration is called new_line_after_comma, so after the comma.

I cannot imagine any consistent linting where the comma would be put to a new line as a stand-alone character. Just like punctuation in semantics, I would always make that the last character of the sentence (with no space before it).

I cannot imagine either, but I wanted to be able to fix a situation like this:

  element1(7 downto 0)
  , element2(7 downto 0)

which I would assume was not intentional. I could fix it silently, but if I have to put the logic in to move the comma then I might as well add the ability to configure it. It would also advertise to users the issue will be detected and fixed.

I remind you that the current configuration is called new_line_after_comma, so after the comma.

I think after working through the LRM, having the example code, and not referencing the options I had previously I ended up treating all the new lines as before instead of after. I also dropped the first_ and last_ options.

In order to prevent that, I think VSG could give the recommendation to not use the functions in the signal declaration on ranges, but introduce local constants instead and then use the constant in the range indication.

I had been thinking of adding a rule to check for ranges that are not range types to help the code express itself better.

   element1(7 downto 0),
   element2(15 downto 0)

versus

  element1(byte),
  element2(word)

You could also abstract away range calculations in a constant as you pointed out.

With those 4 "meaningful" options, I would just call the configuration array_constraint_format and give some indicative names for the possible values.

I like your proposal. I would want to keep the number of options small to start out and let users put in an issue if they want a different formatting option.

I do wonder about the one_line_per_dimension option. I was thinking the array_constraint_format would work against a single array constraint. However, if you have multiple array constraints in a row it would require a separate option. Probably not ideal. I will have to think about it.

"We will have to keep that in mind for future issues with multiple options.": It might actually be a good idea to bring a furbished version of that into the documentation of that configuration options then. Just like it helps us to identify and structure the problem, so will it help user to get a catch of what they actually configure here.

I agree. It would also be useful to update other configuration documentation.

Regards,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

On record_constraint_comma_new_line:

if I have to put the logic in to move the comma then I might as well add the ability to configure it.

OK, so we both identify the need to in some way define where to put that comma: As I already pointed out, I'd actually like a rule that puts it directly after the range indicating closing parenthesis, on the same line. That is very similar to your attempt of trying to catch it if was moved to the next line.

Could we hence re-name and re-define that rule to define the location of that comma simply? It would then have more than just 2 options:

ignore: doesn't care for it and doesn't touch anything
space: leaves one space between it and the closing parenthesis, so requiring ") ,"
no_space: leaves no space between it, also no line break, so directly requiring "),"
new_line: for the crazy people who actually want to put it on a new line (indentation would then still need to be defined but that should just be at the location where the first character of the element name starts I suggest)

And if we renamed it to record_constraint_comma?

This should have that story covered then. And I agree that this would then be complementary to the record_constraint_element_new_line configuration.

Note that maybe dropping the _new_line from the other configuration names and moving it to the value would be a good idea, too. That would make in particularly the value no turn into sth. like same_line, yes into new_line and be hence more descriptive and self-explanatory to a user in general, what do you think?

I like your proposal. I would want to keep the number of options small to start out and let users put in an issue if they want a different formatting option.

:100: what I'm saying.

I do wonder about the one_line_per_dimension option. I was thinking the array_constraint_format would work against a single array constraint.

I'd say no. It's not working against it, it should be tolerant to any number of dimensions. The 4 options I laid out would work also with just 1 dimension, or 3 or more (or any other number):

For 1 dimension:

"separated_parenthesis"
```
element(
4 downto 0
)
```
"extra-separated_parenthesis"
```
element(
4 downto 0
)
```
doesn't change anything as that would just have an obvious effect for sequences of ")(" which we don't see here for the lack of extra dimensions
"all_in_one_line"
```
element(4 downto 0)
```
"one_line_per_dimension"
```
element
(4 downto 0)
```

For 3 dimensions:

"separated_parenthesis"

element(
4 downto 0
)(
7 downto 0
)(
9 downto 0
)

"extra-separated_parenthesis"
```
element(
4 downto 0
)
(
7 downto 0
)
(
9 downto 0
)
```
doesn't change anything as that would just kick in for sequences of ")("

"all_in_one_line"

element(4 downto 0)(7 downto 0)(9 downto 0)

"one_line_per_dimension"

element
(4 downto 0)
(7 downto 0)
(9 downto 0)

So it's really just the structure that we'd define here, and the dimensions just fill in ...

This would be to throw 1 configuration at it.

If we wanted to do similar to how we do for record constraints, then we'd have to introduce 4 configurations: each before and after of opening and closing parenthesis. So that would require 16 tests if you want to walk yourself through that pain :wink: I'd argue that the above 4 is already quite reasonable.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

Could we hence re-name and re-define that rule to define the location of that comma simply? It would then have more than just 2 options:

ignore: doesn't care for it and doesn't touch anything

space: leaves one space between it and the closing parenthesis, so requiring ") ,"

no_space: leaves no space between it, also no line break, so directly requiring "),"

new_line: for the crazy people who actually want to put it on a new line (indentation would then still need to be defined but > that should just be at the location where the first character of the element name starts I suggest)

And if we renamed it to record_constraint_comma?

I like this approach. I would implement the ignore and no_space options. The space option is handle by another rule which removes spaces before a comma. The last one I would implement if someone asks for it. No use encouraging that style.

Note that maybe dropping the _new_line from the other configuration names and moving it to the value would be a good idea, too. That would make in particularly the value no turn into sth. like same_line, yes into new_line and be hence more descriptive and self-explanatory to a user in general, what do you think?

I this idea also. It gives a lot more flexibility in defining behaviors. So the options would be:

option	symbol	options	actions
record_constraint_open_paren	green diamond	new_line, previous_line, ignore	insert, remove, allow carriage return before open paren
record_constraint_close_paren	red penta-star	new_line, previous_line, ignore	insert, remove, allow carriage return before close paren
record_constraint_comma	purple hexa-star	previous_line, ignore	remove, allow carriage return before comma
record_constraint_element	orange triange	new_line, previous_line, ignore	insert, remove, allow carriage return before element
array_constraint	red box	all_in_one_line, one_line_per_dimension, ignore	combine array_constraint into single line, put each dimension on it's own line, allow array_constraint to span lines

Not so sure on the previous_line wording. new_line makes sense though. Any suggestions on a better phrase?

So that would require 16 tests if you want to walk yourself through that pain 😉 I'd argue that the above 4 is already quite reasonable.

I can see the advantages with testing.

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

that looks like we're making good progress here, I like how things start coming together now!

Very agreed for record_constraint_comma!

No use encouraging that style.

:100:

For the wording of "previous_line", would the option name "attached" be better? Like "attached to the previous element [in the previous line]"

And for the action, I think "allow carriage return before open paren" should simply say "don't care". I.e. that would also include that if there'd be a space (or multiple spaces), VSG would simply ignore it (or other rules as you say might kick in for a few details here...). Let's say that this is at least what I'd expect to happen as a user if I have an "ignore" option.

For the array_constraint though I see you dropped the options "separated_parenthesis" and "extra-separated_parenthesis". Not sure if I would like to use one of these actually, possibly not as "one_line_per_dimension" looks good to me. So would that be sth. that you'd say "could be implemented upon user demand"? I'd be totally fine with that.

Also here for the wording of the action, I think "allow array_constraint to span lines" should just read "don't care" with the implication that VSG really doesn't care - and simply leave it to the record constraint setting to format from the character after the closing parenthesis onward. (I'm happy we made this picture, it's super useful! :+1: )

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I'm happy we made this picture, it's super useful!

Completely agree. Makes we wonder why it took so long to come up with. I'm sure it would have made some of our earlier conversations easier. It provides so much clarity.

For the array_constraint though I see you dropped the options "separated_parenthesis" and "extra-separated_parenthesis". Not sure if I would like to use one of these actually, possibly not as "one_line_per_dimension" looks good to me. So would that be sth. that you'd say "could be implemented upon user demand"? I'd be totally fine with that.

Yeah, I figured I would keep the variations to a minimum for now. Nothing worse than writing and testing code that nobody uses.

As for the previous_line question, what if I just make the options more descriptive.

option	symbol	options
record_constraint_open_paren	green diamond	add_carriage_return_before, remove_carriage_return_before, ignore_carriage_return_before
record_constraint_close_paren	red penta-star	add_carriage_return_before, remove_carraige_return_before, ignore_carriage_return_before
record_constraint_comma	purple hexa-star	remove_carriage_return_before, ignore_carriage_return_before
record_constraint_element	orange triange	add_carriage_return_before, remove_carriage_return _before, ignore_carriage_return_before
array_constraint	grey box	all_in_one_line, one_line_per_dimension, ignore_formatting

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

I would not overdo it, i.e. not have the options be sentence-like values. That just increases the chance for adding a typo and having users wonder why it wouldn't work as they'd expect.

We won't get around adding good documentation, i.e. a description for each option. To avoid repeating what the option name said, I'd rather keep it short, so I liked what we had before, just with "previous_line" being called "attached" instead.

You keep the notion of just focusing on the new line though in your ignore option which I think is a mistake. I would say, just like one can disable rules, one should be able to disable certain configuration options. Disabling rules is done by setting "disable" to "true" and disabling configuration options should just be done by "ignore". (Same value name for all configurations no matter which rule!)

So that would also apply to the array_constraint option: Make it ignore and not ignore_formatting (everything VSG is looking out for is formatting, so that's in it by definition).

jeremiah-c-leary commented 2 years ago

Hey @staerz,

You keep the notion of just focusing on the new line though in your ignore option which I think is a mistake. I would say, just like one can disable rules, one should be able to disable certain configuration options. Disabling rules is done by setting "disable" to "true" and disabling configuration options should just be done by "ignore". (Same value name for all configurations no matter which rule!)

I agree. Seeing an undocumented rule I have been "following" stated clearly really helps. Thanks.

I would not overdo it, i.e. not have the options be sentence-like values. That just increases the chance for adding a typo and having users wonder why it wouldn't work as they'd expect.

We won't get around adding good documentation, i.e. a description for each option. To avoid repeating what the option name said, I'd rather keep it short, so I liked what we had before, just with "previous_line" being called "attached" instead.

That makes sense. I cannot fully convey what the option does in the option name itself. I will need documentation with examples to really describe what the options and their possible values accomplish.

option	symbol	values	actions
record_constraint_open_paren	green diamond	add_new_line, remove_new_line, ignore	insert, remove, allow carriage return before open paren
record_constraint_close_paren	red penta-star	add_new_line, remove_new_line, ignore	insert, remove, allow carriage return before close paren
record_constraint_comma	purple hexa-star	remove_new_line, ignore	remove, allow carriage return before comma
record_constraint_element	orange triange	add_new_line, remove_new_line, ignore	insert, remove, allow carriage return before element
array_constraint	grey box	all_in_one_line, one_line_per_dimension, ignore	combine array_constraint into single line, put each dimension on it's own line, allow array_constraint to span lines

Why do I feel the need to have new_line somewhere in the values? I suppose the value could be something more complicated than just adding or removing carriage returns. For example the all_in_one_line vs one_line_per_dimension.

What do you think of the above table?

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

I'm generally happy with the table, in particular I'm happy with columns option, symbol and values.

For actions I would though recommend to separate the explanations for the "add_new_line/remove_new_line" setting vs. the "ignore": I think my previous comment on it wasn't too clear:

I hope that our understanding is the following:

The setting "add_new_line" enforces a carriage return (alias "new line") [and, consequently by indentation rules kicking in, also (indirectly) enforces the indentation of the new line]
The setting "remove_new_line" enforces the removal of any potential space and carriage return
The setting "ignore" disables the option and hence no formatting check is done at all: spaces and new lines can be anything

It is mainly on the last point: Also spaces are ignored.

Given that all 4 record constraints use the same 3 values (comma is the exception), it would possibly be not such a bad idea to split the option column into one column that identifies the "defining structural element" (which would be: "opening parenthesis", "closing parenthesis", "comma", "new element") and the actual "description".

It could look like this:

option	symbol	values	structural element	description
record_constraint_open_paren	green diamond	add_new_line, remove_new_line, ignore	opening parenthesis	The setting "add_new_line" enforces a carriage return (alias "new line") [and, consequently by indentation rules kicking in, also (indirectly) enforces the indentation of the new line] The setting "remove_new_line" enforces the removal of any potential space and carriage return The setting "ignore" disables the option and hence no formatting check is done at all: spaces and new lines can be anything
record_constraint_close_paren	red penta-star	add_new_line, remove_new_line, ignore	closing parenthesis
record_constraint_comma	purple hexa-star	remove_new_line, ignore	comma
record_constraint_element	orange triangle	add_new_line, remove_new_line, ignore	new element
array_constraint	grey box	all_in_one_line, one_line_per_dimension, ignore	array range indication	combine array_constraint into single line, put each dimension on it's own line, allow array_constraint to span lines

(and here's the HTML for that as markdown doesn't support the row span:


<table>
    <thead>
        <tr>
            <th>option</th>
            <th>symbol</th>
            <th>values</th>
            <th>structural element</th>
            <th>description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
          <td>record_constraint_open_paren</td>
          <td>green diamond</td>
          <td>add_new_line, remove_new_line, ignore</td>
          <td>opening parenthesis</td>
          <td rowspan=4>
<ul>
<li>The setting "add_new_line" enforces a carriage return (alias "new line") [and, consequently by indentation rules kicking in, also (indirectly) enforces the indentation of the new line]</li>
<li>The setting "remove_new_line" enforces the removal of any potential space and carriage return</li>
<li>The setting "ignore" disables the option and hence no formatting check is done at all: spaces and new lines can be anything</li>
</ul>
          </td>
        </tr>
        <tr>
          <td>record_constraint_close_paren</td>
          <td>red penta-star</td>
          <td>add_new_line, remove_new_line, ignore</td>
          <td>closing parenthesis</td>
        </tr>
        <tr>
          <td>record_constraint_comma</td>
          <td>purple hexa-star</td>
          <td>remove_new_line, ignore</td>
          <td>comma</td>
        </tr>
        <tr>
          <td>record_constraint_element</td>
          <td>orange triangle</td>
          <td>add_new_line, remove_new_line, ignore</td>
          <td>new element</td>
        </tr>
        <tr>
          <td>array_constraint</td>
          <td>grey box</td>
          <td>all_in_one_line, one_line_per_dimension, ignore</td>
          <td>array range indication</td>
          <td>combine array_constraint into single line, put each dimension on it's own line, allow array_constraint to span lines</td>
        </tr>
    </tbody>
</table>

)

So that would be what I'd put.

Now from the row symbol I get that the documentation would also contain the picture. Here I'd only recommend to remove the white space between the array element and the range indication. It looks a bit odd with so much space. (A little space just to nicely put the grey box in without squeezing is fine.)

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I hope that our understanding is the following:

The setting "add_new_line" enforces a carriage return (alias "new line") [and, consequently by indentation rules kicking in, also (indirectly) enforces the indentation of the new line]

The setting "remove_new_line" enforces the removal of any potential space and carriage return

The setting "ignore" disables the option and hence no formatting check is done at all: spaces and new lines can be anything

It is mainly on the last point: Also spaces are ignored.

That matches with my understanding.

Given that all 4 record constraints use the same 3 values (comma is the exception), it would possibly be not such a bad idea to split the option column into one column that identifies the "defining structural element" (which would be: "opening parenthesis", "closing parenthesis", "comma", "new element") and the actual "description".

We can add the structural element to the table. I do like the way it looks with the spanned rows for the description. It works because the options for each row do the same thing, just on different structural elements. I will need to see how to span rows in restructured text.

Now from the row symbol I get that the documentation would also contain the picture. Here I'd only recommend to remove the white space between the array element and the range indication. It looks a bit odd with so much space. (A little space just to nicely put the grey box in without squeezing is fine.)

I agree. I will recreate the picture without the spaces and move the grey boxes accordingly.

Well, it seems like we have an agreement on the options and what they do. Here is my plan going forward:

1) Document the options for the rule in VSGs sphinx documentation 2) Review the documentation with you to get concurrence and update as necessary 3) Begin implementing the options.

Regards,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

in case the table with spanning rows poses a problem in structured text we might also not use a table and rather use an approach with sections with a reference to a common description.

But yes, fully agreed, good documentation is sth. that is largely underestimated, so let's see that me make a good job here as well.

Alright, so let me know when ever you have sth. that you'd want my feedback on!

I'm really happy to see us get here so far and I hope and think that addressing this issue would be a major improvement to VSG!

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I toke a first attempt at adding documentation for this at: https://github.com/jeremiah-c-leary/vhdl-style-guide/blob/issue-539/docs/configuring_multiline_constraint_rules.rst

I think it needs some work, but the basic structure is there.

Let me know what you think and if there is anything I should change.

Thanks,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

looks good, I found a few minor details:

"closing_parenthesis" in Structural Element shouldn't have the underscore
in Values, add the commas between the names of the different values; alternatively we could make that an unnumbered list
"The following code illustrates", I'd rather phrase it "The following figure illustrates" (it's more than just code)
the indentation in the actual code examples for the elements is not yet correct: We should possibly number the elements of the "outer" dimension and give letters to the inner ones such that the outer ones go from 1 to 7 and the inner from A to F (we should possibly do that in the figure as well) [ I just looked at the first one for now: the closing parenthesis should be at the same indent a the element itself, shouldn't it? ]
since you add the symbols as pictures in the table (which I think is very good), maybe make them true transparent pngs such when the table highlighting kicks in, we won't see a white box around the symbol but actually the table background colour (of that row)

I'd check the other examples once the above has made it's way into the doc if that's fine with you (taking a step at a time...)

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I believe I addressed your comments above.

Thanks,

--Jeremy

staerz commented 2 years ago

HI @jeremiah-c-leary,

thanks. I think the iteration using letters in the 2nd dimension is not bad at all :smile:

Yes, now I agree to the indentation of the 1st example.

Looking into the 2nd example I noticed that even the figure that we use has an inconsistency in the indentation!

The opening parenthesis for sig8 itself is on char 1 (so with the first char of signal)
The opening parenthesis for element2 is indented and not aligned with the first char of element2

Now I think that would be subject of indentation rules, but I'd say that only 1 level of indentation per level of structure should be done, hence pushing the opening parenthesis back to the left.

The third example doesn't look right yet to me, shouldn't elementA and elementB now be on the same line?

So far for now!

jeremiah-c-leary commented 2 years ago

Hey @staerz,

Looking into the 2nd example I noticed that even the figure that we use has an inconsistency in the indentation!

Would you agree the indentation of this figure:

The third example doesn't look right yet to me, shouldn't elementA and elementB now be on the same line?

Yeah, there were a lot of issues with the format. I have updated it.

Let me know what you think of the updates.

Thanks,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

yes, I agree to the figure (I see you left intentional space before the comma in order to inject the symbol).

I also agree to the formatting of the 3rd example "record_constraint_element set to 'remove_new_line'".

Note that still the indentation of the 2nd example needs to shift back the 2nd level indentation (for the opening of the parenthesis and element-letter).

And one remark for all: Maybe it would look better if you put the keyword (option name and value names) into monospace font (in headlines and in descriptive text) of the examples ... and in the table as well for consistency.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

I updated the figure and the indenting in the 2nd example.

I also monospaced the option name and values. I am not so sure on the results though.

Let me know what you think.

Thanks,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

thanks!

Yeah, I can understand your hesitation towards having the option names and values in monospace but from my experience it's a matter of getting the habit. It's nothing that I'd count indispensable, so if you really don't like it, revert it. [note that for consistency, the entire documentation should then possibly be reworked in that aspect and I'm not sure if that's worth doing ... esp. if you don't like it! ... so consider it an idea, but not all my ideas are unbeatable]

The 2nd example (record_constraint_close_paren) looks good now.

Re-looking at the 3rd example (record_constraint_element): You seem to have changed the indent from 2 spaces to 4 spaces. I would avoid that and also use 2 spaces here for the sake of leaving the confusion of the user to a minimum. But parenthesis and line breaks look good to me.

Coming to the 4th example (array_constraint set to one_line_per_dimension):

I'd say here we have to agree on what the indent should be: I always argue for 1 level per level, but in addition we have the array constraint that also needs some visible indent. I would suggest to simply de-indent (back to the level of the element, the 2nd level opening parenthesis, so from this

signal sig8 : record_type_3
(
  element1
    (7 downto 0),
  element2
    (4 downto 0)
    (7 downto 0)
    (
      elementA

to this

signal sig8 : record_type_3
(
  element1
    (7 downto 0),
  element2
    (4 downto 0)
    (7 downto 0)
  (
    elementA

(the difference is in the last 2 lines only)

So signal is at "0-indent", element-number is at "1-indent", element-letter is at "2-indent".

Clearly the opening parenthesis for the array constraint would be 1 indent more than the opening parenthesis for the record but that assures that the array constraint is clearly visible and the difference to the structural parenthesis is clear as well.

Let me know what you think.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

Yeah, I can understand your hesitation towards having the option names and values in monospace but from my experience it's a matter of getting the habit. It's nothing that I'd count indispensable, so if you really don't like it, revert it. [note that for consistency, the entire documentation should then possibly be reworked in that aspect and I'm not sure if that's worth doing ... esp. if you don't like it! ... so consider it an idea, but not all my ideas are unbeatable]

I think it is starting to grow on me. It seemed odd to use monospace in the titles of the examples. Looking at it again, I can see how using monospace makes those items pop out in the text.

Re-looking at the 3rd example (record_constraint_element): You seem to have changed the indent from 2 spaces to 4 spaces. I would avoid that and also use 2 spaces here for the sake of leaving the confusion of the user to a minimum. But parenthesis and line breaks look good to me.

Yes, I see that now. I adjusted it to 2 spaces.

Clearly the opening parenthesis for the array constraint would be 1 indent more than the opening parenthesis for the record but that assures that the array constraint is clearly visible and the difference to the structural parenthesis is clear as well.

I see your point on indenting the dimensions but not the record constraint. I'm sure we can make it configurable for those who would like a different indent.

I pushed an update with the indent changes to the third and fourth example.

I think we could be done with the documentation updates.

Let me know what you think.

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

I've looked over it again and confirm that now the 4th example looks good to me as well.

Yeah, as you mention, indentation of the parenthesis is another thing but I would actually not foresee a configuration for that, it would only make it inconsistent. The only useful case I'd see is if the array constraint on a new line should be indented one level more than the element it belongs to or not (and hence be aligned with the opening parenthesis of the content to follow.

Other than that we're done documenting, with the occasional inconsistency which now still exists regarding the monospace of the values in the table in the values and description column. Fix that or don't, depending how much monospace for these appeals to you (and yes, making it more stick out from the rest is the clear benefit. Now if it has light grey background or not is really up to the style sheet and not much of a concern, at least not to me).

jeremiah-c-leary commented 2 years ago

Hey @staerz,

Just wanted to give you an update.

It looks like I am able to parse and classify our example code correctly which is the precursor to creating the options we defined. I still need to convince myself that it is working though. The recursion through my classifier hurts my head sometimes.

I also need to see if I can break my code with more complex examples.

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

thanks for the heads-up! Yeah, I can imagine your headache. Unfortunately I don't have any more examples in my code that'd get more complicated than what I posted in the issue description. If it's of any help I could though come up with sth. crazy random for your testing.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

The issue I am currently facing is knowing when I have run into an array_constraint or a record_constraint.

I used this pattern to detect a record_element: [open_paren, token, open_paren]. Which works well for simple range definitions like:

(element1(7 downto 0), element2(7 downto 0))

However, I started chasing down the array_constraint and I eventually run into the range definition, which breaks out into this:

If you follow the path from range you go through term, then factor and come to primary. Here is the definition of primary:

primary ::=
    name
  | literal
  | aggregate
  | function_call
  | qualified_expression
  | type_conversion
  | allocator
  | ( expression )

name would not be an issue as the pattern would be [open_paren, name, 'downto'] literal would not be an issue as the pattern would be [open_paren, literal, 'downto'] aggregate would not be an issue as the pattern would be [open_paren, open_paren] qualified_expression would not be an issue as the pattern would be [open_paren, token, tick] allocator would not be an issue as the pattern would be [open_paren, new] ( expression ) would not be an issue as the pattern would be [open_paren, open_paren]

However, both functional_call and type_conversion would be a problem. Their patterns would be [open_paren, token, open_paren]. This happens to match my record_constraint pattern check.

One way to resolve this would be to keep track of function and type names as I process a file. Unfortunately, these can be placed into a package and VSG only processes a single file at a time.

At the moment I am stuck, but I will be thinking of some solution to this. If you have any ideas, please let me know.

Regards,

--Jeremy

jeremiah-c-leary commented 2 years ago

I suppose one option is to have a rule that is very limited to how it is applied:

For example I just ran my classifier on the following code:

signal sig9 : record_type_3(func1(A, 5, 6) downto func2(4, 6, 8));

and it was classified as:

signal         <class 'vsg.token.signal_declaration.signal_keyword'>
sig9           <class 'vsg.token.signal_declaration.identifier'>
:              <class 'vsg.token.signal_declaration.colon'>
record_type_3  <class 'vsg.token.subtype_indication.type_mark'>
(              <class 'vsg.token.record_constraint.open_parenthesis'>
func1          <class 'vsg.token.record_element_constraint.record_element_simple_name'>
(              <class 'vsg.token.index_constraint.open_parenthesis'>
A              <class 'vsg.parser.todo'>
,              <class 'vsg.token.index_constraint.comma'>
5              <class 'vsg.parser.todo'>
,              <class 'vsg.token.index_constraint.comma'>
6              <class 'vsg.parser.todo'>
)              <class 'vsg.token.index_constraint.close_parenthesis'>
downto         <class 'vsg.token.record_element_constraint.record_element_simple_name'>
func2          <class 'vsg.token.record_element_constraint.record_element_simple_name'>
(              <class 'vsg.token.index_constraint.open_parenthesis'>
4              <class 'vsg.parser.todo'>
,              <class 'vsg.token.index_constraint.comma'>
6              <class 'vsg.parser.todo'>
,              <class 'vsg.token.index_constraint.comma'>
8              <class 'vsg.parser.todo'>
)              <class 'vsg.token.index_constraint.close_parenthesis'>
)              <class 'vsg.token.record_constraint.close_parenthesis'>
;              <class 'vsg.token.signal_declaration.semicolon'>

Everything is fine until the first open parenthesis. It is classified as a record_constraint, then everything else afterwards is misclassified until the semicolon.

Hmmm....I wonder if I could use the downto or to in the range definition as part of my check.

range ::=
    range_attribute_name
  | simple_expression direction simple_expression

So if I find a direction at the same "level" as the open parenthesis then I know I am in an array_constraint.

This could work. I will think about it some more.

--Jeremy

jeremiah-c-leary commented 2 years ago

Well, things were going well. I was able to detect the direction and then classify accordingly. However, I ran down the range_attribute_name:

and one of the options for prefix is a functional_call. So now I am back to needing to know if a token is a function name.

I will need to think on this one...

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

thanks for poking thoughts around here. I see it's not trivial at all!

Just looking from a possibly more external view, I'd say the structure defining a record constraint which one would always find is the comma surrounded by parenthesis (but making sure not to confuse it with any other internal structure). It could then be multiple commas actually.

For the detection of the direction I'd say that this might be even more difficult, note that one might also use the 'range attribute. In that case one wouldn't see a to or downto to identify a range. (I think the combination of finding one of the 3 would be covering it though, I cannot think of a 4th way identifying a range.)

jeremiah-c-leary commented 2 years ago

Hey @staerz,

Just looking from a possibly more external view, I'd say the structure defining a record constraint which one would always find is the comma surrounded by parenthesis (but making sure not to confuse it with any other internal structure). It could then be multiple commas actually.

Are you suggesting using a comma to determine a record_constraint. Like this:

element1(elementA(7 downto 0), elementB(7 downto 0))

For the detection of the direction I'd say that this might be even more difficult, note that one might also use the 'range attribute. In that case one wouldn't see a to or downto to identify a range. (I think the combination of finding one of the 3 would be covering it though, I cannot think of a 4th way identifying a range.)

I could detect the 'range and then know I have an array_constraint.

I find the fact you can have a function return a range to be intriguing. Under what circumstances would you do such a thing?

The core issue is I can't tell the difference between these:

  record_element   ::=  ( record_element_simple_name ( ...
  array_constraint ::=  ( function_name ( ...
  array_constraint ::=  ( type_mark ( ...

VSG works on a single file at a time without any knowledge of other files. This has some benefits:

1) The code just needs to be syntactically correct 2) Code does not need to compile 3) Can run in an editor easier

...and some drawbacks:

1) Can not fully parse and classify code (current example) 2) Consistent case rules can not always be applied

It seems the only way for VSG to overcome this limitation is to feed it enough files for it to pull all the function names and type names. There would have to be a configuration somewhere that pulls in all the files. That would give us the ability to handle more rules. However, it would also increase the time it takes to run VSG on a single file. Could you image if you passed it all the files in your project (lets say 100) just to have it run on a single file? It might just become unusable at that point.

I'm thinking maybe I could save some key information, like packages and the types in them, in an object and then pickle it for later use. VSG would then pull in the stored object with all the references which it could then apply to the single file. VSG would also need to know if the stored information is out of date so it could re-parse it to ensure the information is up to date.

Another option would be to allow for the current parsing method, which just ignores the issue, and also implement a "real" parsing method if the function_name and type_mark exist. I suppose I would also need to keep track of record_element_simple_names in order to ensure I have a record element.

Side thought...I suppose you can't have a record_element_simple_name, function_name, and type_mark be the same value.

I was thinking of making taking VSG down this path, but now I am not so sure.

Maybe there are just inherent limitations with VSG, and maybe that is okay.

I would like it if it solved 100% of the formatting problems, but is 95% good enough? Is 90% good enough?

Do I take VSG down the path where the code has to be compilable?

It is nice to be working in my editor and just completely ignoring style. Then I hit a macro key and poof, the code is formatted, regardless of whether it will compile.

Maybe VSG could operate differently on files depending on the amount of information it has available? Maybe some rules are only executed when there is enough information?

I really do not know.

Any opinions?

Regards,

--Jeremy

staerz commented 2 years ago

Hi @jeremiah-c-leary,

Are you suggesting using a comma to determine a record_constraint. Like this:

Yes. To my understanding it is the comma that sanely identifies a record as a record (without knowing its definition). There is one exception though: If the record only had one defined element.

But then the rule to put a new line after the comma or now (which is what we are after here primarily) doesn't apply at all and I honestly think that this is not going to appear in any code (I don't see why one would define a record with just one element).

I find the fact you can have a function return a range to be intriguing.

I think you misunderstood my point. VHDL doesn't have a type for a range, a range can hence never be retrieved from a function or in any other likewise way. What VHDL allows though is the 'range attribute which any ranged element would have and that is sth. that I just wanted to point you to on your mission to identify a range: It could be either by the to, downto or 'range keyword. Note that the 'range keyword would though be following another signal's or constant's name.

Let me give you an example that I'd have in mind:

  constant take_my_range: std_logic_vector(4 downto 2);

  signal avst_packet_o  : t_avst_packet(
    data(take_my_range'range),
    empty(AVST_EMPTY_W - 1 downto 0),
    error(0 downto 0)
  );

The above is perfectly valid VHDL code (and I think I'm using it here and there (at least the concept of picking the range from a constant or another signal), so VSG should be supporting that :wink:

The core issue is I can't tell the difference between these:

I'd say forget about identifying this purely based on positional relations of parenthesis. Identifying related pairs of parenthesis and excluding it from the list of candidates so to say would rather be useful, but that's it. As said before, the record element is identified by the comma while the array constraint is identified by a true range (defined via to, downto or 'range.

It seems the only way for VSG to overcome this limitation is to feed it enough files for it to pull all the function names and type names.

Clear no, forget about that. This will not work unless the entire code base is passed to VSG and that is rarely the case. In particular that would not work in my use case where only fractions of the overall code are passed to VSG independent of each other. VSG must be able to parse a file on its own without knowledge of anything else.

Identifying structure based on parsing just one VHDL file is the way to go.

(If you went any further than that you'd basically do the first job of a synthesis tool that tries to make sense of the code and put things together (instantiations, function definitions and function calls, etc.). If you ask me, no, that's not what we should aim for.)

I suppose you can't have a record_element_simple_name, function_name, and type_mark be the same value.

I'm not sure but at least one can overload functions which would make its signature different (more or less commas, different types of parameters) so that's a dead end.

Do I take VSG down the path where the code has to be compilable?

Strong NO. Again, that would be the job of a synthesiser and VSG may not have access to all required code for that. (It will not in my case)

Then I hit a macro key and poof, the code is formatted, regardless of whether it will compile.

That's a strong yes. If it compiles or not doesn't matter, you (usually) need more than one file in order to answer that question and it's the synthesiser which does that.

Any opinions?

So I think that's a lot of opinion and I hope that it was useful in any way.

jeremiah-c-leary commented 2 years ago

Hey @staerz,

Yes. To my understanding it is the comma that sanely identifies a record as a record (without knowing its definition). There is one exception though: If the record only had one defined element.

Unfortunately, I do not believe we can use the comma as both the array_constraint and record_constraint can have a comma.

As said before, the record element is identified by the comma while the array constraint is identified by a true range (defined via to, downto or 'range.

I think you are close. We can not use the comma, but I do believe we can use downto, to and ':

So essentially, when I check for an array_constraint or an element_constraint, I search for downto, to and a ' at the same "level" as the open paren. I will then follow these rules:

1) If I find any of them before I reach the matching closing parenthesis, then I have found an array_constraint. 2) If I do not find any of them before I reach the matching closing parenthesis, then I have found an element_constraint.

So I think that's a lot of opinion and I hope that it was useful in any way.

I did not really think through the ramifications of forcing code to be compilable until now. I believe it reduces the usability for only a small gain in enforcing consistent case rules across multiple files.

Thanks for your feedback. It provided the clarity I needed.

I am going to add checking for the ' for the array_constraint and check some more test cases to see if there is anything else lurking around.

Regards,

--Jeremy