jeremiah-c-leary / vhdl-style-guide

Style guide enforcement for VHDL
GNU General Public License v3.0
189 stars 38 forks source link

Missing rules for `record` types #759

Closed harry-commin-enclustra closed 2 years ago

harry-commin-enclustra commented 2 years ago

Rules for end record

Currently, the type rules don't cover end record.

For example, with this input:

type x is record
        end      record    ;

the default fix is:

  type x is record
        end      record;

I guess the space before the semicolon is being handled by whitespace_003.

I'm not certain, but I guess 2 new rules matching these descriptions would make sense:

  1. "This rule checks the indent of the end keyword".
  2. "This rule checks for a single space between the end and record keywords".

Rules for end record optional name

There is an optional end record [ record_type_simple_name ] defined in the VHDL LRM. There doesn't appear to be a type rule covering this.

I guess this could be handled the same as other optional names. For example, the corresponding entity rules are:

  1. entity_019
  2. entity_012

I'm not sure if an additional rule is needed to check for a single space between end record and the optional type name, or if this is already handled by another rule.

Rules for line breaks

I am aware of 2 common styles. The record keyword belongs either on the same line as type, or on its own line:

  -- record keyword on same line
  type x is record
      a : bit;
  end record;

  -- record keyword on its own line
  type x is
  record
      a : bit;
  end record;

I'm not sure if these options could be handled similarly to simple cases like assert_002 or assert_004 because I'm not sure how this interacts with the indentation configuration. My guess is that there is no problem because the record elements simply have +1 indentation.

I can see that a vsg/token/record_type_definition.py file exists, but I haven't wrapped my head around how indentation configuration works yet.

jeremiah-c-leary commented 2 years ago

Hey @hcommin,

There are quite a few rules we could add for record types:

I may have missed a couple.

There are existing rules which already perform those checks. At this point it takes longer to write the tests and the documentation than it takes to implement the actual rule.

I'm not sure if these options could be handled similarly to simple cases like assert_002 or assert_004 because I'm not sure how this interacts with the indentation configuration. My guess is that there is no problem because the record elements simply have +1 indentation.

The default indent for a record_type_definition is:

390         record_type_definition:
391             record_keyword:
392                 token : current
393                 after : '+1'
394             end_keyword:
395                 token : '-1'
396                 after : '-1'

and full_type_declaration is:

205         full_type_declaration:
206             type_keyword:
207                 token : current
208                 after : current

So in your example:

  -- record keyword on same line
  type x is record
      a : bit;
  end record;

...and assuming the beginning indent is 2...

when VSG hits type it finds the full_type_declaration key and then the type_keyword key. VSG sees that token is set to current so it assigns the indent of 2 to the type keyword token. It then sees the after key is current and keeps the indent at 2. When VSG hits the record keyword, it checks the record_type_definition[record_keyword] entry and assigns the record token a value of 2 because of the current key. VSG then sets the indent to 3 because of the +1 in the after key.

This visualization might help:

 -- Indent    2   2 2    2
             type x is record
 -- Indent     3 3  3 3
               a : bit;
 -- Indent    2    2   2
             end record;

So in the second case, with the record on the next line it would actually match the indent in your example:

  3 -- Indent    2   2 2
  4             type x is
  5 -- Indent    2
  6             record
  7 -- Indent     3 3  3 3
  8               a : bit;
  9 -- Indent    2    2   2
 10             end record;

because the record keyword was given an indent of 2 and the indent of the record keyword is only checked if record it at the beginning of a line.

Let me know what you think of my list of rules and if I forgot anything.

Regards,

--Jeremy

harry-commin-enclustra commented 2 years ago

@jeremiah-c-leary Thank you for the detailed example of indentation configuration. If I understood correctly, the indent is only checked for items that are at the beginning of a line (which seems obvious, now that I write it). So, if we want an item to be on its own line, then we use a rule... and the indentation configuration just sets the indentation correctly at the start of each line. For example, if we want to convert from this:

                    type x is record a : bit; end record;

to this:

  type x
  is record
    a : bit;
  end record;

then several "abc on its own line" and/or "no code after xyz" rules would need to be defined. Without these rules, the indentation configuration would only fix the indent:

  type x is record a : bit; end record;

Is that correct?

Your list of record rules makes me realize I only thought about 2 "valid" coding styles I am familiar with, and not the large family of violations we actually want to detect and correct.

Your list seems comprehensive to me. I inserted my minor comments in sub-bullets:

I can certainly attempt to help with writing documentation. My Python skills are a work in progress, so you may not want me going near your test code.

harry-commin-enclustra commented 2 years ago

@jeremiah-c-leary In my workplace, we require a different suffix for record type identifiers (_r, instead of _t for all other types).

I don't mind just setting suffixes : ['_t', '_r'] for all types. But if it's easy to define independent rules, then there could be some value in that.

We are also required to suffix enumeration literals with _s. So, would it be possible to add prefix/suffix rules for these identifiers too? Example:

-- Violation
type state_machine is (idle, write, read, done);

-- Fix
type state_machine is (idle_s, write_s, read_s, done_s);
jeremiah-c-leary commented 2 years ago

Hey @hcommin,

Is that correct?

That is correct.

Rules are grouped into phases that would match the steps I would take in reformatting a file. Each phase transforms the code for the next phase to operate on. This means every rule in a later phase is making an assumption about what was performed in a previous phase.

To take your example to an extreme:

type
x
is
record a : bit; end
record
;

The first phase handles all the structural changes, like same line, own line and optional rules and would transform to the code into:

type x is record
a : bit;
end record x;

The second phase handles whitespace, but in this example there are no whitespace issues because I put everything on it's own line. So the code stays as:

type x is record
a : bit;
end record x;

The third phase is vertical space, or blank lines. if we assumed we had a rule to add a blank line after the record keyword and before the end keyword, the code would be transformed into:

type x is record

a : bit;

end record x;

The fourth phase is indentation. So the code would be transformed into:

  type x is record

    a : bit;

  end record x;

The fifth phase is alignment, which is anything that should be column aligned that is not an indent. In this case no rules apply. so the code stays as:

  type x is record

    a : bit;

  end record x;

The sixth phase is capitalization. If we assume we configured a rule that the identifier must be uppercase, then the code would be transformed into:

  type X is record

    a : bit;

  end record X;

The seventh phase checks for naming conventions. So if we configured the rule that all types must have a prefix of t_, then VSG would log a violation since X does not start with t_.

Your list of record rules makes me realize I only thought about 2 "valid" coding styles I am familiar with, and not the large family of violations we actually want to detect and correct.

Yeah...when you really start to look at what it takes to enforce a certain coding style, then the number of rules increases dramatically.

Should this be: record on the same line as is? (See my example above with is record on its own line). I agree. How are rule conflicts handled? Will the user see an error if this rule is enabled at the same time as the one above? I may have a rule that can be configured for either, I would need to check. Otherwise, there is currently no method to handle conflicting rules. However, every attempt is made to ensure they are not conflicting.

Maybe I could implement a method to flag conflicting rules and notify the user.

Should this be keywords (plural)? I would only indent the first record keyword if it is on it's own line.

A rule already exists for indent of type. I don't know if rules make sense for indents of identifier or is. We could implement rules for indenting identifier and is, but I would hope nobody would want to have those on their own lines. Although I do not see the purpose of having record on it's own line either. One thing I have learned from this project is their are a plethora of VHDL coding styles.

I can start implementing the rules above.

--Jeremy

jeremiah-c-leary commented 2 years ago

I don't mind just setting suffixes : ['_t', '_r'] for all types. But if it's easy to define independent rules, then there could be some value in that.

I want to think that should not be too hard to implement. I would create an issue and we can explore it.

We are also required to suffix enumeration literals with _s. So, would it be possible to add prefix/suffix rules for these identifiers too?

To get an idea for how hard it would be to implement a rule, you can use the vsg_parser script to see how VSG interprets a file. I implemented your example:

  1
  2 architecture rtl of fifo is
  3
  4   type state_machine is (idle, write, read, done);
  5
  6 begin end architecture rtl;

...and issued the following command:

$vsg_parser -f example.vhd
--------------------------------------------------------------------------------
0 |
--------------------------------------------------------------------------------
1 |
<class 'vsg.parser.blank_line'>
--------------------------------------------------------------------------------
2 | architecture rtl of fifo is
<class 'vsg.token.architecture_body.architecture_keyword'>
<class 'vsg.token.architecture_body.identifier'>
<class 'vsg.token.architecture_body.of_keyword'>
<class 'vsg.token.architecture_body.entity_name'>
<class 'vsg.token.architecture_body.is_keyword'>
--------------------------------------------------------------------------------
3 |
<class 'vsg.parser.blank_line'>
--------------------------------------------------------------------------------
4 |   type state_machine is (idle, write, read, done);
<class 'vsg.token.full_type_declaration.type_keyword'>
<class 'vsg.token.full_type_declaration.identifier'>
<class 'vsg.token.full_type_declaration.is_keyword'>
<class 'vsg.token.enumeration_type_definition.open_parenthesis'>
<class 'vsg.token.enumeration_type_definition.enumeration_literal'>
<class 'vsg.token.enumeration_type_definition.comma'>
<class 'vsg.token.enumeration_type_definition.enumeration_literal'>
<class 'vsg.token.enumeration_type_definition.comma'>
<class 'vsg.token.enumeration_type_definition.enumeration_literal'>
<class 'vsg.token.enumeration_type_definition.comma'>
<class 'vsg.token.enumeration_type_definition.enumeration_literal'>
<class 'vsg.token.enumeration_type_definition.close_parenthesis'>
<class 'vsg.token.full_type_declaration.semicolon'>
--------------------------------------------------------------------------------
5 |
<class 'vsg.parser.blank_line'>
--------------------------------------------------------------------------------
6 | begin end architecture rtl;
<class 'vsg.token.architecture_body.begin_keyword'>
<class 'vsg.token.architecture_body.end_keyword'>
<class 'vsg.token.architecture_body.end_architecture_keyword'>
<class 'vsg.token.architecture_body.architecture_simple_name'>
<class 'vsg.token.architecture_body.semicolon'>

So this shows you how VSG is classifying the files and assigning tokens from vsg.token.

For this example, idle, write, read and done are classified as vsg.token.enumeration_type_definition.enumeration_literal. So a rule can be written to apply prefixes/suffixes as current rule definitions, such as signal_600 work on a single token:

  1
  2 from vsg.rules import token_suffix
  3
  4 from vsg import token
  5
  6 lTokens = []
  7 lTokens.append(token.signal_declaration.identifier)
  8
  9
 10 class rule_600(token_suffix):
 11     '''
 12     This rule checks for valid suffixes on signal identifiers.
 13     Default signal suffix is *\_s*.
 14
 15     |configuring_prefix_and_suffix_rules_link|
 16
 17     **Violation**
 18
 19     .. code-block:: vhdl
 20
 21        signal wr_en : std_logic;
 22        signal rd_en : std_logic;
 23
 24     **Fix**
 25
 26     .. code-block:: vhdl
 27
 28        signal wr_en_s : std_logic;
 29        signal rd_en_s : std_logic;
 30     '''
 31
 32     def __init__(self):
 33         token_suffix.__init__(self, 'signal', '600', lTokens)
 34         self.suffixes = ['_s']
 35         self.solution = 'Signal identifiers'

So it is just a matter of copying that rule and changing the token type:

  1
  2 from vsg.rules import token_suffix
  3
  4 from vsg import token
  5
  6 lTokens = []
  7 lTokens.append(token.enumeration_type_definition.enumeration_literal)
  8
  9
 10 class rule_600(token_suffix):
 11     '''
 12     This rule checks for valid suffixes on signal identifiers.
 13     Default signal suffix is *\_s*.
 14
 15     |configuring_prefix_and_suffix_rules_link|
 16
 17     **Violation**
 18
 19     .. code-block:: vhdl
 20
 21        signal wr_en : std_logic;
 22        signal rd_en : std_logic;
 23
 24     **Fix**
 25
 26     .. code-block:: vhdl
 27
 28        signal wr_en_s : std_logic;
 29        signal rd_en_s : std_logic;
 30     '''
 31
 32     def __init__(self):
 33         token_suffix.__init__(self, 'signal', '600', lTokens)
 34         self.suffixes = ['_s']
 35         self.solution = 'Enumerated type'

The docstring and lines 33 and 35 would have to updated, but that kind of gives you an idea for what is involved.

So as long as you want every enumerated type to have a prefix/suffix, then we can create a rule for it. If there are special cases, for example different prefixes/suffixes if it is a record type versus not a record type, then it will get more difficult because VSG will have to know context.

If you would like prefix/suffix rules for enumerated types, go ahead and create another issue.

Regards,

--Jeremy

jeremiah-c-leary commented 2 years ago

Hey @hcommin,

I have pushed an update for this to the issue-759 branch. When you get a chance could you check it out on your end.

I will plan to merge this to trunk on April 6th if I do not hear back from you by then.

Thanks,

--Jeremy

harry-commin-enclustra commented 2 years ago

@jeremiah-c-leary

The behavior looks correct in my test cases. I need to spend a bit more time looking into the things you wrote above, as the coding guidelines in my workplace allow some differences for record types (which don't fit really naturally with the LRM). But, on the technical side, I think this issue is ready to close.

I just spotted a couple of very minor documentation issues:

  1. Copy-paste error for record_type_definition_004 in docs/record_type_definition_rules.rst (**block** should be **record**):
record_type_definition_004
##########################

|phase_1| |error| |structure|

This rule checks the **is** keyword is on the same line as the **block** keyword.
  1. Every instance of on it's own line should be replaced with on its own line. (This probably applies to several documents).
jeremiah-c-leary commented 2 years ago

Hey @hcommin,

Thanks for catching the documentation issues. I have pushed an update.

--Jeremy