jeremiah-c-leary / vhdl-style-guide

Style guide enforcement for VHDL
GNU General Public License v3.0
192 stars 40 forks source link

Add Pascal_Snake_Case to case checking rules #1202

Open patrick-studer opened 4 months ago

patrick-studer commented 4 months ago

Is your feature request related to a problem? Please describe. The provided case-checking rules are currently (before v3.25) limited to:

What I'm missing (or was missing before v3.25) is the possibility to define snaked-PascalCase. This is basically the same as PascalCase but allows optional separation/grouping of word groups with an underscore. The first letter of the word, and every letter after an underscore must be uppercase and the rest must be lowercase.

Example: _MyCore_DataChannelClockEnable

Describe the solution you'd like With the new release v3.25 you allow to introduce regex checks for the case-rules which is very handy if you have one rule which should differ from the "defaults". But since _Pascal_SnakeCase is a commonly uses formatting (and it is our default style for almost everything (signals, constants, variables, procedures, functions, ...)), it would be very nice to have it supported by VSG.

Would you mind adding the new case-keyword (like Pascal_Snake_Case) together with the matching regular expression string?

_vsg/rules/caseutils.py: Pascal_Snake_Case = re.compile("(((?:[A-Z])+(?:[a-z0-9])*)+_?)+")

jeremiah-c-leary commented 4 months ago

Evening @patrick-studer ,

I was writing the test for the regex and just wanted to ensure that you would expect FIFO and MY_FIFO to fail. The regex expects at least one underscore to pass.

--Jeremy

jeremiah-c-leary commented 4 months ago

Evening @patrick-studer ,

Disregard my previous comment. I just pushed an update to the issue-1202 branch. When you get a chance could you check it out on your end and let me know if it working for you?

Thanks,

--Jeremy

patrick-studer commented 4 months ago

Hi @jeremiah-c-leary

Disregard my previous comment. I just pushed an update to the issue-1202 branch. When you get a chance could you check it out on your end and let me know if it working for you?

I could test your implementation and it does exactly what I expected. But your previous comment made me think a bit and I came to the conclusion, that my pattern should not be called Pascal_Snake_Case (because it is not what officially is interpreted under this term)!

I was writing the test for the regex and just wanted to ensure that you would expect FIFO and MY_FIFO to fail. The regex expects at least one underscore to pass.

I see your point! We have 2 problems - one with my Pascal_Snake_Case implementation, and the other one in the existing PascalCase/camelCase regex.

Let's have a detailed look into it.

Problem in my Pascal_Snake_Case implementation

The pattern I requested was something like this "WhatA_WonderfulDay". This is PascalCase with the allowance of underscores (followed by a capital letter or a number). Without further thinking, I named this pattern Pascal_Snake_Case...

According to definitions found in the web, Pascal_Snake_Case should separate every single word with an underscore. Therefore, every capital letter must have a preceding underscore:

My given regex is not matching that pattern since my definition should allow multiple capital letters in one underscore-separated part. Therefore, I propose to additionally introduce the real Pascal_Snake_Case pattern, as well as the Relaxed_PascalCase:

If you have a better name for it, please feel free to change it!

Before I come to my proposal of a implementation, let me tackle the second problem I see.

Problem with existing PascalCase/camelCase

This is now very complicated to explain. I hope you may follow my examples...

Your regex allows any number of capital letters, as long as they are followed by at least one lowercase letter (i.e. (?:[A-Z])+(?:[a-z0-9])+) This matches also for names, which do not follow the standard and should throw an error/waring (e.g. MyAXIBusArbiter/myAXIBusArbiter). PascalCase/camelCase defines, that every word must start with a capital letter and the rest is lowercase. For abbreviation, that means the correct way of writing would be MyAxiBusArbiter/myAxiBusArbiter.

If you add a number at the end of a name, its even matching everything (so not really helpful for linting).

First I thought, we should only allow 1 capital letter in a series, but this is also not good since there are "single-character-words" which lead to 2 capital letters in a row (e.g. WhatAWonderfulDay).

So, I tried to find a regex, which could deal with that but also does find the previously ignored errors...

I come up with the following proposal.

Proposal on improving case-check

  1. Introduce two new patterns (instead of 1):
    • Pascal_Snake_Case (the real one)
    • Relaxed_PascalCase (PascalCase with underscores)
  2. Update existing PascalCase and camelCase expressions to be more robust

Here my updated code for the case-regex:

camelCase = re.compile("(?!.*[A-Z]{3})[a-z][a-zA-Z0-9]*")
PascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*")
PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-z0-9]*(?:_[A-Z0-9][a-z0-9]*)*")
RelaxedPascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*(?:_[A-Z0-9][a-zA-Z0-9]*)*")

The new thing I propose is a "negative lookahead" to mismatch if three capital letters in a row are found (i.e. (?!.*[A-Z]{3})). The rest is basically the same, just without the non-capturing grouping.

Testing

I did a quick test with a set of port-names, checked with port_010 rule.

"ok" means, this is matching the regex and does not produce error/warning. "x" means does not match and will report error/warning.

You may see that I also added numbers to the example. This is to demonstrate the improvement given with my new regex patterns. (I hope nobody will call a signal/port like that - but for demonstration purpose it was good!)

port_010:case lower upper upper_or_lower camelCase PascalCase Pascal_Snake_Case Relaxed_PascalCase
whatawonderful19day42 ok x ok ok x x x
whatAwonderful19day42 x x x ok x x x
WHATAWONDERFUL19DAY42 x ok ok x x x x
WHATaWONDERFUL19DAY42 x x x x x x x
what_a_wonderful_19_day_42 ok x ok x x x x
what_a_wonderful19_day42 ok x ok x s x x
WHAT_A_WONDERFUL_19_DAY_42 x ok ok x x x x
WHAT_A_WONDERFUL19_DAY42 x ok ok x x x x
whatAWonderful19Day42 x x x ok x x x
whatAWOnderful19Day42 x x x x x x x
WhatAWonderful19Day42 x x x x ok x ok
WhatAWOnderful19Day42 x x x x x x x
What_A_Wonderful_19_Day_42 x x x x x ok ok
What_A_WOnderful_19_Day_42 x x x x x x ok
WhatA_Wonderful19Day_42 x x x x x x ok
WhatA_WOnderful19Day_42 x x x x x x ok
WhatA_WONderful19Day_42 x x x x x x x
jeremiah-c-leary commented 1 month ago

Afternoon @patrick-studer ,

I apologize for the late response.

Regarding the proposed updates:

camelCase = re.compile("(?!.*[A-Z]{3})[a-z][a-zA-Z0-9]*")
PascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*")
PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-z0-9]*(?:_[A-Z0-9][a-z0-9]*)*")
RelaxedPascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*(?:_[A-Z0-9][a-zA-Z0-9]*)*")

Since camelCase and PascalCase are already relaxed with respect to consecutive capital letters, I would propose the following:

  1. Keep camelCase and PascalCase the same definition
  2. optionally add a strict_camelCase and strict_PascalCase which matches your redefined versions
  3. Make your RelaxedPascalCase into PascalSnakeCase
  4. Make your PascalSnakeCase into strict_PascalSnakeCase

So something like this:

camelCase = re.compile("(?:[a-z])+(?:[a-z0-9])*((?:[A-Z])+(?:[a-z0-9])+)*")
strict_cameCase = re.compile("(?!.*[A-Z]{3})[a-z][a-zA-Z0-9]*")
PascalCase = re.compile("((?:[A-Z])+(?:[a-z0-9])+)+")
strict_PascalCase = PascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*")
PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*(?:_[A-Z0-9][a-zA-Z0-9]*)*")
strict_PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-z0-9]*(?:_[A-Z0-9][a-z0-9]*)*")

Granted it seems backwards, with using strict versus relaxed, but it would have the least affect on current users.

What do you think?

Regards,

--Jeremy

jeremiah-c-leary commented 2 days ago

Morning @patrick-studer ,

Just wanted to ping you on this issue to see if we could move it forward.

Thanks,

--Jeremy

patrick-studer commented 8 hours ago

Morning @patrick-studer ,

Just wanted to ping you on this issue to see if we could move it forward.

Thanks,

--Jeremy

Hi Jeremy

Sorry for the late response. There was little time for open-source projects in the past months. I will catch up my work here and let you know my feedback.

Currently, I'm not sure if keeping the existing PascalCase and camelCase (and introducing a more "strict" version) is the right thing only for keeping backwards compatibility. Since the current implementation is not throwing errors at wrong capitalization where it should (because multiple capital letters or with the problem of numbers at the end). Updating the current PascalCase and camelCase to the more strict (right) version would show existing users more problems. If they then decide, that they do not agree with the real definition, they may "downgrade" to the "relaxed" definition (the old one).

But let me evaluate your proposal and come back to this soon.