Add Pascal_Snake_Case to case checking rules

patrick-studer commented 1 month ago

Is your feature request related to a problem? Please describe. The provided case-checking rules are currently (before v3.25) limited to:

lower (which allows underscores too => _snakecase),
upper (which allows underscores too => _SCREAMING_SNAKECASE),
upper_or_lower (which allows one or the other, no mixture)
PascalCase (which does not allow underscores...),
camelCase (which does not allow underscores...).

What I'm missing (or was missing before v3.25) is the possibility to define snaked-PascalCase. This is basically the same as PascalCase but allows optional separation/grouping of word groups with an underscore. The first letter of the word, and every letter after an underscore must be uppercase and the rest must be lowercase.

Example: _MyCore_DataChannelClockEnable

Describe the solution you'd like With the new release v3.25 you allow to introduce regex checks for the case-rules which is very handy if you have one rule which should differ from the "defaults". But since _Pascal_SnakeCase is a commonly uses formatting (and it is our default style for almost everything (signals, constants, variables, procedures, functions, ...)), it would be very nice to have it supported by VSG.

Would you mind adding the new case-keyword (like Pascal_Snake_Case) together with the matching regular expression string?

_vsg/rules/caseutils.py: Pascal_Snake_Case = re.compile("(((?:[A-Z])+(?:[a-z0-9])*)+_?)+")

jeremiah-c-leary commented 1 month ago

Evening @patrick-studer ,

I was writing the test for the regex and just wanted to ensure that you would expect FIFO and MY_FIFO to fail. The regex expects at least one underscore to pass.

--Jeremy

jeremiah-c-leary commented 1 month ago

Evening @patrick-studer ,

Disregard my previous comment. I just pushed an update to the issue-1202 branch. When you get a chance could you check it out on your end and let me know if it working for you?

Thanks,

--Jeremy

patrick-studer commented 1 month ago

Hi @jeremiah-c-leary

Disregard my previous comment. I just pushed an update to the issue-1202 branch. When you get a chance could you check it out on your end and let me know if it working for you?

I could test your implementation and it does exactly what I expected. But your previous comment made me think a bit and I came to the conclusion, that my pattern should not be called Pascal_Snake_Case (because it is not what officially is interpreted under this term)!

I was writing the test for the regex and just wanted to ensure that you would expect FIFO and MY_FIFO to fail. The regex expects at least one underscore to pass.

I see your point! We have 2 problems - one with my Pascal_Snake_Case implementation, and the other one in the existing PascalCase/camelCase regex.

Let's have a detailed look into it.

Problem in my Pascal_Snake_Case implementation

The pattern I requested was something like this "WhatA_WonderfulDay". This is PascalCase with the allowance of underscores (followed by a capital letter or a number). Without further thinking, I named this pattern Pascal_Snake_Case...

According to definitions found in the web, Pascal_Snake_Case should separate every single word with an underscore. Therefore, every capital letter must have a preceding underscore:

_What_A_WonderfulDay (ok)
_WhatAWonderfulDay (not ok => but this is what I'm looking for)

My given regex is not matching that pattern since my definition should allow multiple capital letters in one underscore-separated part. Therefore, I propose to additionally introduce the real Pascal_Snake_Case pattern, as well as the Relaxed_PascalCase:

Pascal_Snake_Case (the real one)
Relaxed_PascalCase (PascalCase with underscores)

If you have a better name for it, please feel free to change it!

Before I come to my proposal of a implementation, let me tackle the second problem I see.

Problem with existing PascalCase/camelCase

This is now very complicated to explain. I hope you may follow my examples...

Your regex allows any number of capital letters, as long as they are followed by at least one lowercase letter (i.e. (?:[A-Z])+(?:[a-z0-9])+) This matches also for names, which do not follow the standard and should throw an error/waring (e.g. MyAXIBusArbiter/myAXIBusArbiter). PascalCase/camelCase defines, that every word must start with a capital letter and the rest is lowercase. For abbreviation, that means the correct way of writing would be MyAxiBusArbiter/myAxiBusArbiter.

If you add a number at the end of a name, its even matching everything (so not really helpful for linting).

First I thought, we should only allow 1 capital letter in a series, but this is also not good since there are "single-character-words" which lead to 2 capital letters in a row (e.g. WhatAWonderfulDay).

So, I tried to find a regex, which could deal with that but also does find the previously ignored errors...

I come up with the following proposal.

Proposal on improving case-check

Introduce two new patterns (instead of 1):
- Pascal_Snake_Case (the real one)
- Relaxed_PascalCase (PascalCase with underscores)
Update existing PascalCase and camelCase expressions to be more robust

Here my updated code for the case-regex:

camelCase = re.compile("(?!.*[A-Z]{3})[a-z][a-zA-Z0-9]*")
PascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*")
PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-z0-9]*(?:_[A-Z0-9][a-z0-9]*)*")
RelaxedPascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*(?:_[A-Z0-9][a-zA-Z0-9]*)*")

The new thing I propose is a "negative lookahead" to mismatch if three capital letters in a row are found (i.e. (?!.*[A-Z]{3})). The rest is basically the same, just without the non-capturing grouping.

Testing

I did a quick test with a set of port-names, checked with port_010 rule.

"ok" means, this is matching the regex and does not produce error/warning. "x" means does not match and will report error/warning.

You may see that I also added numbers to the example. This is to demonstrate the improvement given with my new regex patterns. (I hope nobody will call a signal/port like that - but for demonstration purpose it was good!)

port_010:case	lower	upper	upper_or_lower	camelCase	PascalCase	Pascal_Snake_Case	Relaxed_PascalCase
whatawonderful19day42	ok	x	ok	ok	x	x	x
whatAwonderful19day42	x	x	x	ok	x	x	x
WHATAWONDERFUL19DAY42	x	ok	ok	x	x	x	x
WHATaWONDERFUL19DAY42	x	x	x	x	x	x	x
what_a_wonderful_19_day_42	ok	x	ok	x	x	x	x
what_a_wonderful19_day42	ok	x	ok	x	s	x	x
WHAT_A_WONDERFUL_19_DAY_42	x	ok	ok	x	x	x	x
WHAT_A_WONDERFUL19_DAY42	x	ok	ok	x	x	x	x
whatAWonderful19Day42	x	x	x	ok	x	x	x
whatAWOnderful19Day42	x	x	x	x	x	x	x
WhatAWonderful19Day42	x	x	x	x	ok	x	ok
WhatAWOnderful19Day42	x	x	x	x	x	x	x
What_A_Wonderful_19_Day_42	x	x	x	x	x	ok	ok
What_A_WOnderful_19_Day_42	x	x	x	x	x	x	ok
WhatA_Wonderful19Day_42	x	x	x	x	x	x	ok
WhatA_WOnderful19Day_42	x	x	x	x	x	x	ok
WhatA_WONderful19Day_42	x	x	x	x	x	x	x

jeremiah-c-leary / vhdl-style-guide