Open patrick-studer opened 4 months ago
Evening @patrick-studer ,
I was writing the test for the regex and just wanted to ensure that you would expect FIFO
and MY_FIFO
to fail. The regex expects at least one underscore to pass.
--Jeremy
Evening @patrick-studer ,
Disregard my previous comment. I just pushed an update to the issue-1202
branch. When you get a chance could you check it out on your end and let me know if it working for you?
Thanks,
--Jeremy
Hi @jeremiah-c-leary
Disregard my previous comment. I just pushed an update to the issue-1202 branch. When you get a chance could you check it out on your end and let me know if it working for you?
I could test your implementation and it does exactly what I expected. But your previous comment made me think a bit and I came to the conclusion, that my pattern should not be called Pascal_Snake_Case
(because it is not what officially is interpreted under this term)!
I was writing the test for the regex and just wanted to ensure that you would expect FIFO and MY_FIFO to fail. The regex expects at least one underscore to pass.
I see your point!
We have 2 problems - one with my Pascal_Snake_Case
implementation, and the other one in the existing PascalCase
/camelCase
regex.
Let's have a detailed look into it.
The pattern I requested was something like this "WhatA_WonderfulDay". This is PascalCase with the allowance of underscores (followed by a capital letter or a number). Without further thinking, I named this pattern Pascal_Snake_Case
...
According to definitions found in the web, Pascal_Snake_Case
should separate every single word with an underscore. Therefore, every capital letter must have a preceding underscore:
My given regex is not matching that pattern since my definition should allow multiple capital letters in one underscore-separated part. Therefore, I propose to additionally introduce the real Pascal_Snake_Case
pattern, as well as the Relaxed_PascalCase
:
Pascal_Snake_Case
(the real one)Relaxed_PascalCase
(PascalCase with underscores)If you have a better name for it, please feel free to change it!
Before I come to my proposal of a implementation, let me tackle the second problem I see.
This is now very complicated to explain. I hope you may follow my examples...
Your regex allows any number of capital letters, as long as they are followed by at least one lowercase letter (i.e. (?:[A-Z])+(?:[a-z0-9])+
)
This matches also for names, which do not follow the standard and should throw an error/waring (e.g. MyAXIBusArbiter/myAXIBusArbiter).
PascalCase/camelCase defines, that every word must start with a capital letter and the rest is lowercase.
For abbreviation, that means the correct way of writing would be MyAxiBusArbiter/myAxiBusArbiter.
If you add a number at the end of a name, its even matching everything (so not really helpful for linting).
First I thought, we should only allow 1 capital letter in a series, but this is also not good since there are "single-character-words" which lead to 2 capital letters in a row (e.g. WhatAWonderfulDay).
So, I tried to find a regex, which could deal with that but also does find the previously ignored errors...
I come up with the following proposal.
Pascal_Snake_Case
(the real one)Relaxed_PascalCase
(PascalCase with underscores)Here my updated code for the case-regex:
camelCase = re.compile("(?!.*[A-Z]{3})[a-z][a-zA-Z0-9]*")
PascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*")
PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-z0-9]*(?:_[A-Z0-9][a-z0-9]*)*")
RelaxedPascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*(?:_[A-Z0-9][a-zA-Z0-9]*)*")
The new thing I propose is a "negative lookahead" to mismatch if three capital letters in a row are found (i.e. (?!.*[A-Z]{3})
). The rest is basically the same, just without the non-capturing grouping.
I did a quick test with a set of port-names, checked with port_010 rule.
"ok" means, this is matching the regex and does not produce error/warning. "x" means does not match and will report error/warning.
You may see that I also added numbers to the example. This is to demonstrate the improvement given with my new regex patterns. (I hope nobody will call a signal/port like that - but for demonstration purpose it was good!)
port_010:case | lower | upper | upper_or_lower | camelCase | PascalCase | Pascal_Snake_Case | Relaxed_PascalCase |
---|---|---|---|---|---|---|---|
whatawonderful19day42 | ok | x | ok | ok | x | x | x |
whatAwonderful19day42 | x | x | x | ok | x | x | x |
WHATAWONDERFUL19DAY42 | x | ok | ok | x | x | x | x |
WHATaWONDERFUL19DAY42 | x | x | x | x | x | x | x |
what_a_wonderful_19_day_42 | ok | x | ok | x | x | x | x |
what_a_wonderful19_day42 | ok | x | ok | x | s | x | x |
WHAT_A_WONDERFUL_19_DAY_42 | x | ok | ok | x | x | x | x |
WHAT_A_WONDERFUL19_DAY42 | x | ok | ok | x | x | x | x |
whatAWonderful19Day42 | x | x | x | ok | x | x | x |
whatAWOnderful19Day42 | x | x | x | x | x | x | x |
WhatAWonderful19Day42 | x | x | x | x | ok | x | ok |
WhatAWOnderful19Day42 | x | x | x | x | x | x | x |
What_A_Wonderful_19_Day_42 | x | x | x | x | x | ok | ok |
What_A_WOnderful_19_Day_42 | x | x | x | x | x | x | ok |
WhatA_Wonderful19Day_42 | x | x | x | x | x | x | ok |
WhatA_WOnderful19Day_42 | x | x | x | x | x | x | ok |
WhatA_WONderful19Day_42 | x | x | x | x | x | x | x |
Afternoon @patrick-studer ,
I apologize for the late response.
Regarding the proposed updates:
camelCase = re.compile("(?!.*[A-Z]{3})[a-z][a-zA-Z0-9]*")
PascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*")
PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-z0-9]*(?:_[A-Z0-9][a-z0-9]*)*")
RelaxedPascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*(?:_[A-Z0-9][a-zA-Z0-9]*)*")
Since camelCase
and PascalCase
are already relaxed with respect to consecutive capital letters, I would propose the following:
camelCase
and PascalCase
the same definitionstrict_camelCase
and strict_PascalCase
which matches your redefined versionsPascalSnakeCase
PascalSnakeCase
into strict_PascalSnakeCase
So something like this:
camelCase = re.compile("(?:[a-z])+(?:[a-z0-9])*((?:[A-Z])+(?:[a-z0-9])+)*")
strict_cameCase = re.compile("(?!.*[A-Z]{3})[a-z][a-zA-Z0-9]*")
PascalCase = re.compile("((?:[A-Z])+(?:[a-z0-9])+)+")
strict_PascalCase = PascalCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*")
PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-zA-Z0-9]*(?:_[A-Z0-9][a-zA-Z0-9]*)*")
strict_PascalSnakeCase = re.compile("(?!.*[A-Z]{3})[A-Z][a-z0-9]*(?:_[A-Z0-9][a-z0-9]*)*")
Granted it seems backwards, with using strict
versus relaxed
, but it would have the least affect on current users.
What do you think?
Regards,
--Jeremy
Morning @patrick-studer ,
Just wanted to ping you on this issue to see if we could move it forward.
Thanks,
--Jeremy
Morning @patrick-studer ,
Just wanted to ping you on this issue to see if we could move it forward.
Thanks,
--Jeremy
Hi Jeremy
Sorry for the late response. There was little time for open-source projects in the past months. I will catch up my work here and let you know my feedback.
Currently, I'm not sure if keeping the existing PascalCase and camelCase (and introducing a more "strict" version) is the right thing only for keeping backwards compatibility. Since the current implementation is not throwing errors at wrong capitalization where it should (because multiple capital letters or with the problem of numbers at the end). Updating the current PascalCase and camelCase to the more strict (right) version would show existing users more problems. If they then decide, that they do not agree with the real definition, they may "downgrade" to the "relaxed" definition (the old one).
But let me evaluate your proposal and come back to this soon.
Is your feature request related to a problem? Please describe. The provided case-checking rules are currently (before v3.25) limited to:
lower
(which allows underscores too => _snakecase),upper
(which allows underscores too => _SCREAMING_SNAKECASE),upper_or_lower
(which allows one or the other, no mixture)PascalCase
(which does not allow underscores...),camelCase
(which does not allow underscores...).What I'm missing (or was missing before v3.25) is the possibility to define snaked-PascalCase. This is basically the same as PascalCase but allows optional separation/grouping of word groups with an underscore. The first letter of the word, and every letter after an underscore must be uppercase and the rest must be lowercase.
Example: _MyCore_DataChannelClockEnable
Describe the solution you'd like With the new release v3.25 you allow to introduce regex checks for the case-rules which is very handy if you have one rule which should differ from the "defaults". But since _Pascal_SnakeCase is a commonly uses formatting (and it is our default style for almost everything (signals, constants, variables, procedures, functions, ...)), it would be very nice to have it supported by VSG.
Would you mind adding the new case-keyword (like
Pascal_Snake_Case
) together with the matching regular expression string?_vsg/rules/caseutils.py:
Pascal_Snake_Case = re.compile("(((?:[A-Z])+(?:[a-z0-9])*)+_?)+")