Open mosherubin opened 2 years ago
🤔 This would make some parses a bit more complicated. Is there any situation where such a thing would help?
Current:
New:
IMO, the current version supports a simpler user implementation. Some example usage supporting your preference would be really nice :)
Funny you should ask ;-) The reason for this change, made in my private cloned repo, was because of a proprietary rule I wrote for my company.
My company has a coding standard where every function begins and ends with a call to the internal logging system, tracing the entry and exit in the function. The string passed to the logging system (a) Has a '+' or '-' to denote entry/exit respectively, and (b) following the +/- character must be the full and precise function name. Since all of our functions have a single point of exit, both logging statements are always logged. Here is what a function might look like:
uint32_t Client::GetGrpcRetryInterval(void)
{
LOG_TRACE("+ Client::GetGrpcRetryInterval");
Configuration config = this->_configFile->GetConfig();
LOG_TRACE("- Client::GetGrpcRetryInterval return %d", config.grpc_retry_interval());
return config.grpc_retry_interval();
}
The purpose of the rule was to enforce the coding standard, checking that each function conforms to the standard. The rule flags different coding errors, among them:
The rule's logic is simple:
token.fullName
.When I ran the rule on our entire code base, it correctly flagged hundreds of coding errors. However, it incorrectly flagged overloaded operator functions as the LOG_TRACE strings not matching the function definition's full name. Here's an example to illustrate the problem:
Thread& Thread::operator=(Thread &other)
{
LOG_TRACE("+ Thread::operator=");
...
LOG_TRACE("- Thread::operator=");
return *this;
}
This produces the following lexer.tokenlist
:
tokenlist = {list: 42}
00 = {LexToken} LexToken(ID,'Thread',1,1,0, False, None)
01 = {LexToken} LexToken(AND,'&',1,7,6, False, None)
02 = {LexToken} LexToken(SPACE,' ',1,8,7, False, None)
03 = {LexToken} LexToken(ID,'Thread',1,9,8, False, None)
04 = {LexToken} LexToken(DOUBLECOLON,'::',1,15,14, False, None)
05 = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
06 = {LexToken} LexToken(EQUALS,'=',1,25,24, False, None)
07 = {LexToken} LexToken(LPAREN,'(',1,26,25, False, None)
08 = {LexToken} LexToken(ID,'Thread',1,27,26, False, None)
09 = {LexToken} LexToken(SPACE,' ',1,33,32, False, None)
10 = {LexToken} LexToken(AND,'&',1,34,33, False, None)
11 = {LexToken} LexToken(ID,'other',1,35,34, False, None)
12 = {LexToken} LexToken(RPAREN,')',1,40,39, False, None)
13 = {LexToken} LexToken(LINEFEED,'\n',1,41,40, False, None)
14 = {LexToken} LexToken(LBRACE,'{',2,1,41, False, None)
15 = {LexToken} LexToken(LINEFEED,'\n',2,2,42, False, None)
16 = {LexToken} LexToken(SPACE,' ',3,1,43, False, None)
17 = {LexToken} LexToken(ID,'LOG_TRACE',3,5,47, False, None)
18 = {LexToken} LexToken(LPAREN,'(',3,14,56, False, None)
19 = {LexToken} LexToken(STRING,'"+ Thread::operator="',3,15,57, False, None)
20 = {LexToken} LexToken(RPAREN,')',3,36,78, False, None)
21 = {LexToken} LexToken(SEMI,';',3,37,79, False, None)
22 = {LexToken} LexToken(LINEFEED,'\n',3,38,80, False, None)
23 = {LexToken} LexToken(SPACE,' ',4,1,81, False, None)
24 = {LexToken} LexToken(ELLIPSIS,'...',4,5,85, False, None)
25 = {LexToken} LexToken(LINEFEED,'\n',4,8,88, False, None)
26 = {LexToken} LexToken(SPACE,' ',5,1,89, False, None)
27 = {LexToken} LexToken(ID,'LOG_TRACE',5,5,93, False, None)
28 = {LexToken} LexToken(LPAREN,'(',5,14,102, False, None)
29 = {LexToken} LexToken(STRING,'"- Thread::operator="',5,15,103, False, None)
30 = {LexToken} LexToken(RPAREN,')',5,36,124, False, None)
31 = {LexToken} LexToken(SEMI,';',5,37,125, False, None)
32 = {LexToken} LexToken(LINEFEED,'\n',5,38,126, False, None)
33 = {LexToken} LexToken(SPACE,' ',6,1,127, False, None)
34 = {LexToken} LexToken(RETURN,'return',6,5,131, False, None)
35 = {LexToken} LexToken(SPACE,' ',6,11,137, False, None)
36 = {LexToken} LexToken(TIMES,'*',6,12,138, False, None)
37 = {LexToken} LexToken(THIS,'this',6,13,139, False, None)
38 = {LexToken} LexToken(SEMI,';',6,17,143, False, None)
39 = {LexToken} LexToken(LINEFEED,'\n',6,18,144, False, None)
40 = {LexToken} LexToken(RBRACE,'}',7,1,145, False, None)
41 = {LexToken} LexToken(LINEFEED,'\n',7,2,146, False, None)
The problem lies within token[5] above:
lexer.tokenlist[5] = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
additional = {str} ''
column = {int} 17
context = {Context} <nsiqcppstyle_checker.Context object at 0x05046AB0>
contextStack = {ContextStack}
decl = {bool} False
filename = {str} 'D:\\Junk\\NsiqCppStyle\\operator-2.cpp'
fullName = {str} 'operator::'
inactive = {bool} False
index = {int} 5
lexer = {Lexer} <nsiqcppstyle_lexer.Lexer object at 0x05044330>
lexpos = {int} 16
line = {str} 'Thread& Thread::operator=(Thread &other)'
lineno = {int} 1
pp = {NoneType} None
type = {str} 'FUNCTION'
value = {str} 'operator'
Although the value ("operator") is correct, the token.fullName
value found is "operator::", which is wrong on several counts:
token.fullName
to be "Thread::operator=". If it were correctly "Thread::operator=", the STRING checks are straightforward and consistent for overloaded- and non-overloaded functions alike.IMHO, an overloaded operator function is just like any other function name. We should not consider the name to be the string "operator" followed by an equal ("=") character, but rather a function whose name just happens to be "operator=".
I think the opposing question should be posed: in what case would a rule writer want the FUNCTION token.fullName
to only correspond to "operator", leaving the work of concatenating the operator etc. to the rule writer. I had no need to split the overloaded operator function name - I wanted to know the full name. Should the rule writer want to know the precise operator, he can easily take a slice of the function name, or look at the following token type.
The Problem
When NsiqCppStyle parses the following code snippet:
The internal token list (accessed via the
lexer.tokenlist
member variable) looks like this:The problem is that NsiqCppStyle returns the overloaded operator function name ("operator=") as two tokens, i.e., a FUNCTION ("operator" at offset 5) and an EQUALS ("=" at offset 6). This is incorrect - the overloaded operator function name should be returned as one FUNCTION token with a value of "operator=":
This is the correct way to parse the overloaded operator function name - tokenlist[5] shows the FUNCTION token with a value of "operator=".