kunaltyagi / nsiqcppstyle

Cpp style checker in python
GNU General Public License v2.0
26 stars 24 forks source link

Overloaded operator function name should be returned as one FUNCTION token, not two tokens FUNCTION + <operator> #36

Open mosherubin opened 2 years ago

mosherubin commented 2 years ago

The Problem

When NsiqCppStyle parses the following code snippet:

Event& Event::operator=(OUT Event& other)
{
}

The internal token list (accessed via the lexer.tokenlist member variable) looks like this:

tokenlist = {list: 19}
 00 = {LexToken} LexToken(ID,'Event',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,6,5, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,7,6, False, None)
 03 = {LexToken} LexToken(ID,'Event',1,8,7, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,13,12, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator',1,15,14, False, None)
 06 = {LexToken} LexToken(EQUALS,'=',1,23,22, False, None)
 07 = {LexToken} LexToken(LPAREN,'(',1,24,23, False, None)
 08 = {LexToken} LexToken(ID,'OUT',1,25,24, False, None)
 09 = {LexToken} LexToken(SPACE,' ',1,28,27, False, None)
 10 = {LexToken} LexToken(ID,'Event',1,29,28, False, None)
 11 = {LexToken} LexToken(AND,'&',1,34,33, False, None)
 12 = {LexToken} LexToken(SPACE,' ',1,35,34, False, None)
 13 = {LexToken} LexToken(ID,'other',1,36,35, False, None)
 14 = {LexToken} LexToken(RPAREN,')',1,41,40, False, None)
 15 = {LexToken} LexToken(LINEFEED,'\n',1,42,41, False, None)
 16 = {LexToken} LexToken(LBRACE,'{',2,1,42, False, None)
 17 = {LexToken} LexToken(LINEFEED,'\n',2,2,43, False, None)
 18 = {LexToken} LexToken(RBRACE,'}',3,1,44, False, None)

The problem is that NsiqCppStyle returns the overloaded operator function name ("operator=") as two tokens, i.e., a FUNCTION ("operator" at offset 5) and an EQUALS ("=" at offset 6). This is incorrect - the overloaded operator function name should be returned as one FUNCTION token with a value of "operator=":

tokenlist = {list: 18}
 00 = {LexToken} LexToken(ID,'Event',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,6,5, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,7,6, False, None)
 03 = {LexToken} LexToken(ID,'Event',1,8,7, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,13,12, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator=',1,15,14, False, None)
 06 = {LexToken} LexToken(LPAREN,'(',1,24,23, False, None)
 07 = {LexToken} LexToken(ID,'OUT',1,25,24, False, None)
 08 = {LexToken} LexToken(SPACE,' ',1,28,27, False, None)
 09 = {LexToken} LexToken(ID,'Event',1,29,28, False, None)
 10 = {LexToken} LexToken(SPACE,' ',1,34,33, False, None)
 11 = {LexToken} LexToken(AND,'&',1,35,34, False, None)
 12 = {LexToken} LexToken(ID,'other',1,36,35, False, None)
 13 = {LexToken} LexToken(RPAREN,')',1,41,40, False, None)
 14 = {LexToken} LexToken(LINEFEED,'\n',1,42,41, False, None)
 15 = {LexToken} LexToken(LBRACE,'{',2,1,42, False, None)
 16 = {LexToken} LexToken(LINEFEED,'\n',2,2,43, False, None)
 17 = {LexToken} LexToken(RBRACE,'}',3,1,44, False, None)

This is the correct way to parse the overloaded operator function name - tokenlist[5] shows the FUNCTION token with a value of "operator=".

kunaltyagi commented 2 years ago

🤔 This would make some parses a bit more complicated. Is there any situation where such a thing would help?

Current:

New:

IMO, the current version supports a simpler user implementation. Some example usage supporting your preference would be really nice :)

mosherubin commented 2 years ago

Funny you should ask ;-) The reason for this change, made in my private cloned repo, was because of a proprietary rule I wrote for my company.

My company has a coding standard where every function begins and ends with a call to the internal logging system, tracing the entry and exit in the function. The string passed to the logging system (a) Has a '+' or '-' to denote entry/exit respectively, and (b) following the +/- character must be the full and precise function name. Since all of our functions have a single point of exit, both logging statements are always logged. Here is what a function might look like:

uint32_t Client::GetGrpcRetryInterval(void)
{
    LOG_TRACE("+ Client::GetGrpcRetryInterval");

    Configuration config = this->_configFile->GetConfig();

    LOG_TRACE("- Client::GetGrpcRetryInterval return %d", config.grpc_retry_interval());
    return config.grpc_retry_interval();
}

The purpose of the rule was to enforce the coding standard, checking that each function conforms to the standard. The rule flags different coding errors, among them:

The rule's logic is simple:

When I ran the rule on our entire code base, it correctly flagged hundreds of coding errors. However, it incorrectly flagged overloaded operator functions as the LOG_TRACE strings not matching the function definition's full name. Here's an example to illustrate the problem:

Thread& Thread::operator=(Thread &other)
{
    LOG_TRACE("+ Thread::operator=");
    ...
    LOG_TRACE("- Thread::operator=");
    return *this;
}

This produces the following lexer.tokenlist:

tokenlist = {list: 42}
 00 = {LexToken} LexToken(ID,'Thread',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,7,6, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,8,7, False, None)
 03 = {LexToken} LexToken(ID,'Thread',1,9,8, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,15,14, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
 06 = {LexToken} LexToken(EQUALS,'=',1,25,24, False, None)
 07 = {LexToken} LexToken(LPAREN,'(',1,26,25, False, None)
 08 = {LexToken} LexToken(ID,'Thread',1,27,26, False, None)
 09 = {LexToken} LexToken(SPACE,' ',1,33,32, False, None)
 10 = {LexToken} LexToken(AND,'&',1,34,33, False, None)
 11 = {LexToken} LexToken(ID,'other',1,35,34, False, None)
 12 = {LexToken} LexToken(RPAREN,')',1,40,39, False, None)
 13 = {LexToken} LexToken(LINEFEED,'\n',1,41,40, False, None)
 14 = {LexToken} LexToken(LBRACE,'{',2,1,41, False, None)
 15 = {LexToken} LexToken(LINEFEED,'\n',2,2,42, False, None)
 16 = {LexToken} LexToken(SPACE,'    ',3,1,43, False, None)
 17 = {LexToken} LexToken(ID,'LOG_TRACE',3,5,47, False, None)
 18 = {LexToken} LexToken(LPAREN,'(',3,14,56, False, None)
 19 = {LexToken} LexToken(STRING,'"+ Thread::operator="',3,15,57, False, None)
 20 = {LexToken} LexToken(RPAREN,')',3,36,78, False, None)
 21 = {LexToken} LexToken(SEMI,';',3,37,79, False, None)
 22 = {LexToken} LexToken(LINEFEED,'\n',3,38,80, False, None)
 23 = {LexToken} LexToken(SPACE,'    ',4,1,81, False, None)
 24 = {LexToken} LexToken(ELLIPSIS,'...',4,5,85, False, None)
 25 = {LexToken} LexToken(LINEFEED,'\n',4,8,88, False, None)
 26 = {LexToken} LexToken(SPACE,'    ',5,1,89, False, None)
 27 = {LexToken} LexToken(ID,'LOG_TRACE',5,5,93, False, None)
 28 = {LexToken} LexToken(LPAREN,'(',5,14,102, False, None)
 29 = {LexToken} LexToken(STRING,'"- Thread::operator="',5,15,103, False, None)
 30 = {LexToken} LexToken(RPAREN,')',5,36,124, False, None)
 31 = {LexToken} LexToken(SEMI,';',5,37,125, False, None)
 32 = {LexToken} LexToken(LINEFEED,'\n',5,38,126, False, None)
 33 = {LexToken} LexToken(SPACE,'    ',6,1,127, False, None)
 34 = {LexToken} LexToken(RETURN,'return',6,5,131, False, None)
 35 = {LexToken} LexToken(SPACE,' ',6,11,137, False, None)
 36 = {LexToken} LexToken(TIMES,'*',6,12,138, False, None)
 37 = {LexToken} LexToken(THIS,'this',6,13,139, False, None)
 38 = {LexToken} LexToken(SEMI,';',6,17,143, False, None)
 39 = {LexToken} LexToken(LINEFEED,'\n',6,18,144, False, None)
 40 = {LexToken} LexToken(RBRACE,'}',7,1,145, False, None)
 41 = {LexToken} LexToken(LINEFEED,'\n',7,2,146, False, None)

The problem lies within token[5] above:

lexer.tokenlist[5] = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
 additional = {str} ''
 column = {int} 17
 context = {Context} <nsiqcppstyle_checker.Context object at 0x05046AB0>
 contextStack = {ContextStack} 
 decl = {bool} False
 filename = {str} 'D:\\Junk\\NsiqCppStyle\\operator-2.cpp'
 fullName = {str} 'operator::'
 inactive = {bool} False
 index = {int} 5
 lexer = {Lexer} <nsiqcppstyle_lexer.Lexer object at 0x05044330>
 lexpos = {int} 16
 line = {str} 'Thread& Thread::operator=(Thread &other)'
 lineno = {int} 1
 pp = {NoneType} None
 type = {str} 'FUNCTION'
 value = {str} 'operator'

Although the value ("operator") is correct, the token.fullName value found is "operator::", which is wrong on several counts:

IMHO, an overloaded operator function is just like any other function name. We should not consider the name to be the string "operator" followed by an equal ("=") character, but rather a function whose name just happens to be "operator=".

I think the opposing question should be posed: in what case would a rule writer want the FUNCTION token.fullName to only correspond to "operator", leaving the work of concatenating the operator etc. to the rule writer. I had no need to split the overloaded operator function name - I wanted to know the full name. Should the rule writer want to know the precise operator, he can easily take a slice of the function name, or look at the following token type.