python / cpython

The Python programming language
https://www.python.org
Other
62.44k stars 29.97k forks source link

Large number of Coverity reports for parser.c #85231

Closed gpshead closed 4 years ago

gpshead commented 4 years ago
BPO 41059
Nosy @gvanrossum, @gpshead, @tiran, @pganssle, @lysnikolaou, @pablogsal

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = 'https://github.com/gvanrossum' closed_at = created_at = labels = ['interpreter-core', 'deferred-blocker', '3.9', '3.10'] title = 'Large number of Coverity reports for parser.c' updated_at = user = 'https://github.com/gpshead' ``` bugs.python.org fields: ```python activity = actor = 'gvanrossum' assignee = 'gvanrossum' closed = True closed_date = closer = 'gvanrossum' components = ['Interpreter Core'] creation = creator = 'gregory.p.smith' dependencies = [] files = [] hgrepos = [] issue_num = 41059 keywords = [] message_count = 4.0 messages = ['371956', '371962', '374419', '376046'] nosy_count = 6.0 nosy_names = ['gvanrossum', 'gregory.p.smith', 'christian.heimes', 'p-ganssle', 'lys.nikolaou', 'pablogsal'] pr_nums = [] priority = 'deferred blocker' resolution = 'works for me' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue41059' versions = ['Python 3.9', 'Python 3.10'] ```

gpshead commented 4 years ago

Here's an example:

*** CID 1464688:  Control flow issues  (DEADCODE)
/Parser/parser.c: 24243 in _tmp_147_rule()
24237                 &&
24238                 (z = disjunction_rule(p))  // disjunction
24239             )
24240             {
24241                 D(fprintf(stderr, "%*c+ _tmp_147[%d-%d]: %s succeeded!\n", p->level, ' ', _mark, p->mark, "'if' disjunction"));
24242                 _res = z;
>>>     CID 1464688:  Control flow issues  (DEADCODE)
>>>     Execution cannot reach the expression "PyErr_Occurred()" inside this statement: "if (_res == NULL && PyErr_O...".
24243                 if (_res == NULL && PyErr_Occurred()) {
24244                     p->error_indicator = 1;
24245                     D(p->level--);
24246                     return NULL;
24247                 }
24248                 goto done;

A lot of them are of that form, which seems harmless if they are true - it means a compiler may deduce the same thing and omit code generation for an impossible to trigger error block. OTOH this could just be a weakness in the scanner. (i don't know how to silence it via markers in the code, but i assume it is possible)

You'll need to login to Coverity to see the full report. https://scan.coverity.com/projects/python?tab=overview. (it has been ages since i've logged in, they appear to support Github logins now. yay.)

As the parser.c code is new for 3.9, I'm marking this as deferred blocker. We should pay closer attention to the reports and update the parser generator code to generate code that passes analysis cleanly before we exit the beta phase.

gvanrossum commented 4 years ago

Good catch!

Are all the coverity complaints about this kind of code?

if (_res == NULL && PyErr_Occurred()) {
    ...
}

(Mostly for Pablo and Lysandros:) This comes from emit_action(). It is preceded by a line of the form

_res = \<action>;

Most of the time the action is a function call, so this looks like

_res = \<some function call which can really fail>;

But occasionally the action is just pulling a named item from the alternative that was just recognized, and those variables are generally known to be non-NULL (because otherwise the alternative would fail).

There seem to be a whole bunch of actions of the form { \<variable> }. A typical example:

else_block[asdl_seq*]: 'else' ':' b=block { b }

We could probably recognize actions that take the form of a single variable and suppress the error check for the result.

However, there's a wrinkle. Sometimes a variable may legitimately be NULL when an alternative is recognized! This could occur if the name refers to an optional item. I even found an example:

| 'class' a=NAME b=['(' z=[arguments] ')' { z }] ':' c=block {

Look closely at the part starting with b=:

                     b=['(' z=[arguments] ')' { z }]

In the sub-rule we see z=[arguments] and the action is { z } so it is possible that we reach the _res = z part while z is NULL. In my copy of parser.c (current master) this is in _tmp_69_rule().

IMO this makes it difficult to omit the error check in exactly the right situations. Moreover, isn't it the job of the compiler to optimize code so we don't have to? The code generator is complex enough as it is; I would prefer not to have to complexificate it just so Coverity won't complain about unreachable code (no matter how right it is!).

I know nothing about Coverity any more. Is it possible to exempt this giant generated file from warnings about unreachable code?

pablogsal commented 4 years ago

I tried reprodicing the coverity results using cppcheck (I don't have access to the coverity project, someone needs to approve :() and I got just a dozen of results:

❯ cppcheck Parser/parser.c --enable=style
Checking Parser/parser.c ...
Parser/parser.c:3280:22: style: Condition '_res==NULL' is always false [knownConditionTrueFalse]
            if (_res == NULL && PyErr_Occurred()) {
                     ^
Parser/parser.c:3275:16: note: Assuming that condition 'a=_gather_33_rule(p)' is not redundant
            (a = _gather_33_rule(p))  // ','.import_from_as_name+
               ^
Parser/parser.c:3279:20: note: Assignment '_res=a', assigned value is 0
            _res = a;
                   ^
Parser/parser.c:3280:22: note: Condition '_res==NULL' is always false
            if (_res == NULL && PyErr_Occurred()) {
                     ^
Parser/parser.c:3365:22: style: Condition '_res==NULL' is always false [knownConditionTrueFalse]
            if (_res == NULL && PyErr_Occurred()) {
                     ^
Parser/parser.c:3360:16: note: Assuming that condition 'a=_gather_36_rule(p)' is not redundant
            (a = _gather_36_rule(p))  // ','.dotted_as_name+
               ^
Parser/parser.c:3364:20: note: Assignment '_res=a', assigned value is 0
            _res = a;
                   ^
Parser/parser.c:3365:22: note: Condition '_res==NULL' is always false
            if (_res == NULL && PyErr_Occurred()) {
                     ^
Parser/parser.c:6072:22: style: Condition '_res==NULL' is always false [knownConditionTrueFalse]
            if (_res == NULL && PyErr_Occurred()) {
                     ^
Parser/parser.c:6067:16: note: Assuming that condition 'a=_loop1_68_rule(p)' is not redundant
            (a = _loop1_68_rule(p))  // (('@' named_expression NEWLINE))+
               ^
Parser/parser.c:6071:20: note: Assignment '_res=a', assigned value is 0
            _res = a;
                   ^
Parser/parser.c:6072:22: note: Condition '_res==NULL' is always false
            if (_res == NULL && PyErr_Occurred()) {
                     ^
Parser/parser.c:10662:22: style: Condition '_res==NULL' is always false [knownConditionTrueFalse]
            if (_res == NULL && PyErr_Occurred()) {
                     ^
Parser/parser.c:10657:16: note: Assuming that condition 'a=expression_rule(p)' is not redundant
            (a = expression_rule(p))  // expression
               ^
Parser/parser.c:10661:20: note: Assignment '_res=a', assigned value is 0
            _res = a;
                   ^
Parser/parser.c:10662:22: note: Condition '_res==NULL' is always false
            if (_res == NULL && PyErr_Occurred()) {
lysnikolaou commented 4 years ago

I propose that we take no further action and close this.