llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.17k stars 12.04k forks source link

Experiment with a -Windent warning #19312

Closed nico closed 4 years ago

nico commented 10 years ago
Bugzilla Link 18938
Resolution FIXED
Resolved on Feb 20, 2020 08:10
Version trunk
OS All
Attachments prototype patch v0, prototype patch v0.1
CC @akyrtzi,@davidbolvansky,@majnemer,@zmodem,@belkadan,@seanm

Extended Description

Wouldn't it be great if clang warned on something like

if (5 != 0) goto fail; goto fail;

?

This bug tracks various ideas for warnings about this based on indent level. (There's already -Wunreachable-code for the CFG approach).

Version 1: Remember only child of unbraced ifs, fors, whiles, and warn if next statement has same indent as that child. Prototype patch attached (it doesn't reset IndentStmt often enough).

It has false positives on this code in udis86.c in mach_override:

if (ud_input_end(u)) return 0;

u->asm_buf[0] = 0;

if (ud_decode(u) == 0) return 0;

Also on this from mach_override.c, but looking at the style in the rest of the file that might actually be a (harmless) true positive:

if( !err )  
    err = allocateBranchIsland( &escapeIsland, kAllocateHigh, originalFunctionAddress );
    if (err) fprintf(stderr, "err = %x %s:%d\n", err, __FILE__, __LINE__);
nico commented 4 years ago

-Wmisleading-indentation will be in clang 10.0. Thanks, Tyker!

nico commented 4 years ago

It's happening: https://reviews.llvm.org/D70638 \o/

seanm commented 5 years ago

These days, clang-tidy has a check that seems to at least partially overlap what we're talking about here:

https://clang.llvm.org/extra/clang-tidy/checks/readability-misleading-indentation.html

nico commented 5 years ago

The 4th attachment is the output of my old wip patch on Chromium's "ui" target.

It flags this indent: https://chromium.googlesource.com/chromium/src/+/71c71a677ae0a8e0004a561f6d558b33ae9c1af3/third_party/libpng/pngread.c#122 (which is correct in that the indent is off, but it's not a behavioral bug; and it's fixed in libpng trunk)

The same is true here: https://chromium.googlesource.com/chromium/src/+/71c71a677ae0a8e0004a561f6d558b33ae9c1af3/third_party/libpng/pngpread.c#1104

https://chromium.googlesource.com/chromium/src/+/71c71a677ae0a8e0004a561f6d558b33ae9c1af3/third_party/libpng/pngwutil.c#375 kind of looks like a real (but also benign) bug:

if (comp->max_output_ptr != 0) png_free(png_ptr, comp->output_ptr); comp->output_ptr=NULL;

This code looks pretty different at trunk libpng, so it might be interesting to see what the motivation for the change was.

nspr is just weird indent again: https://github.com/servo/nspr/blob/master/pr/src/misc/prsystem.c#L106

https://chromium.googlesource.com/chromium/deps/icu.git/+/4b5a7240d2dfa0cb5627f20a26ceffd67930d134/source/i18n/rbnf.cpp#325 is arguably a bug in the warning approach. But it looks like ICU decided that formatting was too clever in https://chromium.googlesource.com/chromium/deps/icu.git/+/b31896655a701874d13e70cc24bca95e3e66a991%5E%21/source/i18n/rbnf.cpp

https://chromium.googlesource.com/chromium/deps/icu.git/+/0f8746ae6655c753514c329dfbe044e00e36a1b3/source/i18n/dtfmtsym.cpp#912 is the "incorrect indent, but harmless, but still good to fix" thing again.

...and so on. If you're interested in looking at this, I could re-run the check and post what it finds in Chromium. (If you're coming to LLVM conf, we could co-hack on this.)

The approach in described in comment 0 only does actual work after "few" AST nodes, so maybe it's not super expensive. It does need to track indent more often though. I don't remember if I had run any benchmarks.

davidbolvansky commented 5 years ago

Any examples of true positives other than famous goto fail;?

GCC tracks a lot of line data with -Wmisleading-indentation so probably this can slow down a compiler a lot. But we could do it heuristic-based based on typical true positives -> faster, less false positives.

seanm commented 8 years ago

GCC 6 will have this feature (-Wmisleading-indentation), would be nice if clang did too...

nico commented 9 years ago

small manual test file

seanm commented 10 years ago

I just did an svn up and it resulted in a conflict:

<<<<<<< .mine unsigned getColumnNumber(FileID FID, unsigned FilePos, bool Invalid = 0, int NumTabs = 0, int NumSpaces = 0) const; unsigned getSpellingColumnNumber(SourceLocation Loc, bool Invalid = 0) const;

unsigned getColumnNumber(FileID FID, unsigned FilePos, bool Invalid = nullptr) const; unsigned getSpellingColumnNumber(SourceLocation Loc, bool Invalid = nullptr) const;

.r208340

So you might want to update your patch Nico.

IMHO, having this committed would be great. :)

seanm commented 10 years ago

So I've turned on -Windent for some of my open source buildbots:

GDCM: http://open.cdash.org/viewBuildError.php?type=1&buildid=3278038 3 warnings: 2 definitely right to warn, 1 false positive I guess (but can easily add braces):

  if( mult == 2 && genidx == -1 )
    for(int i = 0; i < Internal->NumberOfPoints; i++ )
      array[3*i+1] = p[i + 1];
  else if( mult == 2 && genidx == 0 ) //**** warns here
    for(int i = 0; i < Internal->NumberOfPoints; i++ )
      array[3*i+1] = p[i + 0];
  else
    for(int i = 0; i < Internal->NumberOfPoints; i++ )
      array[3*i+1] = 0;

VTK: http://open.cdash.org/viewBuildError.php?type=1&buildid=3277751 3 warning: all right to warn on

CMake: no warnings

nico commented 10 years ago

I haven't found a great way to deal with the preprocessor issue described in comment 18. One possible avenue would be to just suppress the warning if the indented line is more than 3 lines below the end of the if/for/while.

(Why 3? because

if foo

stuff

endif

is 3 lines long)

The obvious disadvantage of this is that it might skip valid statements due to comment lines.

seanm commented 10 years ago

I can confirm that 'prototype patch v2.3' applies against current trunk, builds, and warns in trivial cases... now to try on some code I use...

nico commented 10 years ago

prototype patch v2.3 rebased

seanm commented 10 years ago

I'd like to give this a try, but it seems the latest patch no longer applies against trunk… think you could update the patch? Thanks.

nico commented 10 years ago

Re majnemer comment 19: This warning wouldn't catch that problem, since this warning only fires on unbraced ifs.

belkadan commented 10 years ago

See also bug 7555.

991901f3-cc14-4404-b340-165691b62a58 commented 10 years ago

Would the following be considered under the aegis of -Wpadded?

if (something) { foo(); } if (something_else) { bar(); }

The bug in this code was that the programmer intended on writing "else if".

nico commented 10 years ago

Examples that 2.2 warns on where it's "wrong", both from libavformat:

        if(!st->codec->has_b_frames)
            pktl->pkt.pts= cur_dts;

// if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) pktl->pkt.duration = duration;

(don't check in commented-out code, duh)

if (authorization && authorization[0])
    av_strlcatf(str, size, "%s@", authorization);

if CONFIG_NETWORK && defined(AF_INET6)

/* Determine if hostname is a numerical IPv6 address,
 * properly escape it within [] in that case. */
hints.ai_flags = AI_NUMERICHOST;
if (!getaddrinfo(hostname, NULL, &hints, &ai)) {
  ...
} else

endif

    /* Not an IPv6 address, just output the plain string. */
    av_strlcat(str, hostname, size);

I don't know if the parser has a way to ask the preprocessor if anything was ifdef'd out between the current token and the last statement, else that should probably be done too.

nico commented 10 years ago

prototype patch v2.2 This ignores the warning if if/for/while and following element are on the same line.

This does pretty well -- everything it fires for that I looked at is at least confusing, for example this from Harfbuzz:

if (unlikely (!c->len))  return TRACE_RETURN (false);
if (!digest->may_have (c->glyphs[0]))  return TRACE_RETURN (false);
  return TRACE_RETURN (dispatch (c));

...but while it finds quite a bit of questionable indentation, it hasn't found a "real" bug yet.

nico commented 10 years ago

Output for chromium's "ui" target Patch v2.1 does pretty well, it only finds true positives as far as I can tell – but it finds quite a few, and most of them harmless. (You can use http://cs.chromium.org to look at all these files.)

For example, in icu some code accidentally got indented a column too much:

    case NARROW :
       if (fStandaloneNarrowMonths)
            delete[] fStandaloneNarrowMonths;
        fStandaloneNarrowMonths = newUnicodeStringArray(count);
        uprv_arrayCopy( monthsArray,fStandaloneNarrowMonths,count);
        fStandaloneNarrowMonthsCount = count;
        break; 

Same problem in libpng's pngread.c:

if (user_png_ver) { i = 0; do { if (user_png_ver[i] != png_libpng_ver[i]) png_ptr->flags |= PNG_FLAG_LIBRARY_MISMATCH; } while (png_libpng_ver[i++]); } else png_ptr->flags |= PNG_FLAG_LIBRARY_MISMATCH;

if (png_ptr->flags & PNG_FLAG_LIBRARY_MISMATCH)
{
   ...
}

(Note that everything is indented 1 relative to the first if, which is actually kind of confusing to read.)

…and I suppose it's necessary to suppress the warning if both statements are on the same line, as stuff like

    while (*list && *list != c) ++list; return *list == c;

(this one from icu) is pretty common.

nico commented 10 years ago

Some manual test cases, not exhaustive

nico commented 10 years ago

prototype patch v2.1 Now resets IndentStmt in more places so that this doesn't warn on

~scopedrefptr() { if (ptr) ptr_->Release(); }

T* get() const { return ptr_; }

nico commented 10 years ago

prototype patch v2 Now doesn't warn on mixed tabs and spaces.

Another type of false positive:

define poll_check_ok(pop)

if (!(ev->ev_events & (EV_READ|EV_WRITE)))
    return (0);

poll_check_ok(pop);

The semicolon is more indented than the if due to the empty macro. Maybe suppress warning for empty statements?

nico commented 10 years ago

From libpng's png_write_compressed_data_out, might actually be a bug:

if (comp->max_output_ptr != 0) png_free(png_ptr, comp->output_ptr); comp->output_ptr=NULL;

nico commented 10 years ago

From skia:

void pop(T elem) { if (elem) elem = (*this)[fCount - 1]; --fCount; }

Not sure if the warning should be suppressed if the thing after the if is on the same line as the if as that is kind of hard to read for humans too.

nico commented 10 years ago

False positive on re2c generated code:

line 213 "scanner.re"

{ if(cursor == s->eof) Scanner_fatal(s, "missing '}'"); s->pos = cursor; s->cline++; goto code; }

Also has mixed spaces / tabs, so ignoring lines lines like that should probably be the next change.

nico commented 10 years ago

"False" positive on sfntly's font_data.cc:

bool FontData::Bound(int32_t offset) { if (offset > Size() || offset < 0) return false;

boundoffset += offset; return true; }

nico commented 10 years ago

prototype patch v1 This is with the modified warning approach described above.

This triggers on

    if (reply)
        req->user_callback(DNS_ERR_NONE, DNS_IPv6_AAAA,
                           reply->data.aaaa.addrcount, ttl,
                           reply->data.aaaa.addresses,
                           req->user_pointer);
    else
        req->user_callback(err, 0, 0, 0, NULL, req->user_pointer);
            return;

in libevent -- note mixed spaces and tabs again (at least not on the same line). Probably want to disable the warning if the whitespace before the if/for/while and the line that's warned on contain a mix of tabs and spaces.

nico commented 10 years ago

Similar, also from nss:

if (mp_cmp_z(&f) == 0) { res = MP_UNDEF; } else for (;;) { ... } stmt;

nico commented 10 years ago

Another false positive, from nss's ssl3con.c:

if (dh_g.len > dh_p.len || !ssl3_BigIntGreaterThanOne(&dh_g))
    goto alert_loser;
    rv = ssl3_ConsumeHandshakeVariable(ss, &dh_Ys, 2, &b, &length);

(Note mixed tabs and spaces on the same line 9_9)

Also sometimes seen in nss:

if (foo()) do ... while (...); stmt;

(would be fixed by checking if thing-after-if has higher indent than if)

nico commented 10 years ago

prototype patch v0.3 (reset IndentStmt at the start of functions too)

Another false positive, from nss's sslcon.c: if (certLen + responseLen + SSL_HL_CLIENT_CERTIFICATE_HBYTES

ss->gs.recordLen) { / prevent overflow crash. / rv = SECFailure; } else rv = ssl2_HandleClientCertificate(ss, data[1], data + SSL_HL_CLIENT_CERTIFICATE_HBYTES, certLen, data + SSL_HL_CLIENT_CERTIFICATE_HBYTES + certLen, responseLen); if (rv) { rv2 = ssl2_SendErrorMessage(ss, SSL_PE_BAD_CERTIFICATE); SET_ERROR_CODE goto loser; }

Looks like not indenting the body of an if or else is pretty common. Next approach is to check if the thing after an if/else/for/while has a higher indent than the if/else/for/while itself and warning on that. That'd fix most of the false positives above.

nico commented 10 years ago

More false positives:

From libpng's pngset.c:

if (png_ptr != NULL)
png_ptr->asm_flags = 0;
asm_flags = asm_flags; /* Quiet the compiler */

Lots of stuff similar to this in libpng's pngget.c:

if (png_ptr != NULL && info_ptr != NULL)

ifdef PNG_oFFs_SUPPORTED

if (info_ptr->valid & PNG_INFO_oFFs) { png_debug1(1, "in %s retrieval function", "png_get_y_offset_microns");

  if (info_ptr->offset_unit_type != PNG_OFFSET_MICROMETER)
      return (0);

  else
      return (info_ptr->y_offset);

}

else

return (0);

endif

return (0);

From nspr's prsystem.c:

    if (cmd == PR_SI_HOSTNAME_UNTRUNCATED)
        break;
    /*
     * On some platforms a system does not have a hostname and
     * its IP address is returned instead.   The following code
     * should be skipped on those platforms.
     */

ifndef _PR_GET_HOST_ADDR_AS_NAME

    /* Return the unqualified hostname */
        while (buf[len] && (len < buflen)) {
            if (buf[len] == '.') {
                buf[len] = '\0';
                break;
            }
            len += 1;
        }    

endif

True (harmless) positive from yasm's code.c:

for(i = 0; i < g->nSpans; ++i)
    if(g->span[i].to != def)
    *(t++) = &g->span[i];

    if (dFlag)
    fputs("\tYYDEBUG(-1, yych);\n", o);
nico commented 10 years ago

prototype patch v0.2 With silly hack to not warn on nested if-elses like so:

if (1) { if (1) goto fail; else if (1) goto fail; else if (1) goto fail; else goto fail; } else if (1) { if (1) goto fail; }

(probably want to have some kind of stack in a real patch)

nico commented 10 years ago

Also incorrectly warns about this from dtoa.cc (Honor_FLT_ROUNDS is not defined):

        if (bc.dsign)
            break;  /* Must use bigcomp(). */

ifdef Honor_FLT_ROUNDS

        if (bc.rounding != 1) {
            if (i < 0)
                break;
            }
        else

endif

            {
            bc.nd = nd;
            i = -1; /* Discarded digits make delta smaller. */
            }