llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.83k stars 11.46k forks source link

Does Clang Static Analyzer have support for STL in C++ #60159

Open VenoVeno opened 1 year ago

VenoVeno commented 1 year ago

Clang Static Analyzer failed to detect bugs in STL's like List, Map.

To Reproduce

CodeChecker check --print-steps --build "make all" --output Report --clean --enable alpha.security.taint.TaintPropagation --ctu --enable profile:sensitive

Expected behaviour Taint Analysis should capture List

Desktop (please complete the following information)

Sample Code

#include <list>
{
    size_t data;
    list<size_t> dataList;
    /* Initialize data */
    data = 0;
    {
        char inputBuffer[CHAR_ARRAY_SIZE] = "";
        /* POTENTIAL FLAW: Read data from the console using fgets() */
        if (fgets(inputBuffer, CHAR_ARRAY_SIZE, stdin) != NULL)
        {
            /* Convert to unsigned int */
            data = strtoul(inputBuffer, NULL, 0);
        }
        else
        {
            printLine("fgets() failed.");
        }
    }
    /* Put data in a list */
    dataList.push_back(data);
    // dataList.push_back(data);
    // dataList.push_back(data);
    badSink(dataList);
}

void badSink(list<size_t> dataList)
{
    /* copy data out of dataList */
    size_t data = dataList.back();
    {
        char * myString;
        /* POTENTIAL FLAW: No MAXIMUM limitation for memory allocation, but ensure data is large enough
         * for the strcpy() function to not cause a buffer overflow */
        /* INCIDENTAL FLAW: The source could cause a type overrun in data or in the memory allocation */
        if (data > strlen(HELLO_STRING))
        {
            myString = (char *)malloc(data*sizeof(char));
            if (myString == NULL) {exit(-1);}
            /* Copy a small string into myString */
            strcpy(myString, HELLO_STRING);
            printLine(myString);
            free(myString);
        }
        else
        {
            printLine("Input is less than the length of the source string");
        }
    }
}

Comment Extended:

Originally posted by @bruntib in https://github.com/Ericsson/codechecker/issues/3798#issuecomment-1397281431

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-static-analyzer

VenoVeno commented 1 year ago

@EugeneZelenko Any insight on this?

haoNoQ commented 1 year ago

As a general rule, it doesn't have specialized models for STL data structures. It does support them in the same way that it does support arbitrary code. Such general behavior can often be improved upon with specialized "models" of STL objects and methods. There's no fundamental reason why we can't have such specialized models. And we do have them for a few classes. But they take a lot of time to implement, and for most classes we didn't have time to implement them yet.

A quick look at your example suggests that indeed, the static analyzer loses track of taint information at push_back, as it doesn't try to figure out what happens if this function is called, but goes for a conservative approximation instead. If forced to step into the function, it appears to analyze at least 1000 nested function calls on your example's primary path (if compiled against LLVM's libc++ on my machine; the results may be vastly different for GNU libstdc++). This is not only because your path actually expands to that many nested STL function calls, but also because some of these calls are forking the path further, causing exponential explosion in analysis complexity. A realistic source of such forks would be checks such as "is the next element null?", "is the previous element null?", where the static analyzer can't infer the consistency rules from the code (such as "previous element of next element is always equal to the original element") so it has to "assume" that various things can happen.

So this is a pretty good indication that it's for the best that we don't try to reason about this code in STL headers, and that there's no easy way out of this problem other than actually delivering a custom model for std::list.

VenoVeno commented 1 year ago

Hi @haoNoQ. Thanks for your reply.

Instead how to mark the function as propagator in custom config. As of my knowing we can provide only function names in custom taint config. Correct me if i'm wrong.