octopus-platform / joern

A robust parser for C/C++ storing abstract syntax trees, control flow graphs and program dependence graphs in a neo4j graph database.
http://mlsec.org/joern
GNU Lesser General Public License v3.0

Parser fails on convoluted expressions #59

Open matneu opened 9 years ago

matneu commented 9 years ago

The parser fails on some valid (but convoluted) expressions such as:

int res = ((int (*)())(*(&function_table[atoi(argv[1])])))(atoi(argv[2]), atoi(argv[3]));

It ends up creating multiple statement nodes that will just contain one-character pieces of the whole expression. Also, it does not report an error.

Full example:

#include <stdio.h>
#include <stdlib.h>
int sum(int a, int b) {return a+b;}
int mul(int a, int b) {return a*b;}
int main(int argc, char **argv) {
    int (*function_table[2])(int a, int b);
    function_table[0] = sum;
    function_table[1] = mul;
    int res = ((int (*)())(*(&function_table[atoi(argv[1])])))(atoi(argv[2]), atoi(argv[3]));
    printf("result %d\n", res);
    return 0;
}
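
For readers untangling the one-liner, here is a minimal sketch of the same lookup and call, written one step at a time. The `binary_op` typedef and the `call_entry` helper are hypothetical names introduced only for illustration; they are not part of the report. The sketch also drops the `(int (*)())` cast, which is assumed to be unnecessary once the table has a prototyped element type.

```c
int sum(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* Hypothetical alias for the table's element type (not in the report). */
typedef int (*binary_op)(int, int);

/* Same lookup and call as the one-line expression, split into steps. */
int call_entry(binary_op *table, int index, int a, int b) {
    binary_op *slot = &table[index]; /* &function_table[...]    */
    binary_op fn = *slot;            /* *(&function_table[...]) */
    return fn(a, b);                 /* the call itself         */
}
```

With a table declared as `binary_op function_table[2] = { sum, mul };`, the call `call_entry(function_table, 1, 3, 4)` selects `mul`.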

fabsx00 commented 9 years ago

Thanks for the report! When joern does not fully recognize a statement, it outputs just the tokens and reports no error. In terms of error handling, this is what we want from a fuzzy parser. However, statements like this one are obviously valid, so we should tune the parser to recognize them.
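
The fallback described here can be pictured with a toy sketch. This is not joern's implementation: `recognized` is a hypothetical stand-in for a real statement grammar (it only checks for balanced parentheses and a trailing semicolon), and `count_nodes` is an assumed helper that reports how many nodes would be emitted. The point is only the shape of the behaviour: one node when the statement is recognized, one node per character when it is not, and no error in either case.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical recognizer: a placeholder for a real grammar rule. */
int recognized(const char *s) {
    int depth = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '(') depth++;
        else if (s[i] == ')' && --depth < 0) return 0;
    }
    return depth == 0 && n > 0 && s[n - 1] == ';';
}

/* Fuzzy fallback: never fails, only degrades to per-character nodes. */
size_t count_nodes(const char *s) {
    if (recognized(s)) return 1;          /* one statement node */
    size_t nodes = 0;
    for (; *s; s++)
        if (!isspace((unsigned char)*s))  /* skip whitespace */
            nodes++;
    return nodes;
}
```

Any gap in the real grammar plays the role of `recognized` returning 0 here, which is why an unrecognized but valid statement shows up as a run of one-character nodes.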

My guess would be that it fails to recognize the "((int (*)())" cast. Could you do me a favour and check whether

int res = ((int (*)()) foo;

is already unrecognized?

fabs

matneu commented 9 years ago

Thanks for the fast reply! Just tried it,

int res = ((int (*)()) foo;

is already unrecognized.

fabsx00 commented 9 years ago

OK, thanks, that confirms what I expected. We'll include a fix for this in the next release, along with fixes for the other problems reported in the last couple of months.