soasme / PeppaPEG

PEG Parser in ANSI C
https://soasme.com/PeppaPEG
MIT License
55 stars 7 forks source link

Crash on illegal characters #48

Closed retikulum closed 3 years ago

retikulum commented 3 years ago

Hi,

While I was fuzzing library, I encountered some bugs. You can find them in below.

What is my program?

#include <stdio.h>
#include "/mnt/ramdisk/PeppaPEG/peppapeg.h"
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
# define ENTRY 1

int main(int argc, char* argv[]) {
    FILE *in_file  = fopen(argv[1], "r");
    char data[255];
    fscanf(in_file, "%[^\n]", data);
    P4_Grammar* grammar = P4_CreateGrammar();
    if (grammar == NULL) {
        return 1;
    }

    if (P4_AddLiteral(grammar, ENTRY, data, false) != P4_Ok) {
        return 1;
    }

    P4_Source*  source = P4_CreateSource(data, ENTRY);
    if (source == NULL) {
        return 1;
    }

    if (P4_Parse(grammar, source) != P4_Ok) {
        return 1;
    }

    P4_Token*   root = P4_GetSourceAst(source);
    char*       text = P4_CopyTokenString(root);

    free(text);
    P4_DeleteSource(source);
    P4_DeleteGrammar(grammar);
    return 1;

}

Ubuntu Version

Ubuntu 18.04.5 LTS
Linux ubuntu 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Bug

Some special characters can't be parsed and it caused memory (heap-bufffer-overflow) errors.

You can produce it with string "Hello Worìd" (ì = 0xEC or 236). I patched P4_CaseCmpInsensitive function to show what is the value of P4_String given to P4_ReadRune as an argument. It can be seen that it is stopped after ì because it is parsing variable wrong and assign wrong number to P4_Rune* c . With the use of wrong value, program will access memory areas where it shouldn’t be. I am aware of open issue (https://github.com/soasme/PeppaPEG/issues/38) for supporting other utf encodings.

image

image

image

If input is a long string which contains “illegal” characters, program will be aboreted and throws DEADLYSIGNAL. This bug can be produced without sanitizers. Inpus can be found here: https://controlc.com/74f1e9b9

image

image

AFL++ (https://github.com/AFLplusplus/AFLplusplus) is used for this fuzzing campaign.

soasme commented 3 years ago

@retikulum I have fixed the issue by revamping the implementation of P4_CaseCmpInsensitive. The case-insensitive literal rule "Hello Worìd" can now match the input "HELLO WORÌD", see test.

The case https://controlc.com/74f1e9b9 can pass on my local. Can you please revisit this issue when you're free?

Also, I think introducing fuzzy testing into this project is definitely a must-do. I have added it to the roadmap.

retikulum commented 3 years ago

@soasme Thanks for your great effort. I have tested both cases. "Hello Worìd" works perfectly on my machine however the second case it is still failing. I think I shouldn't have copy-pasted test case on controlc because it generally brokes for non-ascii. I have tested second case both with sanitizers and without sanitizers. It again crashes. My crash_test case is attached.

image

crash_test.txt

For the introducing fuzzing test into project, you can consider integrating it with oss-fuzz (I know that it can take time to integrate it for small projects) or you can use similar script that I use for this campaign. It is inspired by afl++ installation script. I haven't tested it, I just wrote it for giving brief explanation to you.

#install afl++
sudo apt-get update
sudo apt-get install -y build-essential python3-dev automake git flex bison libglib2.0-dev libpixman-1-dev python3-setuptools
# try to install llvm 11 and install the distro default if that fails
sudo apt-get install -y lld-11 llvm-11 llvm-11-dev clang-11 || sudo apt-get install -y lld llvm llvm-dev clang 
sudo apt-get install -y gcc-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-plugin-dev libstdc++-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-dev
git clone https://github.com/AFLplusplus/AFLplusplus && cd AFLplusplus
make distrib
sudo make install
#change directory to source
cd ..
#you can change harness according to which function/s you want to fuzz
afl-clang-fast harness.c peppapeg.c -o harness 
#create one small corpus
mkdir inputs
echo "Hello World" > inputs/rand
sudo afl-system-config
mkdir sync_dir
#start afl -- you can change/play parameters for specific scenarios
afl-fuzz -i inputs -o sync_dir -- ./harness @@
soasme commented 3 years ago

Thanks for the procedure. I'll take it a look.

The exact character that caused the problem is 00 as it is used by C to determine the end of the input string. Currently, Peppa PEG only supports UTF-8 encoding; 00 is supposed not to be appearing in the middle of the content. I think you probably need to sanitize the input string by replacing 00 to something else like whitespace or \uFFFD. Similar assumption can be found in commonmark spec.

retikulum commented 3 years ago

@soasme Hi again. This will help me a lot to gain better understanding of how these encoding schemes work. You can close the issue as you wish. Thank you for your time and effort.