Closed retikulum closed 3 years ago
@retikulum I have fixed the issue by revamping the implementation of P4_CaseCmpInsensitive
. The case-insensitive literal rule "Hello Worìd" can now match the input "HELLO WORÌD", see test.
The case https://controlc.com/74f1e9b9 can pass on my local. Can you please revisit this issue when you're free?
Also, I think introducing fuzzy testing into this project is definitely a must-do. I have added it to the roadmap.
@soasme Thanks for your great effort. I have tested both cases. "Hello Worìd" works perfectly on my machine however the second case it is still failing. I think I shouldn't have copy-pasted test case on controlc because it generally brokes for non-ascii. I have tested second case both with sanitizers and without sanitizers. It again crashes. My crash_test case is attached.
For the introducing fuzzing test into project, you can consider integrating it with oss-fuzz (I know that it can take time to integrate it for small projects) or you can use similar script that I use for this campaign. It is inspired by afl++ installation script. I haven't tested it, I just wrote it for giving brief explanation to you.
#install afl++
sudo apt-get update
sudo apt-get install -y build-essential python3-dev automake git flex bison libglib2.0-dev libpixman-1-dev python3-setuptools
# try to install llvm 11 and install the distro default if that fails
sudo apt-get install -y lld-11 llvm-11 llvm-11-dev clang-11 || sudo apt-get install -y lld llvm llvm-dev clang
sudo apt-get install -y gcc-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-plugin-dev libstdc++-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-dev
git clone https://github.com/AFLplusplus/AFLplusplus && cd AFLplusplus
make distrib
sudo make install
#change directory to source
cd ..
#you can change harness according to which function/s you want to fuzz
afl-clang-fast harness.c peppapeg.c -o harness
#create one small corpus
mkdir inputs
echo "Hello World" > inputs/rand
sudo afl-system-config
mkdir sync_dir
#start afl -- you can change/play parameters for specific scenarios
afl-fuzz -i inputs -o sync_dir -- ./harness @@
Thanks for the procedure. I'll take it a look.
The exact character that caused the problem is 00
as it is used by C to determine the end of the input string. Currently, Peppa PEG only supports UTF-8 encoding; 00
is supposed not to be appearing in the middle of the content. I think you probably need to sanitize the input string by replacing 00
to something else like whitespace or \uFFFD. Similar assumption can be found in commonmark spec.
@soasme Hi again. This will help me a lot to gain better understanding of how these encoding schemes work. You can close the issue as you wish. Thank you for your time and effort.
Hi,
While I was fuzzing library, I encountered some bugs. You can find them in below.
What is my program?
Ubuntu Version
Bug
Some special characters can't be parsed and it caused memory (heap-bufffer-overflow) errors.
You can produce it with string "Hello Worìd" (ì = 0xEC or 236). I patched P4_CaseCmpInsensitive function to show what is the value of P4_String given to P4_ReadRune as an argument. It can be seen that it is stopped after ì because it is parsing variable wrong and assign wrong number to P4_Rune* c . With the use of wrong value, program will access memory areas where it shouldn’t be. I am aware of open issue (https://github.com/soasme/PeppaPEG/issues/38) for supporting other utf encodings.
If input is a long string which contains “illegal” characters, program will be aboreted and throws DEADLYSIGNAL. This bug can be produced without sanitizers. Inpus can be found here: https://controlc.com/74f1e9b9
AFL++ (https://github.com/AFLplusplus/AFLplusplus) is used for this fuzzing campaign.