nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0
5.25k stars 320 forks source link

fuzzing: Add fuzzing support and build script #1291

Closed pkillarjun closed 1 week ago

pkillarjun commented 1 month ago

Issue: https://github.com/nginx/unit/issues/1275

Fix: Null-dereference in nxt_http_parse_fuzz

Question: Where is the formatter?

FAQ

pkillarjun commented 1 month ago

Ready for re-review.

ac000 commented 1 month ago

Question: Where is the formatter?

There isn't one! (and I'm not sure it'd be able to handle Units idiosyncrasies)...

ac000 commented 1 month ago

I wonder if src/fuzz should really be something like 'fuzzers/oss-fuzz` ?

pkillarjun commented 1 month ago

I wonder if src/fuzz should really be something like 'fuzzers/oss-fuzz` ?

Not really. The src/fuzz fuzz tests are heavily inspired by the src/test unit tests, so let them be together.

Also, I will squash all these commits after this review.

ac000 commented 4 weeks ago

I wonder if src/fuzz should really be something like 'fuzzers/oss-fuzz` ?

Not really. The src/fuzz fuzz tests are heavily inspired by the src/test unit tests, so let them be together.

Most of the tests, what gets run by CI is under ./test

There also seems to be a lot of binary files in there, which I don't think should be under ./src/...

I guess we may also want to support other fuzzers; trinity, american fuzzy lop etc..., so this whole thing should probably be named after what it is...

pkillarjun commented 4 weeks ago

Most of the tests, what gets run by CI is under ./test

There also seems to be a lot of binary files in there, which I don't think should be under ./src/...

Ah, alright, I will change it to the fuzzer dir.

I guess we may also want to support other fuzzers; trinity, american fuzzy lop etc..., so this whole thing should probably be named after what it is...

LLVMFuzzerTestOneInput can be compiled with afl++ using libAFLDriver.a.

If you want to test it locally with afl++

cd $HOME/Desktop

git clone https://github.com/AFLplusplus/AFLplusplus.git

pushd AFLplusplus/
make
popd

export CC="$HOME/Desktop/AFLplusplus/afl-clang-fast"
export CXX="$HOME/Desktop/AFLplusplus/afl-clang-fast++"
export CFLAGS="-g -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address"
export CXXFLAGS="-g -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address"
export LIB_FUZZING_ENGINE="$HOME/Desktop/AFLplusplus/libAFLDriver.a"

git clone -b fuzzing-update https://github.com/pkillarjun/unit.git

pushd unit/
./configure --no-regex --no-pcre2 --fuzz=$LIB_FUZZING_ENGINE
make fuzz -j$(nproc)
popd

And the same goes for honggfuzz.

Also Trinity and syzkaller are kernel fuzzers.

ac000 commented 4 weeks ago

Is there a description somewhere of what the contents of the corpus files are? E.g what exactly does ./src/fuzz/fuzz_basic_seed_corpus/base64_0.bin contain?

Also how do you analyse the generated crash files?

pkillarjun commented 4 weeks ago

Is there a description somewhere of what the contents of the corpus files are? E.g what exactly does ./src/fuzz/fuzz_basic_seed_corpus/base64_0.bin contain?

The input used in the nxt_base64_test.c test file for nxt_base64_decode.

nxt_conf_json_parse_str in nxt_clone_test.c

Also, I hijacked the nxt_http_parse_request in nxt_http_parse_test.c so it can give me seed files.

pkillarjun commented 4 weeks ago

Also how do you analyse the generated crash files?

./build/fuzz_${fuzzer} ./crash-${hash}

ASAN gives a really nice report.

ac000 commented 3 weeks ago

A couple of general questions...

1) This has pretty limited coverage of Unit, are there plans to increase this? 2) ... which sort of leads onto this next question. Who is intended to maintain this going forward?

pkillarjun commented 3 weeks ago

This has pretty limited coverage of Unit, are there plans to increase this?

Yes, currently I'm using src/test for my fuzzing tests. After that, I will try to find new tests in the source code.

... which sort of leads onto this next question. Who is intended to maintain this going forward?

I can only guarantee, build fixes for API changes or linker errors, and bugs in my harness code that is in the fuzzing directory. I won't be touching or fixing bugs in your codebase, The reason for that is pretty clear and you know it.

Also, I will add you in auto_ccs in https://github.com/google/oss-fuzz/pull/12002 if you don't mind? I believe your email for this GitHub account is andrew@digital-domain.net.

pkillarjun commented 3 weeks ago

Self Note: Before merging, sift all the http_fuzz into different files for better hit coverage.

ac000 commented 3 weeks ago

This has pretty limited coverage of Unit, are there plans to increase this?

Yes, currently I'm using src/test for my fuzzing tests. After that, I will try to find new tests in the source code.

OK...

... which sort of leads onto this next question. Who is intended to maintain this going forward?

I can only guarantee, build fixes for API changes or linker errors, and bugs in my harness code that is in the fuzzing directory. I won't be touching or fixing bugs in your codebase, The reason for that is pretty clear and you know it.

OK, that's what I figured...

Also, I will add you in auto_ccs in google/oss-fuzz#12002 if you don't mind? I believe your email for this GitHub account is andrew@digital-domain.net.

Correct.

ac000 commented 2 weeks ago

So a lot has changed since your initial posting. Could you please rebase against current master and squash down to whatever commits you're planning on merging? and I'll take another pass over it...

pkillarjun commented 2 weeks ago

So a lot has changed since your initial posting. Could you please rebase against current master and squash down to whatever commits you're planning on merging? and I'll take another pass over it...

I think it would be better if this one is merged first: https://github.com/nginx/unit/pull/1295.

My current plan:

ac000 commented 2 weeks ago

I want to merge both at the same time.

I'd like to take another pass over this PR but it's currently too over the place, commits fixing commits fixing commits...

Or perhaps this is telling us that this needs splitting up into three sets of patches (all part of the same PR)...

1) Fuzzing infrastructure 2) Fuzzers 3) Fixes (https://github.com/nginx/unit/pull/1295 would become part of this PR)

pkillarjun commented 2 weeks ago

Or perhaps this is telling us that this needs splitting up into three sets of patches (all part of the same PR)...

  1. Fuzzing infrastructure
  2. Fuzzers
  3. Fixes (http: fix use-of-uninitialized-value bug #1295 would become part of this PR)

I tried to make things simple and easy. You can look at it now.

Also, let's not include https://github.com/nginx/unit/pull/1295 in this PR; it's already too much going on here.

Update: It's working oss-fuzz infra

ac000 commented 2 weeks ago

Hi @pkillarjun

Thanks for the latest updates. It's looking more or less there now. Just some minor things...

1) There is a mixture of spaces and no spaces after casts in the fuzzers, my preference for new code is to have no space, but I can fix that up before I merge, so don't worry about it.

2) The shell scripts are using bash, not a huge problem, are these intended to be run on non-Linux?, e.g the BSDs? where bash may well not be installed by default.

We generally use POSIX sh for our shell scripts for that reason...

3) Some white space issues.

Also the .bin files seem to be in DOS format (notice the ^Ms)

diff --git ./fuzzing/fuzz_http_seed_corpus/nxt_http_test_run_1.bin ./fuzzing/fuzz_http_seed_corpus/nxt_http_test_run_1.bin
new file mode 100644
index 00000000..2f6c6149
--- /dev/null
+++ ./fuzzing/fuzz_http_seed_corpus/nxt_http_test_run_1.bin
@@ -0,0 +1,2 @@
+GEt / HTTP/1.0^M
+^M
$ file ./fuzzing/fuzz_http_seed_corpus/nxt_http_test_bench.bin
./fuzzing/fuzz_http_seed_corpus/nxt_http_test_bench.bin: ASCII text, with CRLF line terminators

Is that a requirement or can we convert them to Unix?

One last thing, this could maybe do with even just a brief README under fuzzing, with a quick start guide (which shell script to run...) and maybe some pre-requisites... it took me a while to figure out I needed to install the compiler-rt package... so on Fedora at least I guess you need the, clang, llvm and compiler-rt packages...

pkillarjun commented 1 week ago

I believe this little patch addresse all your concerns.

Also the .bin files seem to be in DOS format (notice the ^Ms) Is that a requirement or can we convert them to Unix?

Ah, .bin files should not be defined as anything; they are only for fuzzers to consume. I think it would be better if we ignore all .bin files in check-whitespace.yaml

Note: This patch is generated using the OpenAI model gpt-4o.

diff --git a/.github/workflows/check-whitespace.yaml b/.github/workflows/check-whitespace.yaml
index 75f0afe4..946bf442 100644
--- a/.github/workflows/check-whitespace.yaml
+++ b/.github/workflows/check-whitespace.yaml
@@ -29,6 +29,10 @@ jobs:
           "")
             ;;
           *)
+            # Check if the line contains a .bin file and skip it if it does
+            if echo "${dash} ${etc}" | grep -q '\.bin'; then
+              continue
+            fi
             if test -n "${commit}"
             then
               log="${log}\n${commit}"
ac000 commented 1 week ago

I believe this little patch addresse all your concerns.

Heh, OK, I can take it from here (split that up and apply the bits where required...)

Also the .bin files seem to be in DOS format (notice the ^Ms) Is that a requirement or can we convert them to Unix?

Ah, .bin files should not be defined as anything; they are only for fuzzers to consume. I think it would be better if we ignore all .bin files in check-whitespace.yaml

Note: This patch is generated using the OpenAI model gpt-4o.

Heh, if you want them to be kept in DOS format with trailing whietspace, there's no need to hack the whitespace checker!, we can use .gitattributes instead.

I'll add a patch for that.

Just to double check this particular one

fuzzing/fuzz_http_seed_corpus/nxt_http_test_run_12.bin:2: trailing whitespace.
+Host:example.com ^M

Is there meant to be a space character between the hostname and the ^M?

Anyway I'll push up what I'm planning to commit for you to check over and if you can confirm about the above whitespace issue.

ac000 commented 1 week ago

Note I didn't change the shells seeing as this is meant to run under Linux.

1:  6a0ffb45 ! 1:  a183d04d fuzzing: add fuzzing infrastructure in build system
    @@ Commit message
         fuzzing: add fuzzing infrastructure in build system

         Signed-off-by: Arjun <pkillarjun@protonmail.com>
    +    Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
    +    Signed-off-by: Andrew Clayton <a.clayton@nginx.com>

      ## auto/fuzzing (new) ##
     @@
    @@ fuzzing/oss-fuzz.sh (new)
     +
     +zip -r $OUT/fuzz_basic_seed_corpus.zip fuzz_basic_seed_corpus/
     +zip -r $OUT/fuzz_http_controller_seed_corpus.zip  fuzz_http_controller_seed_corpus/
    -+zip -r $OUT/fuzz_http_h1p_seed_corpus.zip  fuzz_http_h1p_seed_corpus/ 
    -+zip -r $OUT/fuzz_http_h1p_peer_seed_corpus.zip  fuzz_http_h1p_peer_seed_corpus/ 
    ++zip -r $OUT/fuzz_http_h1p_seed_corpus.zip  fuzz_http_h1p_seed_corpus/
    ++zip -r $OUT/fuzz_http_h1p_peer_seed_corpus.zip  fuzz_http_h1p_peer_seed_corpus/
     +zip -r $OUT/fuzz_json_seed_corpus.zip fuzz_json_seed_corpus/
     +
     +# Delete temporary directories.
2:  1b4cdcd0 ! 2:  9d017485 fuzzing: add fuzzing targets
    @@ Commit message
         fuzzing: add fuzzing targets

         Signed-off-by: Arjun <pkillarjun@protonmail.com>
    +    Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
    +    Signed-off-by: Andrew Clayton <a.clayton@nginx.com>

      ## fuzzing/nxt_basic_fuzz.c (new) ##
     @@
    @@ fuzzing/nxt_basic_fuzz.c (new)
     +    /*
     +     * Validate base64 data before decoding.
     +     */
    -+    ret = nxt_base64_decode(NULL, (u_char *) data, size);
    ++    ret = nxt_base64_decode(NULL, (u_char *)data, size);
     +    if (ret == NXT_ERROR) {
     +        return;
     +    }
     +
    -+    nxt_base64_decode(buf, (u_char *) data, size);
    ++    nxt_base64_decode(buf, (u_char *)data, size);
     +}
     +
     +
    @@ fuzzing/nxt_json_fuzz.c (new)
     +        return 0;
     +    }
     +
    -+    input.start = (u_char *) data;
    ++    input.start = (u_char *)data;
     +    input.length = size;
     +
     +    conf = nxt_conf_json_parse_str(mp, &input);
3:  1f5b0b1c ! 3:  ce538c49 fuzzing: add a fuzzing seed corpus and dictionary
    @@ Commit message
         fuzzing: add a fuzzing seed corpus and dictionary

         Signed-off-by: Arjun <pkillarjun@protonmail.com>
    +    Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
    +    Signed-off-by: Andrew Clayton <a.clayton@nginx.com>

      ## fuzzing/fuzz_basic_seed_corpus/base64_0.bin (new) ##
      Binary files /dev/null and fuzzing/fuzz_basic_seed_corpus/base64_0.bin differ
4:  022c86e7 < -:  -------- fuzzing: removed white space and spaces, updated build-fuzz.sh to use shell instead of bash, and added a readme
-:  -------- > 4:  b269e7a7 fuzzing: add a basic README
-:  -------- > 5:  3f44c4e9 fuzzing: Add a .gitattributes file
pkillarjun commented 1 week ago

Just to double check this particular one

fuzzing/fuzz_http_seed_corpus/nxt_http_test_run_12.bin:2: trailing whitespace.
+Host:example.com ^M

Is there meant to be a space character between the hostname and the ^M?

nxt_http_test_run_12.bin is an example of malformed data. I also ran afl-cmin on fuzz_http_seed_corpus. All of these seed corpora have unique coverage. My last thought will always be: don't change anything in .bin files at all.

ac000 commented 1 week ago

Rebased with master

$ git range-diff 3f44c4e9...35a572c2
 -:  -------- >  1:  a98acded ci: Add unit testing to unitctl CI workflow
 -:  -------- >  2:  44709bea .mailmap: Add an entry for Ava's GitHub address
 -:  -------- >  3:  30b39bd0 Add GitHub workflows for extra coverage
 -:  -------- >  4:  c30c2f5e Add unitctl quickstart to README.md
 -:  -------- >  5:  4e884d9e tools/unitctl: Replace matching image name to matching command
 -:  -------- >  6:  b91073e5 tools/unitctl: Replace format! with .to_string()
 -:  -------- >  7:  8fc16a77 Packaging: added missing build dependencies to Makefiles
 -:  -------- >  8:  f281207f Packaging: fix build-depends detection on debian-based systems
 -:  -------- >  9:  c38bcee1 contrib: be quiet on unpack
 -:  -------- > 10:  7b19a06c tstr: Constify the 'str' parameter to nxt_tstr_compile()
 -:  -------- > 11:  ea5c41b8 wasm: Add a missing 'const' qualifier in nxt_wasm_setup()
 -:  -------- > 12:  4fc50258 ci: Be more specific when to run the main Unit checks
 -:  -------- > 13:  e77a0c16 Tests: explicitly specify 'f' prefix to format string before printing
 -:  -------- > 14:  c9dced37 Tests: print unit.log on unsuccessful unmount
 -:  -------- > 15:  98983f3f Add a GitHub discussions badge to the README
 -:  -------- > 16:  a7e3686a Tools: improved error handling for unitc
 -:  -------- > 17:  d7ec30c4 ci: Limit when to run checks on pull-requests
 -:  -------- > 18:  04a24f61 http: fix use-of-uninitialized-value bug
 1:  a183d04d = 19:  965fc94e fuzzing: add fuzzing infrastructure in build system
 2:  9d017485 = 20:  a93d878e fuzzing: add fuzzing targets
 3:  ce538c49 = 21:  665353dc fuzzing: add a fuzzing seed corpus and dictionary
 4:  b269e7a7 = 22:  5b65134c fuzzing: add a basic README
 5:  3f44c4e9 = 23:  35a572c2 fuzzing: Add a .gitattributes file