wgslfuzz / darthshader

Apache License 2.0

Tips for getting this running #1

Open agoodm88 opened 1 month ago

agoodm88 commented 1 month ago

Do you have any tips for getting this running? I've got a very large, well-developed corpus of WGSL shader files from a previous fuzzing run which I would like to try with this fuzzer. I'm attempting to get this running on Ubuntu 24.04.1 with a Rust nightly toolchain, but I could use any version of Linux that will run on bare metal on HP ProLiant servers.

Rough steps completed so far:

  1. Grabbed the code.
  2. Grabbed AFL++ 4.10c, applied the patch, ran make dist. Stopped short of installing it because I don't think that's needed?
  3. Followed harnesses/dawn/README to successfully build tint_afl_all_fuzzer.
  4. Followed harnesses/dxcompiler/README.md to successfully create libdxcompiler.so.

LD_LIBRARY_PATH=/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/lib /home/alan/darthshader/target/release/darthshader --debug-child --input corpus --output output /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer

$ LD_LIBRARY_PATH=/home/alan/ /home/alan/darthshader/target/release/darthshader --debug-child --input corpus --output output /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer
Workdir: "/home/alan/darthshader_run"
Out dir at "output" already exists.
Let's fuzz :)
dxcomplib: (nil)
Coverage map size: Some(2621440)
"corpus/*/" from file "corpus/05a79f06cf3f67f726dae68d18a2290f6c9a50c9.wgsl"
Attempting to parse as AST: "corpus/05a79f06cf3f67f726dae68d18a2290f6c9a50c9.wgsl"
File instrumentation/afl-compiler-rt.o.c, line 897: Error(wrong forkserver message from AFL++ tool): Success
Failed to load initial corpus at "corpus/*/"
Unknown("Unable to request new process from fork server (OOM?)", ErrorBacktrace)

I've also tried with a corpus containing a single file 1.wgsl, either empty or with simply '1' in it. Lastly, I tried with a simple file from Google's tint_wgsl_fuzzer_seed_corpus.zip. I always get the same 'error' message. The box is an HP DL360p Gen10 with 256GB RAM; no signs of running out of memory.

agoodm88 commented 1 month ago

The above is with a deliberately truncated LD path; I was trying to see if running without libdxcompiler.so would work.

LD_LIBRARY_PATH=/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/lib /home/alan/darthshader/target/release/darthshader --debug-child --input corpus --output output -t 12000 /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer
Workdir: "/home/alan/darthshader_run"
Out dir at "output" already exists.
Let's fuzz :)
dxcomplib: 0x51a000000080
dxcCreateInstance: 0x7b370b463700
Coverage map size: Some(2621440)
"corpus/*/" from file "corpus/05a79f06cf3f67f726dae68d18a2290f6c9a50c9.wgsl"
Attempting to parse as AST: "corpus/05a79f06cf3f67f726dae68d18a2290f6c9a50c9.wgsl"
File instrumentation/afl-compiler-rt.o.c, line 897: Error(wrong forkserver message from AFL++ tool): Success
Failed to load initial corpus at "corpus/*/"
Unknown("Unable to request new process from fork server (OOM?)", ErrorBacktrace)

This run is with the correct path.

I've also tried to debug this by running the fuzz harness directly:

LD_LIBRARY_PATH=/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/lib /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer
[-] FATAL: forkserver is already up, but an instrumented dlopen() library loaded afterwards. You must AFL_PRELOAD such libraries to be able to fuzz them or LD_PRELOAD to run outside of afl-fuzz. To ignore this set AFL_IGNORE_PROBLEMS=1 but this will lead to ambiguous coverage data. In addition, you can set AFL_IGNORE_PROBLEMS_COVERAGE=1 to ignore the additional coverage instead (use with caution!).
Aborted

Which doesn't seem to work; however, looking at the code, it seems you're not meant to run it like this anyway, as the Rust side sets an environment variable...

$ AFL_IGNORE_PROBLEMS=1 AFL_IGNORE_PROBLEMS_COVERAGE=1 AFL_PRELOAD=/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/lib /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer
env LIBAFL_EXIT_ID not set

agoodm88 commented 1 month ago

I'm an idiot. I checked out the wrong afl++ version. Serves me right for trying to get it going at 3am I guess.

wgslfuzz commented 1 month ago

The issue is probably caused by the changes in the forkserver interface of recent versions of AFL++. The libAFL version this fuzzer is built upon doesn't support the new interface. Going forward, it'd be best to upgrade libAFL (which is not a walk in the park because so many things break).

agoodm88 commented 1 month ago

Thanks for coming back to me. I'm currently trying to get the older version of AFL++ to build on 24.04.1 but not having a lot of luck.

It would probably be easiest to reinstall the box with whatever version of linux you used?

wgslfuzz commented 1 month ago

Sure thing. I'm using the same Ubuntu version, i.e. 24.04. There is also this archive of darthshader https://zenodo.org/records/13302737 with a Dockerfile for building. It builds plenty of other things (e.g., wgslsmith) which you might not need/want; however, it contains the exact build steps I've been taking. If you figure out the issue, please let me know so the description can be updated.

agoodm88 commented 1 month ago

Do you have the steps you took to compile AFL++ documented? It's not building afl-clang-fast/++ for me, which I recall from over a year ago (the last time I built this) is a build dependency issue. I've added llvm-dev, which gets it a lot further. If it doesn't manage to compile afl-clang-fast, it symlinks it to afl-cc, which creates obvious errors when trying to build the harnesses.

agoodm88 commented 1 month ago

Looks like you don't actually need the afl-clang-fast binary per se, but you do need it to build with LLVM. So basically I followed the instructions in docs/install and just added apt install llvm-dev and did make all instead of make distrib.

agoodm88 commented 1 month ago

Some progress. It's not clear to me whether it 'likes' malformed WGSL files, which my huge corpus will be full of. So I am trying with a corpus containing a single simple WGSL file:

$ cat corpus_empty/1.wgsl
@vertex
fn main(
    @builtin(vertex_index) VertexIndex : u32
) -> @builtin(position) vec4f {
    var pos = array<vec2f, 3>(
        vec2(0.0, 0.5),
        vec2(-0.5, -0.5),
        vec2(0.5, -0.5)
    );

    return vec4f(pos[VertexIndex], 0.0, 1.0);
}

$ LD_LIBRARY_PATH=/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/lib /home/alan/darthshader/target/release/darthshader --debug-child --input corpus_empty --output output -t 12000 /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer
Workdir: "/home/alan/darthshader_run"
Out dir at "output" already exists.
Let's fuzz :)
Coverage map size: Some(2621440)
"corpus_empty/*/" from file "corpus_empty/1.wgsl"
dxcomplib: 0x51a000000080
dxcCreateInstance: 0x700030663760
result 0
[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000, shared_mem: 67039/2621440 (2%)
Client 000: NaN: Scheduler NaN: Manager
Feedbacks: NaN: Not Measured

from file "output/queue/1727614531_000000.ron"
Offending file: "output/queue/1727614531_000000.ron"
thread 'main' panicked at src/layeredinput.rs:301:17:
file import error
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

$ cat output/queue/1727614531_000000.ron
IR((module:(types:[(name:None,inner:Scalar(kind:Uint,width:4)),(name:None,inner:Vector(size:Quad,kind:Float,width:4)),(name:None,inner:Vector(size:Bi,kind:Float,width:4)),(name:None,inner:Array(base:3,size:Constant(3),stride:8))],special_types:(ray_desc:None,ray_intersection:None,predeclared_types:{}),constants:[],global_variables:[],const_expressions:[Literal(I32(3))],functions:[],entry_points:[(name:"main",stage:Vertex,early_depth_test:None,workgroup_size:(0,0,0),function:(name:Some("main"),arguments:[(name:Some("l"),ty:1,binding:Some(BuiltIn(VertexIndex)))],result:Some((ty:2,binding:Some(BuiltIn(Position(invariant:false))))),local_variables:[(name:Some("l"),ty:4,init:Some(11))],expressions:[FunctionArgument(0),Literal(F32(0.0)),Literal(F32(0.5)),Compose(ty:3,components:[2,3]),Literal(F32(-0.5)),Literal(F32(-0.5)),Compose(ty:3,components:[5,6]),Literal(F32(0.5)),Literal(F32(-0.5)),Compose(ty:3,components:[8,9]),Compose(ty:4,components:[4,7,10]),LocalVariable(1),Access(base:12,index:1),Load(pointer:13),Literal(F32(0.0)),Literal(F32(1.0)),Compose(ty:2,components:[14,15,16])],named_expressions:{1:"VertexIndex"},body:[Emit((start:3,end:4)),Emit((start:6,end:7)),Emit((start:9,end:10)),Emit((start:10,end:11)),Emit((start:12,end:13)),Emit((start:13,end:14)),Emit((start:16,end:17)),Return(value:Some(17))]))])))

stack backtrace:
   0: rust_begin_unwind
             at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/panicking.rs:74:14
   2: <darthshader::layeredinput::LayeredInput as libafl::inputs::Input>::from_file
             at /home/alan/darthshader/src/layeredinput.rs:301:17
   3: <libafl::corpus::inmemory_ondisk::InMemoryOnDiskCorpus<I> as libafl::corpus::Corpus>::load_input_into
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/corpus/inmemory_ondisk.rs:153:25
   4: libafl::corpus::testcase::Testcase<I>::load_len
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/corpus/testcase.rs:346:21
   5: <libafl::schedulers::testcase_score::LenTimeMulTestcaseScore<S> as libafl::schedulers::testcase_score::TestcaseScore<S>>::compute
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/testcase_score.rs:43:15
   6: libafl::schedulers::minimizer::MinimizerScheduler<CS,F,M>::update_score
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/minimizer.rs:268:26
   7: <libafl::schedulers::minimizer::MinimizerScheduler<CS,F,M> as libafl::schedulers::Scheduler>::on_add
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/minimizer.rs:204:9
   8: <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::ExecutionProcessor<OT>>::process_execution
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:390:17
   9: <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::EvaluatorObservers<OT>>::evaluate_input_with_observers
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:471:9
  10: <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::Evaluator<E,EM>>::evaluate_input_events
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:495:9
  11: libafl::fuzzer::Evaluator::evaluate_input
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:124:9
  12: libafl::state::StdState<I,C,R,SC>::continue_loading_initial_inputs_custom
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:665:40
  13: libafl::state::StdState<I,C,R,SC>::load_initial_inputs_custom_by_filenames
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:638:9
  14: libafl::state::StdState<I,C,R,SC>::load_initial_inputs_by_filenames
             at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:703:9
  15: darthshader::fuzz
             at /home/alan/darthshader/src/main.rs:406:14
  16: darthshader::main
             at /home/alan/darthshader/src/main.rs:221:5
  17: core::ops::function::FnOnce::call_once
             at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/ops/function.rs:250:5

note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.

agoodm88 commented 1 month ago

Full backtrace:

stack backtrace:
   0:     0x59db002fe309 - std::backtrace_rs::backtrace::libunwind::trace::h9294c60f1c519a6b
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0x59db002fe309 - std::backtrace_rs::backtrace::trace_unsynchronized::hb186c7e77bffad8a
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x59db002fe309 - std::sys::backtrace::_print_fmt::hcb03e5d9b54439b1
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:66:9
   3:     0x59db002fe309 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h9f6912c11cb864e2
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:39:26
   4:     0x59db000d4223 - core::fmt::rt::Argument::fmt::h3d03db0338420f78
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/fmt/rt.rs:177:76
   5:     0x59db000d4223 - core::fmt::write::h8debadf270d89c55
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/fmt/mod.rs:1186:21
   6:     0x59db002d10e2 - std::io::Write::write_fmt::h2027e8c4646c73e6
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/io/mod.rs:1833:15
   7:     0x59db00304583 - std::sys::backtrace::BacktraceLock::print::h414cd546f6b36885
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:42:9
   8:     0x59db00304abc - std::panicking::default_hook::{{closure}}::h40ea18cfe0adea53
   9:     0x59db00304abc - std::panicking::default_hook::h8d94df497eda232c
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:295:9
  10:     0x59db00304abc - std::panicking::rust_panic_with_hook::h9a5479236c4bea0b
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:801:13
  11:     0x59db00304625 - std::panicking::begin_panic_handler::{{closure}}::hf9517d9cf0232ee4
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:667:13
  12:     0x59db003045b9 - std::sys::backtrace::__rust_end_short_backtrace::h623c1073442a1748
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:170:18
  13:     0x59db003045ac - rust_begin_unwind
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:665:5
  14:     0x59db000d27cf - core::panicking::panic_fmt::hc1611ab468cf2998
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/panicking.rs:74:14
  15:     0x59dafff9025a - <darthshader::layeredinput::LayeredInput as libafl::inputs::Input>::from_file::h7aac364fe088f659
                               at /home/alan/darthshader/src/layeredinput.rs:301:17
  16:     0x59dafff9025a - <libafl::corpus::inmemory_ondisk::InMemoryOnDiskCorpus<I> as libafl::corpus::Corpus>::load_input_into::haad394b0697bb104
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/corpus/inmemory_ondisk.rs:153:25
  17:     0x59dafffb131a - libafl::corpus::testcase::Testcase<I>::load_len::h08a0f70f7f64a616
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/corpus/testcase.rs:346:21
  18:     0x59dafffb131a - <libafl::schedulers::testcase_score::LenTimeMulTestcaseScore<S> as libafl::schedulers::testcase_score::TestcaseScore<S>>::compute::hc6140e8be637741a
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/testcase_score.rs:43:15
  19:     0x59dafffa523e - libafl::schedulers::minimizer::MinimizerScheduler<CS,F,M>::update_score::h52a54ba0cca9cb8f
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/minimizer.rs:268:26
  20:     0x59dafffa523e - <libafl::schedulers::minimizer::MinimizerScheduler<CS,F,M> as libafl::schedulers::Scheduler>::on_add::hc2bdcbe1d68e303a
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/minimizer.rs:204:9
  21:     0x59daffff775a - <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::ExecutionProcessor<OT>>::process_execution::hac8bda4042be0823
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:390:17
  22:     0x59daffff775a - <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::EvaluatorObservers<OT>>::evaluate_input_with_observers::hd1a65812049d490b
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:471:9
  23:     0x59daffff775a - <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::Evaluator<E,EM>>::evaluate_input_events::h62735126c63aaf8b
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:495:9
  24:     0x59daffff775a - libafl::fuzzer::Evaluator::evaluate_input::h0a42049ea75bdc4e
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:124:9
  25:     0x59db0008036a - libafl::state::StdState<I,C,R,SC>::continue_loading_initial_inputs_custom::h38a08f7cf40ba174
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:665:40
  26:     0x59db0008036a - libafl::state::StdState<I,C,R,SC>::load_initial_inputs_custom_by_filenames::h85910f404f1eb5aa
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:638:9
  27:     0x59db0008036a - libafl::state::StdState<I,C,R,SC>::load_initial_inputs_by_filenames::h14363f970c175769
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:703:9
  28:     0x59db0008036a - darthshader::fuzz::h6777bbb425b9a098
                               at /home/alan/darthshader/src/main.rs:406:14
  29:     0x59db00076246 - darthshader::main::hcb1c4bdbdc4a0397
                               at /home/alan/darthshader/src/main.rs:221:5
  30:     0x59dafffb2716 - core::ops::function::FnOnce::call_once::h7c402fe0424445f2
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/ops/function.rs:250:5
  31:     0x59dafffb2716 - std::sys::backtrace::__rust_begin_short_backtrace::hb45b1ed673021213
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:154:18
  32:     0x59db0008b32d - main
  33:     0x7ca30ce2a1ca - __libc_start_call_main
                               at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  34:     0x7ca30ce2a28b - __libc_start_main_impl
                               at ./csu/../csu/libc-start.c:360:3
  35:     0x59dafff8db95 - _start
  36:                0x0 - <unknown>

Unfortunately day job is calling, so I will have to come back to this later on. Any thoughts would be valued.

wgslfuzz commented 1 month ago

I could probably add some code that simply skips samples that cannot be imported. Would that help?

agoodm88 commented 1 month ago

Yeah; that, or I will figure out a way to preprocess the files. It seems to just be the ones with UTF-8 or otherwise unprintable chars in them that are tripping it up so far.

I got it working, I think, but so far I can only get the version from your Dockerfile to work. The version from GitHub just refuses with the errors above.

Output looks like this:

[Testcase #0] run time: 0h-1m-10s, clients: 1, corpus: 218, objectives: 2, executions: 302, exec/sec: 4.273, shared_mem: 56391/2621440 (2%), execs_err: 176, execs_suc: 75

repeating with incrementing numbers. It looks single-threaded, so presumably I need to run multiple instances to make use of my large boxes.

agoodm88 commented 1 month ago

A quick and dirty Python script to split files into ASCII-only and non-ASCII did help, but it still eventually chokes on a file. I think your suggestion of skipping over unimportable files makes the most sense.
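For reference, such a pre-filter might look roughly like this (a minimal sketch; the `split_corpus` helper and the flat-directory layout are illustrative assumptions, not part of darthshader):

```python
import shutil
from pathlib import Path

def is_plain_ascii(path: Path) -> bool:
    """True if the file is pure ASCII with no control chars besides newline/CR/tab."""
    try:
        text = path.read_bytes().decode("ascii")
    except UnicodeDecodeError:
        return False
    return all(ch in "\n\r\t" or 32 <= ord(ch) < 127 for ch in text)

def split_corpus(src: Path, ascii_dir: Path, rest_dir: Path) -> None:
    """Copy each .wgsl seed into ascii_dir or rest_dir depending on its content."""
    ascii_dir.mkdir(parents=True, exist_ok=True)
    rest_dir.mkdir(parents=True, exist_ok=True)
    for f in src.glob("*.wgsl"):
        dest = ascii_dir if is_plain_ascii(f) else rest_dir
        shutil.copy(f, dest / f.name)
```

This only removes one class of problem seeds; files that are printable but still fail to lift would need the fuzzer-side skip.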

These files are what's left after I modified one of Google's fuzzers in the summer to generate WGSL files by brute force, ignorance, and a lot of compute power. This is the minimized corpus that was left after I got bored. I found an almost untriageable amount of crashes, with quite a few genuine bugs, but sadly nothing that crossed the security line.

The corpus was effectively generated by many hundreds of monkeys (CPU cores) all smashing on a keyboard that could only output fragments of WGSL code. The fuzzer then decided what was interesting and mutated over that. Currently I am ramming the entire corpus through the tint filename.wgsl -o filename.hlsl process. For some of the corpus this results in an HLSL file being created; these files appear to contain valid fragments of code. I am saving the logs from this procedure in the hope that it will allow me to sift good candidate input files out of the rough.
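That sifting loop could be sketched roughly like so (hypothetical helper names; it assumes a tint binary that exits non-zero when the `tint filename.wgsl -o filename.hlsl` conversion fails, and keeps the logs alongside the output as described above):

```python
import subprocess
from pathlib import Path

def tint_converts(tint_cmd, wgsl: Path, out_hlsl: Path, timeout_s: int = 30) -> bool:
    """Run the WGSL->HLSL conversion; save stdout/stderr to a .log file for later sifting."""
    try:
        proc = subprocess.run(
            list(tint_cmd) + [str(wgsl), "-o", str(out_hlsl)],
            capture_output=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    out_hlsl.with_suffix(".log").write_bytes(proc.stdout + proc.stderr)
    return proc.returncode == 0 and out_hlsl.exists()

def sift_corpus(tint_cmd, corpus_dir: Path, out_dir: Path):
    """Yield the corpus files that tint converts cleanly."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for wgsl in sorted(corpus_dir.glob("*.wgsl")):
        if tint_converts(tint_cmd, wgsl, out_dir / (wgsl.stem + ".hlsl")):
            yield wgsl
```

Note that a clean tint exit status is only a proxy for "good seed"; the fuzzer's own import (the naga lift) may still reject some of these files.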

agoodm88 commented 1 month ago

I had a good go at sanitizing my corpus this evening. I think the fuzzer 'likes' input files which 1) don't output errors when fed into some kind of tint operation and 2) don't contain any non-ASCII characters. Sadly, the exact tint operation(s) the fuzzer wants to pass are evading me, and my Rust skills are almost nonexistent, so I've not managed to work it out by reading the code yet. (I thought maybe it was looking for error-free WGSL-to-AST conversion; however, after trying various approaches it keeps finding new errors.)

wgslfuzz commented 1 month ago

Could you please upload like 1-2 of the inputs that trigger the crash so I can use those to test my patches?

agoodm88 commented 1 month ago

The first couple that came out:

fn f(){let l=modf(0);}

fn b(){let t=modf(dpdx(5));}

I think the fuzzer loads the shortest files first; there are many more complex files than this, along with enormous amounts of files probably containing pretty much every kind of syntax error imaginable.

I left the fuzzer running overnight on the box and was greeted this morning with an enormous number of crashers. It seems that triaging these is not entirely straightforward, because the fuzzer saves no relevant output to help me deduce why a file ended up in there. So far I've figured out that the 'lifter' executable turns the .ron files into WGSL. It then seems sensible to transform those into HLSL (tint filename.wgsl -o filename.hlsl), and at this point I can feed them into dxc standalone: /home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/bin/dxc-3.7 filename -T something -HV year. Not knowing which variants of -T are interesting, I slapped together a triage script that brute-force tests every possible variant in parallel. I guess this works but might be wasteful (it's been running for hours now), and perhaps there are other CLI-based triage processes that might highlight the exciting crashers?

wgslfuzz commented 1 month ago

It seems the documentation lacks a couple of important steps. I'll update it; here is some info already:

Regarding the -T parameter: you'll need to configure the type of shader you want to compile: vertex (vs_), fragment (ps_), or compute (cs_). The type you need to set depends on the entry point present in the shader. Furthermore, you need to specify the shader model version, which should be between 6_2 and 6_6. So a compute shader in version 6.3 becomes -T cs_6_3. There are more recent shader versions, e.g. 6.7; AFAIK those are not supported by dawn/tint, so those reports will hence (likely) be excluded from the VRP.
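The stage/version matrix above is small enough to enumerate exhaustively when the entry point's stage is unknown. A sketch (the -E/-Gis/-Zpr/-HV flags are copied from the dxc triage commands quoted in this thread; the helper names are hypothetical):

```python
# Shader stages and model versions supported by dawn/tint, per the explanation above.
STAGES = ["vs", "ps", "cs"]                      # vertex, fragment (pixel), compute
VERSIONS = ["6_2", "6_3", "6_4", "6_5", "6_6"]   # shader model 6.2 through 6.6

def target_profiles():
    """All -T profile strings worth trying when the shader's stage is unknown."""
    return [f"{stage}_{ver}" for stage in STAGES for ver in VERSIONS]

def dxc_cmd(dxc, hlsl_file, profile):
    """One dxc invocation; flags mirror the triage commands shown in this thread."""
    return [dxc, "-E", "unused_entry_point", "-Gis", "-Zpr",
            "-enable-16bit-types", "-HV", "2018", "-T", profile, hlsl_file]
```

A triage script could then run `dxc_cmd(...)` for each of the 15 profiles and keep any invocation that crashes.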

I recommend using the same flags for dxc as specified in the harness, see https://github.com/wgslfuzz/darthshader/blob/6ca6258bbd91fbc3111ce9b4b8a58384afaeee23/harnesses/dawn/patches/dawn_3de0f00.diff#L569-L581

On older versions of dxc (in particular before March 2024) you should find plenty of crashes rather quickly, considering you mentioned having access to a couple of cores. Many issues have been fixed since then, so I'd expect bugs to be rather sparse.

agoodm88 commented 1 month ago

Thanks. This is really helpful.

Based upon this, I think the dxc command I should triage with is:

/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/bin/dxc-3.7 -E unused_entry_point -Gis -Zpr -enable-16bit-types -HV 2018 -T cs_6_6 file.hlsl

Since I don't know what kind of shader any given test case contains, I figure I'll just try all three relevant stages.

I imported a few seeds from my corpus (basically until it blew up the import process, then restarted it with no input selected), which resulted in thousands of 'crashes' overnight. Following the triage process manually, I get the output below on the final step, which doesn't look like a crash to my eyes, leading me to wonder why this was in the crashes directory?

dxc-3.7 -E unused_entry_point -Gis -Zpr -enable-16bit-types -HV 2018 -T cs_6_6 1727744106_000067.wgsl.hlsl 
;
; Input signature:
;
; Name                 Index   Mask Register SysValue  Format   Used
; -------------------- ----- ------ -------- -------- ------- ------
; no parameters
;
; Output signature:
;
; Name                 Index   Mask Register SysValue  Format   Used
; -------------------- ----- ------ -------- -------- ------- ------
; no parameters
; shader hash: 37007ca9dcc6243ff5cf1bc084abd224
;
; Pipeline Runtime Information: 
;
;PSVRuntimeInfo:
; Compute Shader
; NumThreads=(1,1,1)
; MinimumExpectedWaveLaneCount: 0
; MaximumExpectedWaveLaneCount: 4294967295
; UsesViewID: false
; SigInputElements: 0
; SigOutputElements: 0
; SigPatchConstOrPrimElements: 0
; SigInputVectors: 0
; SigOutputVectors[0]: 0
; SigOutputVectors[1]: 0
; SigOutputVectors[2]: 0
; SigOutputVectors[3]: 0
; EntryFunctionName: unused_entry_point
;
;
; Buffer Definitions:
;
;
; Resource Bindings:
;
; Name                                 Type  Format         Dim      ID      HLSL Bind  Count
; ------------------------------ ---------- ------- ----------- ------- -------------- ------
;
target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
target triple = "dxil-ms-dx"

define void @unused_entry_point() {
  ret void
}

!llvm.ident = !{!0}
!dx.version = !{!1}
!dx.valver = !{!2}
!dx.shaderModel = !{!3}
!dx.entryPoints = !{!4}

!0 = !{!"dxc(private) 1.8.0.14746 (main, 75ff50caa)"}
!1 = !{i32 1, i32 6}
!2 = !{i32 1, i32 8}
!3 = !{!"cs", i32 6, i32 6}
!4 = !{void ()* @unused_entry_point, !"unused_entry_point", null, null, !5}
!5 = !{i32 0, i64 8388608, i32 4, !6}
!6 = !{i32 1, i32 1, i32 1}
wgslfuzz commented 1 month ago

The output indicates that for some reason the standalone build of dxc didn't crash. This can have multiple root causes.

Assuming there is no mismatch in build flags, I recommend the following: build the tint harness + dxc without AFL, using a regular clang (but do use the same ASAN/UBSAN/... sanitizer flags). This will allow you to pass the WGSL file to dxc exactly as during the fuzzing process. This should trigger a crash, either in tint or in dxc.

wgslfuzz commented 1 month ago

Another explanation for not observing the crash in dxc could be that your input crashed in tint during fuzzing. Make sure to use the same build flags (modulo instrumentation) for converting from WGSL to HLSL.

agoodm88 commented 1 month ago

Thanks. How do you define 'crashed'? I've been watching the triage process, and it seems tint outputs some errors and warnings which appear to be related to malformed code, e.g. "warning: code is unreachable" and one or two other error messages. This might not be unexpected, as my input files may not have been valid WGSL in the first place (in fact, most likely were not; they came from a lot of random monkeys smashing a WGSL code keyboard). One avenue where I may be missing something: is tint generating invalid code during a conversion sometimes considered a security issue in itself? Maybe that's why you save these?

Beyond that, I tried to edit the Rust code to skip over malformed files, as at the minute it's only importing a few hundred files out of my ~100k samples. But unfortunately my Rust skills may as well not exist, and all I did was make it blow up in literally hundreds of errors. Any pointers here would be very much appreciated, even if just a few CLI commands I could use to sanitize the corpus. I'm certain others would also find the ability to skip over unacceptable inputs useful.

Example tint errors:

1727738444000000.wgsl:105:25 warning: code is unreachable
f3673146681((&G1165958494), (&G1165958494));
^^^^^^^^^^^^

1727738444_000000.wgsl:369:17 error: var with storage address space and readwrite access mode cannot be used by vertex pipeline stage
G3425530425 = i32();

wgslfuzz commented 1 month ago

I just pushed a version that should skip all seeds that cannot be imported. It's EOB here, but I'll come back regarding the crashes tomorrow.

agoodm88 commented 1 month ago

We have some progress, I think, as I can now build the version directly from git, whereas previously I had to build the version from the Docker image.

This morning, before going to my first normal job of the day, I tried throwing my complete "monkey" corpus at it. I also tried the complete corpus with only printable files. Sadly, it's blowing up quite early with the attached error log/backtrace. I think it's choking on the attached file.

badfile.zip

crash.txt

agoodm88 commented 1 month ago

I also tried with an empty corpus and a corpus containing just one valid file, which is also blowing up. I'm not sure whether this issue might be the same one that stopped me running the original copy I grabbed from git? That particular crash went away when I built the version from your Dockerfile. Attached crash trace. crash.txt

wgslfuzz commented 1 month ago

I believe we're facing 2 different issues: the former one (the one from badfile.zip) contains a shader that triggers a panic in naga. The following happens during import:

  1. the shader is imported by the fuzzer
  2. the shader is passed to naga in order to lift it to IR
  3. naga crashes/panics during parsing, which in turn terminates the entire fuzzer

This is a bit of an annoying situation; I'll add some code tomorrow to catch these errors.

Regarding the latter crash: I can't quite make sense of this one, the file was successfully lifted (to 1727849478_000000.ron), but when attempting to load the .ron file the fuzzer crashes.

agoodm88 commented 1 month ago

Thanks for your efforts so far :-)

What I don't understand is why the copy I extracted from your Docker image builds and runs with an empty corpus, when the version from git never has. Something feels off there...

Maybe I am doing something wrong when building the git version. Here is my complete process, excluding installing build prereqs etc.:

git clone https://github.com/wgslfuzz/darthshader.git
cd darthshader
cargo build
wget https://github.com/AFLplusplus/AFLplusplus/archive/refs/tags/v4.10c.zip
cd AFLplusplus-4.10c
make -j112 source-only
cd dawn
git clone https://dawn.googlesource.com/dawn
cd dawn
git checkout 3de0f00
cp scripts/standalone.gclient .gclient
gclient sync # this might trigger an error but should be fine anyways
cp ../patches/dawn_3de0f00.diff .
git apply dawn_3de0f00.diff

mkdir -p out/build
cd out/build
CC=/home/alan/darthshader/AFLplusplus-4.10c/afl-clang-fast CXX=/home/alan/darthshader/AFLplusplus-4.10c/afl-clang-fast++ cmake -GNinja -DTINT_BUILD_AFL_FUZZER=ON -DDAWN_ENABLE_ASAN=ON -DTINT_BUILD_MSL_WRITER=ON -DTINT_BUILD_SPV_WRITER=ON -DTINT_BUILD_HLSL_WRITER=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_AR=/usr/bin/llvm-ar-18 -DCMAKE_CXX_COMPILER_RANLIB=/usr/bin/llvm-ranlib-18 -DCMAKE_C_COMPILER_AR=/usr/bin/llvm-ar-18 -DCMAKE_C_COMPILER_RANLIB=/usr/bin/llvm-ranlib-18 -DCMAKE_CXX_FLAGS="-fuse-ld=lld" -DCMAKE_C_FLAGS="-fuse-ld=lld" ../..
ninja tint_afl_all_fuzzer

cd ~/darthshader/harnesses/dxcompiler/
git clone https://github.com/microsoft/DirectXShaderCompiler.git
cd DirectXShaderCompiler
git submodule init
git submodule update

cd DirectXShaderCompiler
mkdir -p out/build
cd out/build
CC=/home/alan/darthshader/AFLplusplus-4.10c/afl-clang-fast CXX=/home/alan/darthshader/AFLplusplus-4.10c/afl-clang-fast++ cmake ../../ -C ../../cmake/caches/PredefinedParams.cmake -DCMAKE_BUILD_TYPE=Release -DDXC_DISABLE_ALLOCATOR_OVERRIDES=ON -DENABLE_SPIRV_CODEGEN=OFF -DSPIRV_BUILD_TESTS=OFF -DLLVM_USE_SANITIZER=Address -DLLVM_ENABLE_LTO=Off -G Ninja
ninja
cd
mkdir darthshader_run2
RUST_BACKTRACE=full LD_LIBRARY_PATH=/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build /home/alan/darthshader/target/debug/darthshader --output output -t 12000 /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer 2>&1 | tee crash
cat crash

$ cat crash
Workdir: "/home/alan/darthshader_run2"
Let's fuzz :)
Coverage map size: Some(212800)
[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000, shared_mem: 5227/212800 (2%)
Client 000:
     NaN: Scheduler
     NaN: Manager
  Feedbacks:
     NaN: Not Measured
from file "output/queue/1727906272_000000.ron"
Offending file: "output/queue/1727906272_000000.ron"
thread 'main' panicked at src/layeredinput.rs:301:17:

Yet in the same folder, the version I untarred from your large docker tarball just runs, and all I did was untar it and run cargo build in the relevant folder:

LD_LIBRARY_PATH=/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build /home/alan/scratch/aedarthshader/buildfiles/darthshader/target/debug/darthshader --output output -t 12000 /home/alan/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer
wgslfuzz commented 1 month ago

I added a panic handler that should allow you to better deal with files posing issues during import.

agoodm88 commented 1 month ago

Sadly it's now blowing up in main. Here is a sample of files from my corpus that blow it up, and a crash trace.

Client 000:
     NaN: Scheduler
     NaN: Manager
  Feedbacks:
     NaN: Not Measured

from file "output/queue/1728167311_000000.ron"
Offending file: "output/queue/1728167311_000000.ron"
thread 'main' panicked at src/layeredinput.rs:304:17:
file import error
stack backtrace:
   0:     0x5de96f395e9a - std::backtrace_rs::backtrace::libunwind::trace::h9294c60f1c519a6b
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0x5de96f395e9a - std::backtrace_rs::backtrace::trace_unsynchronized::hb186c7e77bffad8a
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x5de96f395e9a - std::sys::backtrace::_print_fmt::hcb03e5d9b54439b1
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:66:9
   3:     0x5de96f395e9a - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h9f6912c11cb864e2
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:39:26
   4:     0x5de96f3bf063 - core::fmt::rt::Argument::fmt::h3d03db0338420f78
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/fmt/rt.rs:177:76
   5:     0x5de96f3bf063 - core::fmt::write::h8debadf270d89c55
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/fmt/mod.rs:1186:21
   6:     0x5de96f392583 - std::io::Write::write_fmt::h2027e8c4646c73e6
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/io/mod.rs:1833:15
   7:     0x5de96f395ce2 - std::sys::backtrace::BacktraceLock::print::h414cd546f6b36885
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:42:9
   8:     0x5de96f396cc7 - std::panicking::default_hook::{{closure}}::h40ea18cfe0adea53
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:268:22
   9:     0x5de96f396af6 - std::panicking::default_hook::h8d94df497eda232c
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:295:9
  10:     0x5de96f3972f7 - std::panicking::rust_panic_with_hook::h9a5479236c4bea0b
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:801:13
  11:     0x5de96f397156 - std::panicking::begin_panic_handler::{{closure}}::hf9517d9cf0232ee4
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:667:13
  12:     0x5de96f396379 - std::sys::backtrace::__rust_end_short_backtrace::h623c1073442a1748
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:170:18
  13:     0x5de96f396e1c - rust_begin_unwind
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:665:5
  14:     0x5de96f3bc470 - core::panicking::panic_fmt::hc1611ab468cf2998
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/panicking.rs:74:14
  15:     0x5de96e9a26a4 - <darthshader::layeredinput::LayeredInput as libafl::inputs::Input>::from_file::hbc4f999f44a994b2
                               at /home/alan/darthshader/src/layeredinput.rs:304:17
  16:     0x5de96ea6a134 - <libafl::corpus::inmemory_ondisk::InMemoryOnDiskCorpus<I> as libafl::corpus::Corpus>::load_input_into::h20cf7da0932ee256
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/corpus/inmemory_ondisk.rs:153:25
  17:     0x5de96ea05e7b - libafl::corpus::testcase::Testcase<I>::load_len::h87a22b2c4ce223d7
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/corpus/testcase.rs:346:21
  18:     0x5de96e87568d - <libafl::schedulers::testcase_score::LenTimeMulTestcaseScore<S> as libafl::schedulers::testcase_score::TestcaseScore<S>>::compute::hb88046b5ceb81a8f
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/testcase_score.rs:43:15
  19:     0x5de96e9baab5 - libafl::schedulers::minimizer::MinimizerScheduler<CS,F,M>::update_score::hd3c164667d9429a4
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/minimizer.rs:268:26
  20:     0x5de96e9ba820 - <libafl::schedulers::minimizer::MinimizerScheduler<CS,F,M> as libafl::schedulers::Scheduler>::on_add::h2ff1bedc6a51127c
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/schedulers/minimizer.rs:204:9
  21:     0x5de96ea1b1c2 - <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::ExecutionProcessor<OT>>::process_execution::h691113821321cad0
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:390:17
  22:     0x5de96ea1a89e - <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::EvaluatorObservers<OT>>::evaluate_input_with_observers::hc20781ca70eded51
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:471:9
  23:     0x5de96ea19333 - <libafl::fuzzer::StdFuzzer<CS,F,OF,OT> as libafl::fuzzer::Evaluator<E,EM>>::evaluate_input_events::h980ab0fc0b27bab8
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:495:9
  24:     0x5de96ea1c70c - libafl::fuzzer::Evaluator::evaluate_input::h4503e43f443b1cb1
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/fuzzer/mod.rs:124:9
  25:     0x5de96e930ad1 - libafl::state::StdState<I,C,R,SC>::continue_loading_initial_inputs_custom::hcd27d4cfa4d95ce4
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:665:40
  26:     0x5de96e931343 - libafl::state::StdState<I,C,R,SC>::load_initial_inputs_custom_by_filenames::h07cee6fed918b687
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:638:9
  27:     0x5de96e9305b8 - libafl::state::StdState<I,C,R,SC>::load_initial_inputs_by_filenames::h03e6ff95b8b8ad6e
                               at /home/alan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libafl-0.11.2/src/state/mod.rs:703:9
  28:     0x5de96e9cf7a8 - darthshader::fuzz::h196a74a56f319e51
                               at /home/alan/darthshader/src/main.rs:410:9
  29:     0x5de96e9cca85 - darthshader::main::h7233ff72aaa519ea
                               at /home/alan/darthshader/src/main.rs:221:5
  30:     0x5de96eab641b - core::ops::function::FnOnce::call_once::h0f4f2b07cc69ec71
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/ops/function.rs:250:5
  31:     0x5de96e883e1e - std::sys::backtrace::__rust_begin_short_backtrace::hf4e7800c3143f518
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/sys/backtrace.rs:154:18
  32:     0x5de96e9a5fe1 - std::rt::lang_start::{{closure}}::h6bf73343aceb52ee
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/rt.rs:164:18
  33:     0x5de96f38d320 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h05c6bccc07d3f6f8
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/core/src/ops/function.rs:284:13
  34:     0x5de96f38d320 - std::panicking::try::do_call::ha7775c5d151b7535
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:557:40
  35:     0x5de96f38d320 - std::panicking::try::hc3e9847a907a671c
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:520:19
  36:     0x5de96f38d320 - std::panic::catch_unwind::hfd59bc98b3ebc4b9
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panic.rs:348:14
  37:     0x5de96f38d320 - std::rt::lang_start_internal::{{closure}}::hc28d404ae901ba2f
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/rt.rs:143:48
  38:     0x5de96f38d320 - std::panicking::try::do_call::h3bf19b3691b9641b
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:557:40
  39:     0x5de96f38d320 - std::panicking::try::h89d7b1c2144c0423
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panicking.rs:520:19
  40:     0x5de96f38d320 - std::panic::catch_unwind::hf2b61873f94e33f2
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/panic.rs:348:14
  41:     0x5de96f38d320 - std::rt::lang_start_internal::hc6683a15201d56bc
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/rt.rs:143:20
  42:     0x5de96e9a5fba - std::rt::lang_start::hff7787f09a5cbb23
                               at /rustc/fa724e5d8cbbdfbd1e53c4c656121af01b694406/library/std/src/rt.rs:163:17
  43:     0x5de96e9d078e - main
  44:     0x7ab97ea2a1ca - __libc_start_call_main
                               at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  45:     0x7ab97ea2a28b - __libc_start_main_impl
                               at ./csu/../csu/libc-start.c:360:3
  46:     0x5de96e86c645 - _start
  47:                0x0 - <unknown>

I am still very suspicious that the version I downloaded from here and built with 'cargo build' is blowing up with an empty corpus, whereas the version from your docker file, also built with 'cargo build', runs for days. Both are running with the same dxc/tint built per the instructions. This feels 'off' to me.

sample.tar.gz

I note that the sample (smallest 1000 files in the corpus) only really contains fragments of single shader-related keywords; nothing remotely valid, it seems. Most of the corpus consists of much longer files which do include more content.

wgslfuzz commented 1 month ago

The version with the docker file and the initial git commit should be basically identical, except for some parts in the README and the license. I'm on PTO this week and don't have access to my machine, but I will investigate next week. Could you also provide the 1728167311_000000 file? Sorry for all the trouble and thanks for providing feedback. The fuzzer is supposed to be easy to set up and run, but alas, it isn't (yet).

agoodm88 commented 1 month ago

RON file attached.

I am on holiday at the minute as well, though when you're self employed you never truly leave work behind so I am sitting on the beach at the minute writing this. I feel bad and sorry for you because an idiot (me) has came along with their corpus probably containing every conceivable variation of invalid WGSL code possible and is acting all surprised when someone elses fuzzer is puking on the files! But ultimately I think there is mileage in your fuzzer mutating these files (some of which are actual WGSL code) to try and find interesting and new crashes. I've alreay broken DXC a few times as it is.

1728167311_000000.zip

wgslfuzz commented 1 month ago

Well, it's a fuzzer's job to deal with malformed inputs, so expecting it to handle malformed files is a reasonable assumption. The fuzzer attempts to generate wgsl files from scratch (until the corpus fails to grow for 100 generated samples). Hence an empty seed corpus should not be a problem unless there is an issue with the instrumentation.

loveraven42 commented 1 month ago

Hi, I found this code:

"wgsl" => {
    let res = std::panic::catch_unwind(|| {
        naga::front::wgsl::parse_str(&input)
            .map_err(|e| Error::illegal_argument(e.to_string()))
    });

    let Ok(Ok(module)) = res else {
        println!("Attempting to parse as AST: {:?}", path.as_ref());
        let ast = match Ast::try_from_wgsl(input.as_bytes()) {
            Ok(ast) => ast,
            Err(_) => Ast::try_from_wgsl("".as_bytes()).unwrap(),
        };
        return Ok(LayeredInput::Ast(ast));
    };
    module
}
_ => {
    println!("Offending file: {:?}", path.as_ref());
    panic!("file import error");
}

The function Ast::try_from_wgsl can return an error, so it could cause the panic above. I will continue to investigate.

wgslfuzz commented 1 month ago

So there was a small discrepancy between the github version and the artifact eval version after all. How the regression got introduced is beyond me; it should be fixed with commit cdc28b5. The fuzzer now imports the files from sample.tar.gz without triggering a panic.

agoodm88 commented 1 month ago

Thanks for persevering with this! I combined the original monkey corpus with the results of leaving the fuzzers hammering away for another week and fed that to your fuzzer. Looks like it's going to work now.

Last few questions, I think... I've got a large amount of compute power to throw at this; several hundred cores.

I presume that scaling this out is as simple as starting multiple instances of the fuzzer on each box, all pointed at the same output directory? Presumably there is none of the AFL master/slave style requirement?

I assume that there is some random element to the input selection, such that they hammer out different results?

It doesn't look like there is any way to 'resume' a fuzzing session? So I should start with an empty output directory for each 'run'?

Usually I would periodically minimize/merge my corpus across the nodes to create a new 'starting point' corpus. Is this a valuable strategy with your fuzzer? If so, I didn't spot any method of minimizing/merging the corpus?

agoodm88 commented 1 month ago

I've been running the fuzzer with the monkey corpus today and it seems to be fuzzing well now. I'm running $numberofcores copies of the fuzzer, all with the same input and output directories set, which appears to work. I note that the fuzzer outputs a huge number of duplicated crash files. Hard to know if this is normal, as most fuzzers name the crashers after the hash of their contents, naturally deduplicating them.
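One way to approximate that hash-named behaviour after the fact is to rename each crash to the hash of its contents, so identical crashes collapse onto the same name. A rough sketch; the crashes/ and deduped/ directory names are assumptions, not fuzzer defaults:

```shell
#!/bin/sh
# Collapse duplicated crash files by renaming each one to the sha256 of
# its contents. Identical files map to the same target name, so `mv -n`
# keeps the first copy; whatever is left behind was a duplicate.
mkdir -p crashes deduped
for f in crashes/*; do
    [ -f "$f" ] || continue
    h=$(sha256sum "$f" | cut -d ' ' -f 1)
    mv -n "$f" "deduped/$h.ron"
done
# anything still in crashes/ duplicated a file already moved
rm -f crashes/*
```

fdupes (used later in this thread) achieves the same end result; the hash-rename variant just also gives you stable, content-derived file names.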

I have lots of output like this from the running fuzzer, which seems good?

[Testcase #0] run time: 6h-53m-21s, clients: 1, corpus: 14003, objectives: 30, executions: 903251, exec/sec: 36.47, execs_suc: 217448, shared_mem: 49329/212800 (23%), stability: 211766/212800 (99%), execs_err: 661580
Client 000:
     0.0445: Scheduler
     0.0000: Manager
  Stage 0:
     0.0000: GetInputFromCorpus
     0.3555: Mutate
     0.0001: MutatePostExec
     0.1046: TargetExecution
     0.0758: PreExecObservers
     0.1788: PostExecObservers
  Feedbacks:
     0.0000: ConstFeedback
     0.0000: time
     0.0000: CrashFeedback
     0.0000: exit
     0.1147: mapfeedback_metadata_shared_mem
     0.1260: Not Measured

[Objective #0] run time: 6h-56m-48s, clients: 1, corpus: 13978, objectives: 65, executions: 731899, exec/sec: 29.30, shared_mem: 49435/212800 (23%), execs_suc: 155620, stability: 211851/212800 (99%), execs_err: 555251
Client 000:
     0.0298: Scheduler
     0.0000: Manager
  Stage 0:
     0.0000: GetInputFromCorpus
     0.4856: Mutate
     0.0001: MutatePostExec
     0.0819: TargetExecution
     0.0606: PreExecObservers
     0.1423: PostExecObservers
  Feedbacks:
     0.0000: exit
     0.0921: mapfeedback_metadata_shared_mem
     0.0000: CrashFeedback
     0.0000: time
     0.0000: ConstFeedback
     0.1076: Not Measured

I hacked together a script to process the crashes into a digestible bunch of log files. I will share it here because maybe someone else will find it useful. I am not a developer, so coding crimes have probably been committed! Any thoughts on the process I am following to triage this would be useful. I do have some sanitizer hits when I process the files using the script, but nothing exciting so far.

#!/bin/bash

fdupes -N -d . # the fuzzer outputs huge amounts of duplicated crashes so we dedupe here
ls -v | cat -n | while read n f; do mv -n "$f" "$n.ron"; done # rename the files to number.ron because the lifter doesn't like the file names the fuzzer outputs
for i in $(ls | grep ron); do /home/alan/scratch/aedarthshader/buildfiles/darthshader/target/debug/lifter --file $i 2>&1 | tee lift$i.log; done # lift the ron file to wgsl
for i in $(ls | grep wgsl); do /home/alan/dawn/out/libfuzzerasan/tint $i -o $i.hlsl 2>&1 | tee tint$i.log; done # convert the wgsl file to hlsl

export DXC_PATH="/home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/bin/dxc-3.7"

find *hlsl -type f | while read -r f; do
  cat ../types | while read -r t; do   # target shader types
    cat ../years | while read -r h; do # target hardware years
      # an entry point seems to be defined as "void something()"; this
      # programmatically finds everything that looks like an entry point
      grep "()" $f | grep void | cut -d " " -f 2 | cut -d "(" -f 1 | sort -u | while read -r e; do
        # run dxc with the various options I'm interested in
        echo "$DXC_PATH -E $e -Gis -Zpr -enable-16bit-types -HV $h -T $t $f 2>&1 | tee $e$f$t$h.log"
      done
    done
  done
done | xargs -P 56 -I {} bash -c "{}" # hacky way to parallelise the dxc runs
fdupes -N -d . # lots of duplicated log files so I dedupe here

To run a triage I run the following:

mkdir expo
cp output/crashes/* expo
cd expo
cp ../process_results.sh .
./process_results.sh

I'm targeting types vs_6_6 / ps_6_6 / cs_6_6 and years 2018 / 2021. Does this seem sensible? If other combos are interesting, I guess I need to run without 16-bit types?

Once completed, I run cat log | sort -u and manually look at the output. Often grep SUMM log is probably all we're interested in. Most of what I see are 'caught errors' and warnings, which I don't think are interesting from a security standpoint?

P.S. I've attempted to get the other nodes installed today. Looks like changes made in other commits may have broken the build process for the fuzzer binary. I've run out of time to debug this tonight; a fresh head and caffeine might help me see what is wrong. It's been a long day, so I might be doing something stupid.

$ ninja tint_afl_all_fuzzer
[1/3] Building CXX object src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc.o
FAILED: src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc.o 
/home/alan/darthshader/AFLplusplus-4.10c/afl-clang-fast++ -DTINT_BUILD_GLSL_VALIDATOR=1 -DTINT_BUILD_GLSL_WRITER=1 -DTINT_BUILD_HLSL_WRITER=1 -DTINT_BUILD_IR_BINARY=1 -DTINT_BUILD_IS_LINUX=1 -DTINT_BUILD_IS_MAC=0 -DTINT_BUILD_IS_WIN=0 -DTINT_BUILD_MSL_WRITER=1 -DTINT_BUILD_SPV_READER=1 -DTINT_BUILD_SPV_WRITER=1 -DTINT_BUILD_SYNTAX_TREE_WRITER=0 -DTINT_BUILD_TINTD=0 -DTINT_BUILD_WGSL_READER=1 -DTINT_BUILD_WGSL_WRITER=1 -I/home/alan/darthshader/harnesses/dawn/dawn -I/home/alan/darthshader/harnesses/dawn/dawn/include -I/home/alan/darthshader/harnesses/dawn/dawn/third_party/vulkan-deps/spirv-headers/src/include -I/home/alan/darthshader/harnesses/dawn/dawn/third_party/vulkan-deps/spirv-tools/src/include -fuse-ld=lld -O3 -DNDEBUG -std=gnu++17 -fPIE -fno-exceptions -fno-rtti -Wno-deprecated-builtins -Wno-unknown-warning-option -Wno-switch-default -fsanitize=address -MD -MT src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc.o -MF src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc.o.d -o src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc.o -c /home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc
/home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc:60:1: error: a type specifier is required for all declarations
__AFL_FUZZ_INIT();
^
/home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc:105:26: error: use of undeclared identifier '__AFL_FUZZ_TESTCASE_BUF'
    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
                         ^
/home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc:106:12: error: use of undeclared identifier '__AFL_LOOP'
    while (__AFL_LOOP(10000)) {
           ^
/home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer_main.cc:107:30: error: use of undeclared identifier '__AFL_FUZZ_TESTCASE_LEN'
        size_t len = (size_t)__AFL_FUZZ_TESTCASE_LEN;
                             ^
4 errors generated.
[2/3] Building CXX object src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc.o
FAILED: src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc.o 
/home/alan/darthshader/AFLplusplus-4.10c/afl-clang-fast++ -DTINT_BUILD_GLSL_VALIDATOR=1 -DTINT_BUILD_GLSL_WRITER=1 -DTINT_BUILD_HLSL_WRITER=1 -DTINT_BUILD_IR_BINARY=1 -DTINT_BUILD_IS_LINUX=1 -DTINT_BUILD_IS_MAC=0 -DTINT_BUILD_IS_WIN=0 -DTINT_BUILD_MSL_WRITER=1 -DTINT_BUILD_SPV_READER=1 -DTINT_BUILD_SPV_WRITER=1 -DTINT_BUILD_SYNTAX_TREE_WRITER=0 -DTINT_BUILD_TINTD=0 -DTINT_BUILD_WGSL_READER=1 -DTINT_BUILD_WGSL_WRITER=1 -I/home/alan/darthshader/harnesses/dawn/dawn -I/home/alan/darthshader/harnesses/dawn/dawn/include -I/home/alan/darthshader/harnesses/dawn/dawn/third_party/vulkan-deps/spirv-headers/src/include -I/home/alan/darthshader/harnesses/dawn/dawn/third_party/vulkan-deps/spirv-tools/src/include -fuse-ld=lld -O3 -DNDEBUG -std=gnu++17 -fPIE -fno-exceptions -fno-rtti -Wno-deprecated-builtins -Wno-unknown-warning-option -Wno-switch-default -fsanitize=address -MD -MT src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc.o -MF src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc.o.d -o src/tint/CMakeFiles/tint_afl_all_fuzzer.dir/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc.o -c /home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc
/home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc:278:54: error: expected '}'
    exit_code[0] = run(data, size, nullptr, nullptr);
                                                     ^
/home/alan/darthshader/harnesses/dawn/dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc:277:73: note: to match this '{'
void run_set_exit(const uint8_t* data, size_t size, uint8_t* exit_code) {
                                                                        ^
1 error generated.
ninja: build stopped: subcommand failed.
agoodm88 commented 1 month ago

The above errors are due to building AFL without llvm-dev, libclang-dev, and clang installed. After rebuilding with those dependencies I was left with the missing } in dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc. I don't get how this is missing, as the } is in the patch. Adding it manually allowed the harness to build.

P.S. I think using the same output folder for all instances is actually hurting performance, since files in the queue are duplicated. Usually you don't point more than one AFL instance at an output folder, so I've changed to this approach. Too early to tell if it's 'better' or not.

wgslfuzz commented 3 weeks ago

I presume that scaling this out is as simple as starting multiple instances of the fuzzer on each box, all pointed at the same output directory?

The same output directory might work, but I generally have a separate output directory per fuzzer instance.
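A launcher along these lines gives each instance its own output directory. This is only a sketch: the FUZZER and TARGET paths are borrowed from earlier in this thread and the out_N / corpus names are placeholders for your own layout:

```shell
#!/bin/sh
# Sketch: start one fuzzer instance per core, each writing into its own
# output directory. FUZZER/TARGET paths and directory names are
# assumptions; adjust them to your setup.
FUZZER="$HOME/darthshader/target/release/darthshader"
TARGET="$HOME/darthshader/harnesses/dawn/dawn/out/build/tint_afl_all_fuzzer"
CORES="${CORES:-$(nproc)}"
for i in $(seq 1 "$CORES"); do
    mkdir -p "out_$i"
    "$FUZZER" --input corpus --output "out_$i" "$TARGET" > "log_$i.txt" 2>&1 &
done
wait
```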

Presumably there is none of the AFL master/slave style requirement?

Correct. Note that different fuzzer instances won't import the findings of other instances.

I assume that there is some random element to the input selection, such that they hammer out different results?

There is no deterministic mode or anything like that, so different instances of the fuzzer indeed generate different inputs.

It doesn't look like there is any way to 'resume' a fuzzing session? So I should start with an empty output directory for each 'run'?

That is correct, resuming a campaign is not supported so far.

Usually I would periodically minimize/merge my corpus across the nodes to create a new 'starting point' corpus. Is this a valuable strategy with your fuzzer? If so, I didn't spot any method of minimizing/merging the corpus?

The fuzzer can import .ron files. If you merge the .ron files from multiple campaigns into a single directory you could start a new campaign with this directory as seed. The initial corpus import should keep only those files that increase coverage.
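To illustrate the merge step: pooling the .ron files into one directory and deduplicating by content hash keeps the re-import pass short. A sketch, where the run_*/queue layout and merged_seed name are hypothetical, not something the fuzzer enforces:

```shell
#!/bin/sh
# Sketch: merge the .ron files from several campaigns into a single
# seed directory, deduplicated by content hash. Identical files land on
# the same name, so `cp -n` keeps exactly one copy of each.
mkdir -p merged_seed
for f in run_*/queue/*.ron; do
    [ -f "$f" ] || continue
    h=$(sha256sum "$f" | cut -d ' ' -f 1)
    cp -n "$f" "merged_seed/$h.ron"
done
echo "merged seed corpus: $(ls merged_seed | wc -l) files"
```

The next campaign can then be started with merged_seed as its input directory; as noted above, the initial import should discard files that add no coverage.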

wgslfuzz commented 3 weeks ago

I have lots of output like this from the running fuzzer which seems good?:

The fuzzer spends quite some time with mutations and in PostExecObservers. Is this a release build of the fuzzer? Other than that, the output looks okay.

The fuzzer indeed finds plenty of crashes that turn out to be duplicates. I generally ran all findings through a separate build of dxc and grouped them by the PC of the detected ASAN violation. The script/coding looks good though. What I can recommend as well is compiling dxc with asserts and grouping crashes by the respective assertion violation. Both attempts are just rough approximations though; neither approach will correctly group by root cause. So what I did in the past was: report a bug, wait for it to be fixed, replay all previous findings, and see which inputs continue to reproduce. Repeat until no crashes remain. One word of caution though: Chrome started to ship a custom version of dxc with asserts enabled, see PR #4. This implies that the Chrome VRP will no longer consider memory safety issues in dxc unless they also reproduce in a dxc build with asserts enabled.
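The grouping step can be roughly approximated with a one-liner over the triage logs: ASAN's SUMMARY line names the bug type and top frame, so bucketing by it gives a first-cut deduplication. A sketch, assuming the dxc output was captured into *.log files in the current directory:

```shell
#!/bin/sh
# Rough bucketing of ASAN findings: count the distinct
# "SUMMARY: AddressSanitizer: ..." lines across the triage logs.
# One representative log per bucket is usually enough for a first look.
grep -h "SUMMARY: AddressSanitizer" ./*.log 2>/dev/null \
    | sort | uniq -c | sort -rn
```

As the comment above says, this only approximates root-cause grouping: two different bugs can share a SUMMARY line, and one bug can surface under several.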

I'm targeting types: vs_6_6 ps_6_6 cs_6_6 and years 2018 / 2021. Does this seem sensible? If other combos are interesting I guess I need to run without 16bit types?

You can also try other shader model versions; anything between 6.2 and 6.6 is a valid target. When it comes to years: at least for chromium/edge, only 2018 seems relevant ATM. If you're also interested in bugs that are unrelated to chromium/webgpu, other combinations are equally valid.
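Sweeping that 6.2–6.6 range is cheap to script. A sketch that just prints the dxc invocations across the three stages, so they can be piped into xargs like the triage script earlier in the thread; the dxc path, entry point "main", and test.hlsl are placeholders:

```shell
#!/bin/sh
# Sketch: enumerate dxc invocations for shader models 6.2-6.6 across
# the vs/ps/cs stages. DXC_PATH, the entry point, and test.hlsl are
# assumptions; pipe the output into `xargs -P` to actually run them.
DXC_PATH="${DXC_PATH:-dxc}"
for sm in 2 3 4 5 6; do
    for stage in vs ps cs; do
        echo "$DXC_PATH -E main -HV 2018 -T ${stage}_6_${sm} test.hlsl"
    done
done
```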

Most of what I see are 'caught errors' and warnings which I dont think are interesting from a security standpoint?

That is correct, when reprocessing the findings I just look for ASAN crashes and discard the rest.

After rebuilding with those dependencies I was left with the missing } in dawn/src/tint/fuzzers/tint_wgsl_reader_all_writer_fuzzer.cc. I dont get how this is missing as the } is in the patch.

You're right, there seems to be an issue with the patch file. I'll investigate.

wgslfuzz commented 3 weeks ago

BTW if I overlooked any other question in this ticket let me know

agoodm88 commented 3 weeks ago

You've gone above and beyond my expectations. Fantastic really :-) Are you able to share any suggestions for minimizing crashers? I don't think anybody will entertain my crashes so far, as it'll be "why yes, here is a stack trace for an invalid write/buffer overrun, and a 2GB test case". Even I am lost as to what to do with those crazy things!

wgslfuzz commented 3 weeks ago

Well, I was going to write about testcase minimization in issue #6, but here we go: I can recommend the creduce tool. It has a --not-c mode that works quite well for wgsl files. I found its interface to be a bit strange, but once you get it to work it's awesome.
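For anyone following along, creduce needs an "interestingness test": a script that exits 0 only while the shrinking candidate still reproduces the crash. A sketch of wiring that up for a dxc/ASAN reproducer; the dxc flags, entry point, and crash.hlsl name are assumptions:

```shell
#!/bin/sh
# Sketch: create an interestingness test for creduce that succeeds only
# while the candidate file still trips ASAN in dxc. DXC_PATH, the entry
# point, and crash.hlsl are placeholders.
cat > interesting.sh <<'EOF'
#!/bin/sh
"$DXC_PATH" -E main -T cs_6_6 crash.hlsl 2>&1 | grep -q "AddressSanitizer"
EOF
chmod +x interesting.sh
# then run: creduce --not-c ./interesting.sh crash.hlsl
```

creduce copies crash.hlsl into a scratch directory and repeatedly mutates it, keeping each mutation only if ./interesting.sh still exits 0 there.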

wgslfuzz commented 3 weeks ago

The harness patch file should work again, thanks for pointing this one out.

wgslfuzz commented 3 weeks ago

Just updated the README to contain some more info on crash reduction

agoodm88 commented 3 weeks ago

I'm running out of shared memory limit on my run. It doesn't seem to be obviously exposed to the user to increase this? I'm running on boxes with huge amounts of RAM, so 2GB/core doesn't really scratch the surface. As a workaround I stopped my run and am trying to concatenate and dedupe the entire corpus into one folder, which I am then hoping to restart the run with. But it looks like there will be an inconveniently huge number of files after completion.

wgslfuzz commented 3 weeks ago

How many files do you have in the queue? Furthermore, could you please provide some stats on the on-disk size of the .ron files plus the on-disk size of the .metadata files?
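Gathering those numbers is a one-liner per extension. A sketch, assuming the queue lives at output/queue (adjust the path to your run directory):

```shell
#!/bin/sh
# Sketch: count and size the .ron and .metadata files in a queue
# directory. The output/queue path is an assumption.
Q="${Q:-output/queue}"
for ext in ron metadata; do
    n=$(find "$Q" -name "*.$ext" 2>/dev/null | wc -l)
    bytes=$(find "$Q" -name "*.$ext" -printf '%s\n' 2>/dev/null \
        | awk '{s+=$1} END {print s+0}')
    echo "$ext: $n files, $bytes bytes on disk"
done
```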

agoodm88 commented 3 weeks ago

I concatenated all of the queue directories into one folder (about 2.2 million files), deduplicated the files (leaving about 250k files), then took the remaining files and started a single thread while watching carefully. Once it said "imported x files from disk" I stopped it and moved the queue directory to be my input and started the process again. The entire procedure took about 10 hours to complete! Therefore I don't have any useful stats at the minute, but I will take a look tomorrow morning.

wgslfuzz commented 2 weeks ago

The number of files is a bit surprising. Did you compile with vanilla AFL++ 4.10c or did you also apply the patch file (for AFL++)?

agoodm88 commented 2 weeks ago

Good spot! I actually forgot to apply the patch in my 'setup notes' above as well. Patch now applied, harnesses rebuilt and run restarted (from the original monkey corpus, filtered for ascii only). Crossing fingers :-)

wgslfuzz commented 2 weeks ago

Without the patch it might be rather difficult for the fuzzer to minimize the samples (hence the giant testcases), plus you'll end up with way too many samples in the queue.

agoodm88 commented 2 weeks ago

Thanks. Stupid question time, but I don't see it mentioned anywhere... On the first node, my AFL++ build in LTO mode failed because I have a conflicting dependency installed, so it was built without LTO. I understand the issue well and made sure to avoid it on the other nodes. When running, the other nodes produce significantly fewer crash files than the first node. Looking at the output: the first node (built without LTO mode in AFL++) reports execs_suc: 1788976 / execs_err: 14858016, whereas the new nodes always report 0 for these metrics. Additionally, the first node has many more 'objectives' (whatever that is?) than the newer nodes.

I can't decide whether the enormous amount of crashes along with the execs_succ/err output is 'correct', or if the quieter output of the newer nodes with execs_succ/err = 0 is correct?

Ninja edit: after some hours the 'LTO' build runs do appear to run normally, however they never report any successful or error executions?

Ninja edit 2: after about 16 hours, presumably with a correctly compiled set of harnesses and AFL++ on a couple hundred cores: 217676 crashers, of which 108840 were unique. A large amount of ASAN hits, mostly leaks and the ilist null pointer dereference. There are a few other hits that make less sense and don't replicate easily, like this one, which makes no sense at all:

```
alan@dl360p10fuzz:~/darthshader_run/interesting$ grep == -A1 F31024.wgsl.hlslvs_6_62018.log
==3104308==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7ffff1bb9430 bp 0x7fffffffd550 sp 0x7fffffffd4d8 T0)
==3104308==The signal is caused by a READ memory access.
==3104308==Hint: this fault was caused by a dereference of a high value address (see register values below). Disassemble the provided pc to learn which register was used.
    #0 0x7ffff1bb9430 in llvm::opt::Option::Option(llvm::opt::OptTable::Info const*, llvm::opt::OptTable const*) /home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/lib/Option/Option.cpp:23
--
==3104308==ABORTING
```

```
$ /home/alan/darthshader/harnesses/dxcompiler/DirectXShaderCompiler/out/build/bin/dxc-3.7 -E F -T vs_6_6 -HV 2018 31024.wgsl.hlsl
31024.wgsl.hlsl:1:2: warning: Group size of 2331 (2331 1 1) is outside of valid range [1..1024] - attribute will be ignored [-Wignored-attributes]
[numthreads(2331, 1, 1)]
 ^~~~~~
error: validation errors
31024.wgsl.hlsl:2: error: Loop must have break.
31024.wgsl.hlsl:5:5: error: sync in a non-Compute/Amplification/Mesh/Node Shader must only sync UAV (sync_uglobal).
note: at 'call void @dx.op.barrier(i32 80, i32 9)' in block '#0' of function 'F'.
31024.wgsl.hlsl:2: error: Entry function performs some operation that is incompatible with the shader stage or other entry properties. See other errors for details.
31024.wgsl.hlsl:2: error: Function uses features incompatible with the shader stage (vs) of the entry function.
31024.wgsl.hlsl:2: error: Function requires a visible group, but is called from a shader without one.
Validation failed.
```

Note: it's a crash from a dxc run with the vs_6_6 target; however, the first line of the test case should make the test case invalid for type vs? I've not managed to replicate the crash after the triage. Maybe I get weird results in my triage because I run the triage in parallel, and some test cases really want to crash my system by using effectively infinite memory?
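One way to keep a parallel triage from taking the whole box down is to cap the address space of each reproducer process with plain ulimit, so runaway testcases get killed by the kernel instead of eating all the RAM (8 GiB here; the dxc command is the one from the log above and is only illustrative):

```shell
# ulimit -v takes KiB; the subshell keeps the cap from leaking into the
# parent shell, so each triage job gets its own limit.
(
  ulimit -v 8388608
  echo "cap: $(ulimit -v) KiB"
  # e.g. run one reproducer under the cap:
  #   ./dxc-3.7 -E F -T vs_6_6 -HV 2018 31024.wgsl.hlsl
)
```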

wgslfuzz commented 2 weeks ago

libAFL implements bug oracles other than crashes (such as detecting differential executions), hence it calls fuzzer findings "objectives". In the case of this fuzzer, objective and crash can be used interchangeably. The execs_suc: 1788976 / execs_err: 14858016 output looks much better. What we're measuring here is the number of shaders that could be compiled successfully (1788976) and the number of shaders which failed to compile (14858016). This metric is not supposed to be 0; I'm not sure how that is happening.

When re-processing crashes I tend to set ASAN flags such that leaks are ignored.
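For anyone searching later, that boils down to the standard AddressSanitizer runtime switches (the flag names are stock ASAN, nothing tool-specific):

```shell
# Drop leak reports during crash triage; abort on the first real error so
# the process is core-dumpable for the debugger.
export ASAN_OPTIONS=detect_leaks=0:abort_on_error=1
echo "$ASAN_OPTIONS"
```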

> however the first line of the test case should make the test case invalid for type vs

Do you mean that the validation errors should prevent the crash?