google / silifuzz

Apache License 2.0

Seeking guidance on Corpus Generation and Tool Output #10

Closed neelkrish closed 10 months ago

neelkrish commented 10 months ago

I set up silifuzz successfully and was able to run the fuzzer with your custom seed and then scan the CPU. To explore further, I have been tinkering with creating my own input seeds and generating a runnable corpus.

Q1: Input Seed and Corpus Creation

I have generated my custom input seed and subsequently created an input corpus. Upon executing the command:

./tools/snap_tool --raw print /tmp/seed/<input_instruction_file>

The output indicates modifications in the gregs section but does not display register values, unlike the original seed, where I observed values such as rax = 0x1 for an inc eax operation. Does the absence of register values suggest an issue with the validity of my seed?
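For reference, one way to sanity-check a raw seed before printing it with snap_tool is to dump its bytes directly. This is only a sketch: the file name below is hypothetical, standing in for the actual instruction file, and `inc eax` encodes as the two bytes `FF C0`.

```shell
# Hypothetical seed path; `inc eax` encodes as the bytes FF C0.
mkdir -p /tmp/seed
printf '\xff\xc0' > /tmp/seed/inc_eax
# Dump the raw bytes to confirm the file contains exactly the
# instruction encoding we intended.
od -An -tx1 /tmp/seed/inc_eax
```

If the dumped bytes differ from the intended encoding, the problem is in seed creation rather than in snap_tool.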

Q2: Snapshot Count in Corpus Generation

On my system, I have created a corpus using the following procedure:

echo -en '\xFF\xC0' > /tmp/inc_eax
./tools/snap_tool generate_corpus /tmp/inc_eax.pb --target_platform="${PLATFORM_ID}" > /tmp/inc_eax.corpus
./tools/snap_corpus_tool list_snaps /tmp/inc_eax.corpus

The output shows a single snapshot (04092d36e2ce88b4563db8b37d3ee5a498d5bcba). Is it expected to have only one snapshot per corpus, or should there be multiple? The output also differs from the example shown in the documentation.

Q3: Post-processing Guidance

I couldn't find documentation explaining how to interpret the tool output and perform post-processing on the results. Specifically, I am uncertain about the meaning of a "violation" in this context. Could you provide guidance on how to interpret the output and what actions (reproduction & root cause analysis) should be taken in the case of a violation?

ksteuck commented 10 months ago

Thanks for trying out silifuzz. Just to set expectations: the silifuzz tools and documentation provided here are aimed at researchers and engineers working in the SDC (silent data corruption) area. These tools are not exactly plug-and-play; any real-world deployment will require effort to scale on your specific infrastructure.

> I have been tinkering with creating my own input seeds and generating runnable corpus.

You want to check out the fuzzing section of the README; it describes how to automatically generate a large number of test inputs.

Q1

Zero (0x0) register values are usually skipped by the various tools. Please post the exact output if you need help debugging this.

Q2

That series of commands naturally produces a corpus with a single input. In a real-world scenario you want to package as many inputs as possible into a single corpus shard; our production workloads have roughly 50k snapshots per shard. The exact output of the tool may vary, and the documentation may not be fully up to date.
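As a minimal sketch of the first step toward a multi-input shard, you can write several single-instruction raw inputs, one file per instruction. The paths and instruction choices here are hypothetical, and each raw file would still need to go through the same snapshot/corpus pipeline shown in the question before it can be packaged:

```shell
# Hypothetical sketch: one raw input file per instruction.
mkdir -p /tmp/seeds
printf '\xff\xc0' > /tmp/seeds/inc_eax      # inc eax  -> FF C0
printf '\xff\xc8' > /tmp/seeds/dec_eax      # dec eax  -> FF C8
printf '\x48\xff\xc0' > /tmp/seeds/inc_rax  # inc rax  -> 48 FF C0 (REX.W prefix)
ls /tmp/seeds
```

In practice you would generate inputs at scale with the fuzzer rather than hand-encoding instructions like this.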

Q3

Not sure what specific tool output you're referring to or the context for "violation". Can you clarify?