Suggestion for clarification re: how to compare your fuzzer against RESTler

microsoft / restler-fuzzer

RESTler is the first stateful REST API fuzzing tool for automatically testing cloud services through their REST APIs and finding security and reliability bugs in these services.

MIT License


Suggestion for clarification re: how to compare your fuzzer against RESTler #823

Open vatlidak opened 8 months ago

vatlidak commented 8 months ago

Description

Hello to the RESTler family,

It has come to my attention that the vast majority of API fuzzers that compare against RESTler present experimental results that include comparison only against our BFS test generation strategy.

I think it's worth adding a sentence (next to our bibtex citation pointer) noting that "RESTler includes multiple test generation strategies, and to get a comprehensive comparative view w.r.t. (i) efficiency (i.e., how quickly RESTler can find crashes) and (ii) effectiveness (i.e., how many crashes RESTler can find in a given time frame), we recommend comparing against all documented 'fuzzing_mode(s)', because each one provides a different trade-off between breadth and depth of state space exploration."
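To make the two metrics concrete, here is a minimal sketch of how a comparison could summarize one run per fuzzing mode. It is an illustration only, not part of RESTler: the helper names (`settings_for`, `summarize`), the idea of recording crash timestamps in seconds, and the list of modes (limited to the three named in this thread) are all assumptions. Check the RESTler documentation for the full set of documented modes.

```python
# Hypothetical helpers for a per-mode comparison; nothing here is a RESTler API.

# Only the modes mentioned in this thread; the documented list may be longer.
FUZZING_MODES = ["bfs", "bfs-fast", "bfs-cheap"]


def settings_for(mode, base_settings):
    """Return a copy of the base settings with only the fuzzing mode changed,
    so every run differs in exactly one variable."""
    return {**base_settings, "fuzzing_mode": mode}


def summarize(mode, crash_times_s, budget_s):
    """Summarize one run.

    crash_times_s: seconds since the start of the run for each crash found
                   (assumed to be collected by the experimenter).
    budget_s:      the fixed time frame shared by all runs.
    """
    in_budget = sorted(t for t in crash_times_s if t <= budget_s)
    return {
        "fuzzing_mode": mode,
        # Efficiency: how quickly the first crash was found (None if no crash).
        "time_to_first_crash_s": in_budget[0] if in_budget else None,
        # Effectiveness: how many crashes were found within the time frame.
        "crashes_in_budget": len(in_budget),
    }
```

Reporting both numbers for every mode, under the same time budget and the same dictionary, avoids judging RESTler by a single point on the breadth/depth trade-off.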

Thanks,

Vaggelis

marina-p commented 8 months ago

Hello @vatlidak,

Thank you for raising this. We noticed this as well! The clarifying sentence is a good idea, will add this.

There's a related usability issue: users want to run one 'fuzz' mode and get the results, not try different (or all) fuzzing strategies separately. Currently the default strategy in fuzz mode is bfs-fast, which causes confusion since it may never test some of the values provided in the dictionary, even in very long runs. Perhaps the default, when no strategy is specified, should be a balance of different strategies.

Thanks,

Marina

marina-p commented 8 months ago

CC @wilbaker

vatlidak commented 8 months ago

If I had to choose one mode as the default (assuming people will blindly compare), I think bfs-cheap offers the best tradeoff between depth and breadth. I would also add that the --test mode should be used as a first dictionary "validation" step to make sure that the dictionary has at least one useful value for each request; otherwise, any comparison is not really representative 🤷
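As a rough sketch of that validation step: after a test-mode run, one could list the requests that never succeeded and fix the dictionary before starting any comparison. The input format below (a mapping from request to the HTTP status codes observed for it) and the function name are hypothetical, and treating "useful value" as "at least one 2xx response" is an assumption; RESTler's own test-mode output should be consulted for the actual coverage data.

```python
def uncovered_requests(status_codes_by_request):
    """Return the requests that never received a 2xx response.

    status_codes_by_request: hypothetical mapping such as
        {"POST /items": [400, 201], "GET /items/{id}": [404]}
    built by the experimenter from the results of a test-mode run.
    """
    return sorted(
        request
        for request, codes in status_codes_by_request.items()
        if not any(200 <= code < 300 for code in codes)
    )
```

An empty result suggests the dictionary has at least one working value per request; a non-empty one names the requests to fix first.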
