This got a bit larger than initially intended.
A bigger refactoring was required to make the Spider result filtering possible with target-specific configurations that only apply to the findings of that one target.
This refactoring also cleaned up some of the messy parts of the code base:
- Removed the temporary serialization and deserialization of the results in the spider and scanner.
- Moved the spider and scanner deduplication logic into their own classes.
- Raw results used to be built up as a JSON array via string concatenation; this has been replaced by proper serialization.
These changes also made it possible to use the proper ZAP XML report. Previously, the scanner just exported the data in the format it used internally. The scanner now exports the standard XML report, which is also used by DefectDojo.
The new deduplication feature can be enabled with the `SECURECODEBOX_REDUCE_SPIDER_RESULT_ON_REST_SCHEMAS` target attribute. I chose to prefix it with `SECURECODEBOX` instead of `ZAP`, as this is a custom feature not included in ZAP. Let me know whether you agree, or change it to use a `ZAP` prefix.
The filtering mechanism filters URLs like the following: https://github.com/secureCodeBox/scanner-webapplication-zap/blob/343107865808d08af340f4a550931827c555c65c/src/test/java/io/securecodebox/zap/service/zap/deduplication/SpiderDuplicateReducerTest.java#L141-L149
This currently only works for GET requests. It could be extended to non-GET requests, but that is trickier because of the request bodies.
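For reviewers who want a quick mental model of "reducing spider results on REST schemas", here is a minimal sketch of the general idea. It is not the code from this PR: the class name `RestPathReducerSketch` is hypothetical, and it assumes that URLs which differ only in numeric or UUID path segments (e.g. `/api/users/1` and `/api/users/2`) belong to the same REST resource, keeping only the first GET URL of each group. The linked test shows the cases the real reducer actually handles.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the secureCodeBox implementation): collapses GET spider
 * results whose URLs only differ in id-like path segments.
 */
final class RestPathReducerSketch {

    // Assumption: purely numeric segments and UUIDs are treated as REST resource ids.
    private static final Pattern ID_SEGMENT = Pattern.compile(
            "\\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    /** Replaces id-like segments with a placeholder, e.g. /users/42/posts -> /users/{id}/posts. */
    static String normalizePath(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        // Limit -1 keeps empty segments, so leading and trailing slashes survive the join.
        String[] segments = path.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (ID_SEGMENT.matcher(segments[i]).matches()) {
                segments[i] = "{id}";
            }
        }
        return String.join("/", segments);
    }

    /** Keeps the first URL per (scheme, authority, normalized path, query) key, in input order. */
    static List<String> reduce(List<String> getUrls) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String url : getUrls) {
            URI uri = URI.create(url);
            String key = uri.getScheme() + "://" + uri.getRawAuthority()
                    + normalizePath(uri.getRawPath())
                    + (uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery());
            firstByKey.putIfAbsent(key, url);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> spiderResults = List.of(
                "https://example.com/api/users/1",
                "https://example.com/api/users/2",
                "https://example.com/api/users/2/posts",
                "https://example.com/api/users/3/posts",
                "https://example.com/api/users");
        System.out.println(reduce(spiderResults));
    }
}
```

Running `main` keeps `/api/users/1`, `/api/users/2/posts` and `/api/users`. Only the URL is used as the key here, which is why a sketch like this is limited to GET requests: for POST/PUT the request body would also have to be part of the key.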