Open stanleybak opened 1 year ago
The draft report is here. Tool participants and benchmark proposers, please let me know if you need to be added to edit the report. We aim to finalize and post to arXiv by year end, and some text/citations can be provided for each tool and benchmark, ideally by December 15. If you cannot make that deadline, we can update later, but we want to get it timestamped for 2023 (we mailed the listserv just now). Our apologies for the slow follow-up.
@ttj I noticed that the Traffic Signs Recognition benchmark was not mentioned in the report at all, although it was scored and participated in the 2023 competition. Is this a mistake? It does appear in the initial report here https://github.com/ChristopherBrix/vnncomp2023_results/blob/main/SCORING/latex/main.pdf and here https://github.com/ChristopherBrix/vnncomp2023_results/blob/main/SCORING/slides.pdf. Thanks for clarifying!
@merascu thanks! We have not copied over the tables from the presentation yet but will do so soon (there was some final confirmation of things, etc.). If you would like to add details or citations for this benchmark, e.g., any papers or relevant references, to the benchmarks section (Section 4) of the report, please let me know and I'll invite you to edit. Otherwise, we will add a small bit of text ourselves (e.g., based on the proposal here https://github.com/stanleybak/vnncomp2023/issues/2#issuecomment-1548081675), so that the report is reasonably self-contained regarding the benchmarks, especially the new ones.
Yes, I'd like to add some details. The email address is madalina dot erascu at e-uvt dot ro.
@ChristopherBrix @ttj What is the meaning of the 3rd and 5th columns in results.csv? Also, what is the role of the nano onnx models and their corresponding vnnlib specs? Thanks!
@merascu
These are the columns in results.csv:

1. benchmark
2. neural network (ONNX file)
3. specification (VNNLIB file)
4. time to prepare the instance (s)
5. result (sat/unsat/...)
6. time to verify (s)
About the nano models: I believe these are used as a "smoke test" for each of the tools (I assume you are referring to these nano test models).
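For what it's worth, the column layout described above can be read with a few lines of Python. This is just a minimal sketch; the sample row and file names below are made up for illustration, not taken from the actual results:

```python
import csv
import io

# Column layout as described in the thread:
# benchmark, network (ONNX), spec (VNNLIB), prep time (s), result, verify time (s)
def parse_results(text):
    rows = []
    for benchmark, onnx, vnnlib, prep, result, verify in csv.reader(io.StringIO(text)):
        rows.append({
            "benchmark": benchmark,
            "network": onnx,
            "spec": vnnlib,
            "prep_s": float(prep),
            "result": result,
            "verify_s": float(verify),
        })
    return rows

# Illustrative row (file names are placeholders):
sample = "traffic_signs,onnx/net.onnx,vnnlib/prop_3.vnnlib,1.2,unsat,4.7\n"
print(parse_results(sample)[0]["result"])  # unsat
```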
Both tool participants and outsiders such as industry partners can propose benchmarks. All benchmarks must be in .onnx format and use .vnnlib specifications, as was done last year. Each benchmark must also include a script to randomly generate benchmark instances based on a random seed. For image classification, this is used to select the images considered. For other benchmarks, it could, for example, perturb the size of the input set or specification.
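As a rough illustration of the required instance-generation script, here is a minimal sketch: given a random seed, it deterministically emits a list of (onnx model, vnnlib spec, timeout) instances. All file names and parameters here are hypothetical placeholders, not part of any actual benchmark:

```python
import csv
import random
import sys

def generate_instances(seed, num_instances=10, timeout=300):
    """Deterministically generate benchmark instances from a random seed."""
    rng = random.Random(seed)  # seeded RNG so the same seed reproduces the list
    instances = []
    for _ in range(num_instances):
        # e.g., for image classification: pick which image's spec to use
        image_id = rng.randrange(1000)
        instances.append(("onnx/net.onnx", f"vnnlib/img_{image_id}.vnnlib", timeout))
    return instances

if __name__ == "__main__":
    seed = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    # Write instances.csv-style rows to stdout
    csv.writer(sys.stdout).writerows(generate_instances(seed))
```

The key property is determinism: running the script twice with the same seed must yield the same instance list, while different seeds vary the images (or input-set sizes) considered.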
The purpose of this thread is to present your benchmarks and provide preliminary files to get feedback. Participants can then provide comments, for example, suggesting that you simplify the structure of the network or remove unsupported layers.
To propose a new benchmark, please create a public git repository with all the necessary code. The repository must be structured as follows:
Update: benchmark submission deadline extended to June 2 (was May 29).