Open stanleybak opened 1 year ago
The draft report is here. Tool participants and benchmark proposers, please let me know if you need to be added to edit the report. We aim to finalize and post to arXiv by year end, and some text/citations can be provided for each tool and benchmark, ideally by December 15. If you cannot make that deadline, we can update later, but we want to get it timestamped for 2023 (we mailed the listserv just now). Our apologies for the slow follow-up.
@ttj I noticed that the Traffic Signs Recognition benchmark was not mentioned in the report at all, although it was scored and participated in the 2023 competition. Is this a mistake? It does appear in the initial report here https://github.com/ChristopherBrix/vnncomp2023_results/blob/main/SCORING/latex/main.pdf and here https://github.com/ChristopherBrix/vnncomp2023_results/blob/main/SCORING/slides.pdf. Thanks for clarifying!
@merascu thanks! We have not copied over the tables from the presentation yet but will do so soon (there was some final confirmation of things, etc.). If you would like to add details or citations for this benchmark, e.g., any papers or relevant references, to the benchmarks section (Section 4) of the report, please let me know and I'll invite you to edit. Otherwise, we will add a small bit of text ourselves (e.g., based on the proposal here https://github.com/stanleybak/vnncomp2023/issues/2#issuecomment-1548081675), so that the report is reasonably self-contained regarding the benchmarks, especially the new ones.
Yes, I'd like to add some details. The email address is madalina dot erascu at e-uvt dot ro.
@ChristopherBrix @ttj What is the meaning of the 3rd and 5th columns in results.csv? Also, what is the role of the nano onnx models and their corresponding vnnlib specs? Thanks!
@merascu
These are the columns in results.csv:

1. benchmark
2. neural network (ONNX file)
3. specification (VNNLIB file)
4. time to prepare the instance (s)
5. result (sat/unsat/...)
6. time to verify (s)
About the nano models: I believe these are used as a "smoke test" for each of the tools (I assume you are referring to these nano test models).
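For what it's worth, the column layout described above can be read with a few lines of Python. This is just a minimal sketch; the sample row and file names below are made up for illustration, not taken from the actual results:

```python
import csv
import io

# Column layout as described in the thread:
# benchmark, network (ONNX), spec (VNNLIB), prep time (s), result, verify time (s)
def parse_results(text):
    rows = []
    for benchmark, onnx, vnnlib, prep, result, verify in csv.reader(io.StringIO(text)):
        rows.append({
            "benchmark": benchmark,
            "network": onnx,
            "spec": vnnlib,
            "prep_s": float(prep),
            "result": result,
            "verify_s": float(verify),
        })
    return rows

# Illustrative row (file names are placeholders):
sample = "traffic_signs,onnx/net.onnx,vnnlib/prop_3.vnnlib,1.2,unsat,4.7\n"
print(parse_results(sample)[0]["result"])  # unsat
```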
Both tool participants and outsiders such as industry partners can propose benchmarks. All benchmarks must be in .onnx format and use .vnnlib specifications, as was done last year. Each benchmark must also include a script to randomly generate benchmark instances based on a random seed. For image classification, this is used to select the images considered. For other benchmarks, it could, for example, perturb the size of the input set or specification.
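As a rough illustration of the required instance-generation script, here is a minimal sketch: given a random seed, it deterministically emits a list of (onnx model, vnnlib spec, timeout) instances. All file names and parameters here are hypothetical placeholders, not part of any actual benchmark:

```python
import csv
import random
import sys

def generate_instances(seed, num_instances=10, timeout=300):
    """Deterministically generate benchmark instances from a random seed."""
    rng = random.Random(seed)  # seeded RNG so the same seed reproduces the list
    instances = []
    for _ in range(num_instances):
        # e.g., for image classification: pick which image's spec to use
        image_id = rng.randrange(1000)
        instances.append(("onnx/net.onnx", f"vnnlib/img_{image_id}.vnnlib", timeout))
    return instances

if __name__ == "__main__":
    seed = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    # Write instances.csv-style rows to stdout
    csv.writer(sys.stdout).writerows(generate_instances(seed))
```

The key property is determinism: running the script twice with the same seed must yield the same instance list, while different seeds vary the images (or input-set sizes) considered.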
The purpose of this thread is to present your benchmarks and provide preliminary files to get feedback. Participants can then provide comments, for example, suggesting that you simplify the structure of the network or remove unsupported layers.
To propose a new benchmark, please create a public git repository with all the necessary code. The repository must be structured as follows:
Update: benchmark submission deadline extended to June 2 (was May 29).