Open ttj opened 6 months ago
@mldiego
There were no changes to the benchmark in recent weeks. You need to run the code to generate all the onnx files. The specifications in this benchmark compare the output value of the neural network with a linear combination of the inputs. Because the current tools only accept specifications over the output variables, this problem requires skip-connection layers to make the input variables available at the output, followed by linear layers that compare the linear combination of the inputs with the output value. For future support, I also added vnnlib files (ending with "_io.vnnlib") that state the specification jointly over inputs and outputs. These files work only with the main onnx files (the original neural networks in this problem).
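The rewriting described above can be sketched numerically (a minimal sketch; the network `f`, coefficients `c`, and bias `b` are hypothetical placeholders, not taken from the benchmark): appending a skip connection plus a linear layer turns a spec comparing the output with a linear combination of the inputs into a pure output property.

```python
import numpy as np

def f(x):
    # Stand-in for the real ONNX network (hypothetical).
    return float(np.tanh(x).sum())

c = np.array([0.5, -1.0, 2.0])   # hypothetical coefficients of the linear combination
b = 0.1                          # hypothetical bias

def wrapped(x):
    """Augmented network: the skip connection exposes x alongside y = f(x),
    and a final linear layer computes z = y - (c^T x + b).
    The original spec  f(x) >= c^T x + b  becomes the pure output property  z >= 0."""
    y = f(x)
    z = y - (c @ x + b)          # linear layer over the concatenation [y, x]
    return z

x = np.array([0.2, 0.3, -0.1])
# The two formulations are equivalent on any input.
assert (f(x) >= c @ x + b) == (wrapped(x) >= 0)
```

This is why tools that only parse output-variable specs can still handle the benchmark once the skip connection and linear layers are baked into the onnx file.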
@aliabigdeli
I am just going with the rules here:
The vnnlib files were not provided, and the onnx file provided (1) is not the one used for the competition. Based on these rules, this benchmark proposal was incomplete and misleading; I doubt every participant will install and generate every file from scratch for all proposed benchmarks, which is exactly why the rules for benchmark proposals exist.
@mldiego
As you brought up the rules: after running the generate_properties.py file with a random seed, you will get two folders (onnx and vnnlib). If what you're saying were correct (which I don't think it is), many benchmarks would not have been correctly proposed, such as the following:
https://github.com/feiyang-cai/cgan_benchmark2023 https://github.com/stanleybak/vggnet16_benchmark2022 ...
thanks everyone!
for https://github.com/dynaroars/vnncomp-benchmark-generation, if the issues cannot be resolved very soon, we will likely need to exclude it from scoring. Adding other GitHub IDs that I think are related, as there has been no response yet: @edwardxu0 @dynaroars @hocdot
we have extended the deadline for tool submission to July 12 (see #3 (comment) )
So is this benchmark going to be excluded, since tomorrow is the deadline and there is no response?
It seems no one has been able to get it working successfully, so excluding it would be my recommendation, unless anyone disagrees or has managed to use it successfully.
Another thing about safenlp:

safenlp has folders (`medical` and `ruarobot`) in `onnx` and `vnnlib`, and files in `medical` and `ruarobot` may have the same filenames (e.g., `vnnlib/ruarobot/hyperrectangle_1402.vnnlib` and `vnnlib/medical/hyperrectangle_1402.vnnlib`). The current scripts would just take `hyperrectangle_1402.vnnlib` for counterexample saving and ignore `ruarobot` or `medical`. So the script may not work correctly, as counterexamples may get saved to the same file `perturbations_0_hyperrectangle_1402.counterexample` for both instances:
https://github.com/ChristopherBrix/vnncomp2024_benchmarks/blob/main/benchmarks/safenlp/instances.csv#L804
https://github.com/ChristopherBrix/vnncomp2024_benchmarks/blob/main/benchmarks/safenlp/instances.csv#L924
and we can't tell which instance the counterexample comes from.
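One possible fix can be sketched as follows (a sketch only, not the actual competition scripts; the helper name and filename scheme are mine): include the benchmark subfolder in the counterexample filename so the two instances no longer collide.

```python
from pathlib import Path

def counterexample_name(vnnlib_path: str, prefix: str = "perturbations_0") -> str:
    """Build a collision-free counterexample filename by keeping the
    benchmark subfolder (e.g. 'medical' or 'ruarobot') in the name."""
    p = Path(vnnlib_path)
    subfolder = p.parent.name  # 'medical' or 'ruarobot'
    return f"{prefix}_{subfolder}_{p.stem}.counterexample"

# The two clashing safenlp instances now map to distinct files:
a = counterexample_name("vnnlib/medical/hyperrectangle_1402.vnnlib")
b = counterexample_name("vnnlib/ruarobot/hyperrectangle_1402.vnnlib")
assert a != b
```

Any scheme that preserves the path component relative to the `vnnlib` folder would resolve the ambiguity.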
@ChristopherBrix The scripts may need to be modified.
@shizhouxing Thank you for pointing that out. I'll adapt the scripts!
Both tool participants and outsiders such as industry partners can propose benchmarks. All benchmarks must be in .onnx format and use .vnnlib specifications, as was done last year. Each benchmark must also include a script to randomly generate benchmark instances based on a random seed. For image classification, this is used to select the images considered. For other benchmarks, it could, for example, perturb the size of the input set or specification.
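The seed-based generation requirement above can be sketched like this (a minimal illustration, not any benchmark's actual generate_properties.py; all names and parameter ranges are hypothetical): the seed deterministically selects which images are verified and perturbs the input-set radius.

```python
import random

def generate_instances(seed: int, n_images: int = 100, n_select: int = 10):
    """Sketch of seed-based instance generation: reproducibly pick which
    images to verify and perturb the epsilon-ball radius per instance."""
    rng = random.Random(seed)
    image_ids = rng.sample(range(n_images), n_select)  # images considered
    return [(i, round(rng.uniform(0.01, 0.05), 4)) for i in image_ids]

# The same seed always yields the same instance list.
assert generate_instances(42) == generate_instances(42)
```

Determinism under a fixed seed is the point: organizers can regenerate the exact instance set from the repository alone.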
The purpose of this thread is to present your benchmarks and provide preliminary files to get feedback. Participants can then provide comments, for example, suggesting that you simplify the structure of the network or remove unsupported layers.
To propose a new benchmark, please create a public git repository with all the necessary code.
The repository must be structured as follows: