verivital / vnncomp2024

Verification of Neural Networks Competition Main Repository (VNN-COMP'24)

Benchmark Proposals and Discussion #2

Open ttj opened 6 months ago

ttj commented 6 months ago

Both tool participants and outsiders such as industry partners can propose benchmarks. All benchmarks must be in .onnx format and use .vnnlib specifications, as was done last year. Each benchmark must also include a script to randomly generate benchmark instances based on a random seed. For image classification, this is used to select the images considered. For other benchmarks, it could, for example, perturb the size of the input set or specification.
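As a rough illustration of the instance-generation requirement, here is a minimal sketch of what a `generate_properties.py` might look like. All names (file paths, instance counts, the epsilon value) are hypothetical, not taken from any specific benchmark; the point is only that the same seed must deterministically reproduce the same instance list.

```python
# Hypothetical sketch of a generate_properties.py script; file names,
# counts, and timeouts are illustrative, not from any real benchmark.
import csv
import random
import sys

def generate_instances(seed: int, num_instances: int = 3):
    """Pick random image indices deterministically from the seed and
    pair each with an onnx model, a vnnlib spec, and a timeout."""
    rng = random.Random(seed)  # same seed -> same instances
    indices = rng.sample(range(100), num_instances)
    # Each row: onnx model, vnnlib spec, per-instance timeout (seconds)
    return [("onnx/net.onnx", f"vnnlib/img_{i}_eps_0.01.vnnlib", 300)
            for i in sorted(indices)]

if __name__ == "__main__":
    seed = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    with open("instances.csv", "w", newline="") as f:
        csv.writer(f).writerows(generate_instances(seed))
```

For a non-image benchmark, the seeded RNG would instead perturb the bounds written into each generated vnnlib file.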

The purpose of this thread is to present your benchmarks and provide preliminary files for feedback. Participants can then comment, for example suggesting that you simplify the structure of the network or remove unsupported layers.

To propose a new benchmark, please create a public git repository with all the necessary code.

The repository must be structured as follows:

aliabigdeli commented 2 months ago

@mldiego

There have been no changes to the benchmark in recent weeks. You need to run the code to generate all the onnx files. The specifications in this benchmark compare the output value of the neural network with a linear combination of the inputs. Because current tools only accept specifications over the output variables, for this problem one needs skip-connection layers to make the input variables available at the output, followed by linear layers that compare the linear combination of the inputs with the output value. For future support, I also added vnnlib files (ending with "_io.vnnlib") that state the specification jointly over the inputs and outputs. These files work only with the main onnx files (the original neural network in this problem).
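The skip-connection trick described above can be sketched numerically. This is an illustrative numpy example with made-up names and a toy network, not the benchmark's actual model: a property of the form f(x) >= C x is rewritten as a constraint on output variables only, by concatenating the inputs onto the output (the skip connection) and appending one linear layer that computes the difference.

```python
# Illustrative sketch (all names and the toy network are hypothetical):
# encode "f(x) >= C x" as "wrapped output >= 0" using only output variables.
import numpy as np

def original_net(x: np.ndarray) -> np.ndarray:
    """Stand-in for the real ONNX network: a tiny affine map."""
    W = np.array([[0.5, -0.2], [0.1, 0.3]])
    return W @ x

def wrapped_net(x: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Skip connection + final linear layer: the wrapped output equals
    f(x) - C x, so a spec on this output alone encodes f(x) >= C x."""
    y = original_net(x)
    joint = np.concatenate([y, x])  # skip connection: inputs reach the output
    k = y.size
    # Final linear layer [I | -C] applied to the joint vector
    M = np.hstack([np.eye(k), -C])
    return M @ joint
```

In the actual benchmark this wrapping is done at the ONNX graph level, but the arithmetic is the same: the wrapped network's output is f(x) minus the chosen linear combination of the inputs.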

mldiego commented 2 months ago

@aliabigdeli

I am just going with the rules here:

(screenshot of the benchmark proposal rules)

The vnnlib files were not provided, and the provided onnx file (1) is not used in the competition. Based on these rules, this benchmark proposal was incomplete and misleading; I doubt every participant will install dependencies and generate every file from scratch for all proposed benchmarks, which is why the proposal rules exist.

aliabigdeli commented 2 months ago

@mldiego

As you brought up the rules: after running the generate_properties.py file with a random seed, you get two folders (onnx and vnnlib). If what you're saying were correct (which I don't think it is), many benchmarks were not correctly proposed, such as the following:

https://github.com/feiyang-cai/cgan_benchmark2023 https://github.com/stanleybak/vggnet16_benchmark2022 ...

shizhouxing commented 2 months ago

thanks everyone!

for https://github.com/dynaroars/vnncomp-benchmark-generation, if the issues cannot be resolved very soon, we will likely need to exclude it from scoring. Adding other github ids that I think are related, as there has been no response yet: @edwardxu0 @dynaroars @hocdot

we have extended the deadline for tool submission to July 12 (see #3 (comment) )

So is this benchmark going to be excluded, since tomorrow is the deadline and there is no response?

ttj commented 2 months ago

> thanks everyone! for https://github.com/dynaroars/vnncomp-benchmark-generation if the issues cannot be resolved very soon, we will likely need to exclude it from scoring, adding other github ids that I think related as no response on it yet: @edwardxu0 @dynaroars @hocdot we have extended the deadline for tool submission to July 12 (see #3 (comment) )
>
> So is this benchmark going to be excluded, since tomorrow is the deadline and there is no response?

It seems no one has been able to get it working successfully enough to consider it, so that would be my recommendation, unless anyone disagrees or has been able to use it successfully.

shizhouxing commented 2 months ago

Another thing about safenlp:

safenlp has subfolders (medical and ruarobot) under both onnx and vnnlib, and files in medical and ruarobot may share the same filename (e.g., vnnlib/ruarobot/hyperrectangle_1402.vnnlib and vnnlib/medical/hyperrectangle_1402.vnnlib). The current scripts take only hyperrectangle_1402.vnnlib for counterexample saving and ignore the ruarobot/medical part. So the scripts may not work correctly: counterexamples for both instances may get saved to the same file, perturbations_0_hyperrectangle_1402.counterexample (https://github.com/ChristopherBrix/vnncomp2024_benchmarks/blob/main/benchmarks/safenlp/instances.csv#L804 and https://github.com/ChristopherBrix/vnncomp2024_benchmarks/blob/main/benchmarks/safenlp/instances.csv#L924), and we can't tell which instance a counterexample comes from.

@ChristopherBrix The scripts may need to be modified.
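One possible fix, sketched below with hypothetical function and parameter names (the real scripts in vnncomp2024_benchmarks may do this differently): derive the counterexample filename from the full vnnlib path, subfolder included, so the medical and ruarobot specs can no longer collide.

```python
# Hypothetical sketch of a collision-free counterexample filename scheme;
# the function name and prefix are illustrative, not from the real scripts.
from pathlib import Path

def counterexample_name(vnnlib_path: str, prefix: str = "perturbations_0") -> str:
    """Build a filename from every path component of the vnnlib spec
    (dropping a leading 'vnnlib/' directory), so specs that share a
    basename in different subfolders map to different files."""
    parts = Path(vnnlib_path).with_suffix("").parts
    stem = "_".join(parts[1:]) if parts[0] == "vnnlib" else "_".join(parts)
    return f"{prefix}_{stem}.counterexample"
```

With this scheme, vnnlib/ruarobot/hyperrectangle_1402.vnnlib and vnnlib/medical/hyperrectangle_1402.vnnlib produce distinct counterexample files.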

ChristopherBrix commented 2 months ago

@shizhouxing Thank you for pointing that out. I'll adapt the scripts!