stanleybak / vnncomp2022


Tool Submission Discussion #3


stanleybak commented 2 years ago

At this point, you should be updating your tool in order to support quickly verifying as many benchmarks as you can. Note that the benchmark instances will change based on a new random seed for the final evaluation. We will follow a similar workflow to last year, where tool authors provide shell scripts to install their tool, prepare instances (convert models to a different format, for example), and then finally verify an instance. The detailed instructions for this are available at last year's git repo.
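
As a rough sketch (the script names below follow last year's instructions; see the linked repo for the authoritative interface and arguments), the repository you submit will contain a scripts directory along these lines:

your_tool_repo/
  vnncomp_scripts/
    install_tool.sh       # installs your tool and its dependencies on the fresh instance
    prepare_instance.sh   # per-instance preparation, e.g. converting the onnx model
    run_instance.sh       # verifies a single instance and writes the result file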

You will be able to run and debug your toolkit on the submitted benchmarks online at this link. There, you first need to register. Your registration will be manually activated by the organizers; you'll receive a notification once that's done. Afterwards, you can log in and start with your first submission.

The process is similar to the submission of benchmarks: you need to specify a public git URL and commit hash, as well as the directory that contains your vnncomp scripts (see this repo for detailed instructions on what the scripts should do). You can also specify a post-installation script, in case you need to set up any licenses, as well as the AWS instance type that should be used. Please note that right now only m5.16xlarge is available; the others will be enabled soon. You may use this instance type for debugging even if you plan to use a different one for your final submission.

Once submitted, you're placed in a queue until the chosen AWS instance can be created, at which point your installation and evaluation scripts will be run. You'll see the output of each step and can abort the evaluation early in case there are any issues. Once a submission has terminated, you can use it to populate the submission form for the next iteration, so you don't have to retype everything.

Important: We currently have no limitation on how often you can submit your tool for testing purposes, but we will monitor the usage closely and may impose limits if necessary. Please be mindful of the costs (approx. $3 per hour) each submission incurs. To save costs, you should debug your code locally and then use the website to confirm that the results match your expectations.

At some point before the deadline (on or before July 14 AOE), please send an email to brix@cs.rwth-aachen.de, specifying the ID of your successful test run. Also: if licenses are required, please let us know how to acquire them. Should you encounter any problems, please let us know either on GitHub or via email (brix@cs.rwth-aachen.de). We do not currently plan on extending this deadline.

The competition rules, including scoring information, are available here.

We strongly encourage tool participants to at least register and make some test submissions on the toolkit website well ahead of the deadline (on or before July 14 AOE), especially if you need licenses to be installed, which requires manually messaging brix@cs.rwth-aachen.de.

ChristopherBrix commented 2 years ago

In case you need to adapt your installation script to the specific instance that is spawned, you can now opt to include a pause between the main installation and the post-install script. You can also edit this post-install script while the previous steps are already running. So you could e.g. collect some information about the instance in the install script, use it to create a customized license file, and upload that as part of the updated post-install script.

When using this pause, please keep in mind that you need to manually trigger the continuation of the pipeline, and the instance is being paid for but idle.

yodarocks1 commented 2 years ago

@stanleybak @ChristopherBrix A few issues and one question I've run into while testing VERAPAK:

Issues

There are a few benchmarks in the benchmarks repository that I don't see on the testing website:


On the website, there are three benchmarks listed that do not have files in the benchmarks repository, and consequently error when run with the message ./benchmarks/X/instances.csv file not found, where X is each of:


The benchmark cifar100_tinyimagenet_resnet has an extra dimension of size -1, presumably for storing the entire dataset, which TensorFlow takes issue with for single instances:

Traceback (most recent call last):
  File "main.py", line 97, in main
    setup(config)
  File "main.py", line 42, in setup
    config['label'], num_classes=np.prod(config['output_shape']), dtype=config['output_dtype'])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/np_utils.py", line 74, in to_categorical
    categorical = np.zeros((n, num_classes), dtype=dtype)
ValueError: negative dimensions are not allowed

Is this something to bring up with the benchmark's author, or should I try to find my own workaround for it?

Question

How should the witness (when violated) be presented in the result? In the Rules, I see it should look like:

sat
((X_0 0.0)
 (X_1 0.0)
 ...
 (X_n 0.0)
 (Y_0 0.0)
 (Y_1 0.0)
 ...
 (Y_n 0.0))

However, when compressed for the resulting .csv file, it looks like:

... ,sat((X_00.0)(X_10.0)...(X_n0.0)(Y_00.0)(Y_10.0)(Y_n0.0)), ...

(E.g. see mnist_fc log at https://vnncomp.christopher-brix.de/toolkit/details/243)


Note: I currently (improperly) use violated instead of sat and neglect the Y values in the witness in the examples given. Please excuse that part for now.
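
For concreteness, here is a minimal sketch of emitting a witness in the Rules' format (the values and the $RESULTS_FILE variable are placeholders):

xs=(0.0 0.0)   # placeholder counterexample inputs
ys=(0.0 0.0)   # placeholder network outputs
{
  echo "sat"
  parts=()
  for i in "${!xs[@]}"; do parts+=("(X_$i ${xs[$i]})"); done
  for i in "${!ys[@]}"; do parts+=("(Y_$i ${ys[$i]})"); done
  printf '(%s' "${parts[0]}"
  for p in "${parts[@]:1}"; do printf '\n %s' "$p"; done
  printf ')\n'
} > "$RESULTS_FILE"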

ChristopherBrix commented 2 years ago

The list of benchmarks has been updated.

If this is a general problem with TF, it would be great if cifar100_tinyimagenet_resnet could be updated to avoid this issue for everyone.

How should the witness (when violated) be presented in the result?

For the final evaluation, I guess it would be best to have each result in one line, right @stanleybak? This should make parsing the csv file easier. Instead of the space between X/Y and the value, should we use a semicolon or something else? Then the compression to one line doesn't break anything.

stanleybak commented 2 years ago

I would suggest that the counterexample not be put into the results csv file summary. I don't remember: was this a script I had written last year? We probably do want some way to track these counterexamples.

ChristopherBrix commented 2 years ago

Yes, right now I'm reusing the run_single_instance.sh script you wrote last year. But it should be possible to strip the counterexample from the result, I'll look into that.
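
For example, assuming the verdict (sat/unsat/...) is on the first line of the tool's output and the witness follows on later lines, the stripping could be a one-liner along these lines (the variable names and csv columns here are placeholders, not the actual script):

VERDICT=$(head -n 1 "$RESULT_FILE")   # keep only the sat/unsat line, drop the witness
echo "${BENCHMARK},${INSTANCE},${VERDICT},${RUNTIME}" >> results.csv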

huanzhang12 commented 2 years ago

@yodarocks1 Could you elaborate a bit more on the extra dimension of size -1 issue? We don't use TensorFlow, and it is not very clear to me what the actual issue is. Thanks.

@ChristopherBrix @stanleybak Could you help us activate our account (huan@huan-zhang.com)? Thank you.

ChristopherBrix commented 2 years ago

Your account has been activated.

pat676 commented 2 years ago

Hi all,

I was trying to submit our toolkit for testing. However, my understanding of the current pipeline is that the toolkit has to be in a public GitHub repository?

Unfortunately, we cannot publish this version of our toolkit at the moment. A solution could be to provide our GitHub username, email, and a GitHub PAT (Personal Access Token) in the submission in order to clone a private repository. However, it would have to be done in such a way that the PAT is not visible to everyone.

ChristopherBrix commented 2 years ago

This should already work: After creating the PAT, define the git URL as https://[PAT]:x-oauth-basic@github.com/[USER]/[REPOSITORY].git.

The page with project details is only visible to you and the organizers, so other teams will not have access to the PAT. Does that work for you?

ChristopherBrix commented 2 years ago

You can now use all three instance types: m5.16xlarge, p3.2xlarge, and g5.8xlarge.

pat676 commented 2 years ago

This should already work: After creating the PAT, define the git URL as https://[PAT]:x-oauth-basic@github.com/[USER]/[REPOSITORY].git.

The page with project details is only visible to you and the organizers, so other teams will not have access to the PAT. Does that work for you?

Great, thank you! I was not aware you could use it like this, I will give it a try.

pat676 commented 2 years ago

Hi @ChristopherBrix,

There seems to be a relatively short timeframe to update the post_installation script before the instance is shut down. It does take some time for us to get the license. Would it be possible to extend the timeframe to e.g. 5 minutes?

ChristopherBrix commented 2 years ago

You can check the box to include a pause after the installation step. Then you can take as long as you'd like. After you're done, you can advance the pipeline manually.

pat676 commented 2 years ago

Sorry, my mistake. I thought it shut down due to a timeout, but it was due to an error at the end of my installation script.

pat676 commented 2 years ago

Hi, our test run with g5.8xlarge has been stuck on "Instance creation" in a queue for the last few hours. This is somewhat problematic since we have the post-installation pause enabled and don't know how long the wait is.

We will disable the job for tonight, but would it be possible to enable more simultaneous AWS instances now that the deadline is close, so we do not have to wait in queues?

ChristopherBrix commented 2 years ago

Thank you for bringing this to my attention! In fact, this was a bug in my code, and a race condition caused two instances to be created but not assigned to your submission. The pipeline was advanced anyway and failed as no instance was assigned. Then, the two orphaned instances blocked the creation of new ones.

This has been fixed:

  1. The two instances have been shut down
  2. The race condition is removed
  3. Orphaned instances will be detected sooner and terminated automatically, allowing all submissions to advance
  4. A warning was added to the queue counter, advising you to inform us if you seem to get stuck.

We have access to a limited number of instances of each type (e.g. two of type g5.8xlarge); getting those resources was a surprisingly complex journey. Unfortunately, increasing this number won't be easy, but I don't really expect this to be a problem apart from bugs like the one above.

I am sorry for the inconvenience, please be sure to let me know if this or something else should happen again!

pat676 commented 2 years ago

Thanks, @ChristopherBrix, a few more questions:

  1. For the g5.8xlarge I'm getting a runtime error from PyTorch: "RuntimeError: Found no NVIDIA driver on your system". Are we running an image that does not have NVIDIA drivers installed, and if so, would it make sense/be possible to switch to an image with the drivers installed?

  2. For the post-installation script, the last character of the script seems to be removed. This is easily fixed by adding a space after the last character; just wanted to let you know.

  3. For the vggnet benchmark, I'm getting: FileNotFoundError: [Errno 2] No such file or directory: './benchmarks/vggnet16_2022/onnx/vgg16-7.onnx'

UPDATE: I tried installing the Nvidia drivers by adding "apt install nvidia-driver-515 nvidia-dkms-515" to the installation script. However, this seems to require a reboot (e.g. nvidia-smi does not work after installation). Does anyone know if it is possible to install Nvidia drivers without a reboot?

ChristopherBrix commented 2 years ago
  1. For the g5.8xlarge I'm getting a runtime error from PyTorch: "RuntimeError: Found no NVIDIA driver on your system". Are we running an image that does not have NVIDIA drivers installed, and if so, would it make sense/be possible to switch to an image with the drivers installed?

I currently use ami-0892d3c7ee96c0bf7, but I'm open to just making this a configurable parameter so everyone can use the setup they prefer. @stanleybak are you ok with that? I'm no AWS expert, but I think this only changes stuff teams could change with their installation script anyway, while avoiding the need for restarts as mentioned by Patrick. I can change that tonight (in approx. 10 hours).

2. For the post-installation script, the last character of the script seems to be removed. This is easily fixed by adding a space after the last character; just wanted to let you know.

Thank you, I'll fix that tonight.

3. For the vggnet benchmark, I'm getting: FileNotFoundError: [Errno 2] No such file or directory: './benchmarks/vggnet16_2022/onnx/vgg16-7.onnx'

This should be fixed now.

pat676 commented 2 years ago

If we use different AMIs, the OS/Linux kernel version could differ, which in turn could significantly affect runtimes?

An alternative would be to use the same AMI as last year (AWS Deep Learning AMI). This AMI has NVIDIA drivers installed, and everything else should be easy enough to install via the installation script?

ChristopherBrix commented 2 years ago

That would be ami-0addd1a99864f4d42?

pat676 commented 2 years ago

I am actually not sure; there seem to be a few options for Deep Learning AMIs. Maybe @stanleybak knows which one we used last year? In any case, all of them seem to have NVIDIA drivers, so any of them should be fine for us.

stanleybak commented 2 years ago

I currently use ami-0892d3c7ee96c0bf7, but I'm open to just making this a configurable parameter so everyone can use the setup they prefer. @stanleybak are you ok with that?

Customizing this sounds fine to me (assuming there's no extra cost). If people really want to optimize the OS/Linux kernel version rather than putting effort into their algorithms, then they should be able to, in my opinion. I don't think it will make too much of a difference, and if it does, then we as a community should know about it, right? As mentioned before, last year we used the Deep Learning AMI (maybe ami-0280f3286512b1b99?). Having this as the default rather than one without drivers could be better.

ChristopherBrix commented 2 years ago

Ok, then I'll use that as the default and make it configurable.

As AMIs may incur costs as well (most don't), I'll need to vet them manually, so if a group would like to use a different one, please let me know and I'll add it to the list of options.

Expect those changes to go live tonight (7 hours from now in my time zone), I'll post an update here once it's done.

ChristopherBrix commented 2 years ago

When submitting your tool, you can now choose which AMI you want to use. Please let me know if you want to use a different one and I'll add it. That's almost no work for me; I just need to make sure that there are no costs associated with it.

huanzhang12 commented 2 years ago

@ChristopherBrix Could you add vanilla Ubuntu 22.04 LTS (ami-052efd3df9dad4825)? Thank you.

ChristopherBrix commented 2 years ago

I cannot find that one; when I search for it, I get some suggestions, but not this specific one.

Would ami-0d70546e43a941d70 work?


Edit: I went ahead and added ami-0d70546e43a941d70. It's way past midnight for me, so I'll add the other AMI tomorrow if you can give me a pointer on how to find it. I hope that in the meantime this is good enough for you to start testing.

huanzhang12 commented 2 years ago

@ChristopherBrix ami-0d70546e43a941d70 should work, thanks! The reason we listed a different AMI is that AMI IDs are region-specific.

mnmueller commented 2 years ago

@ChristopherBrix @stanleybak, has a decision been made on how to encode the counterexamples? Especially for some of the inputs (vgg16 e.g. has 3x224x224 = ~150k inputs), writing them all in the same csv file seems impractical.

We could e.g. write them in a separate csv file with the instance name and "_sat.csv" appended.

yodarocks1 commented 2 years ago

@huanzhang12 Sorry for taking so long to get back to you. The issue is that the input and output shapes of the network include a -1 for the batch size. While the inclusion of the batch size is supported by almost all functions, a few get annoyed, including any method of reshaping that would remove the -1.

Luckily, the keras to_categorical function is the only function I use that did not support the batch size. It needed the number of output classes, found from the output shape, but the shape of (-1, 100) resulted in -100 classes. Finally realizing this, I simply added np.absolute() in between, giving a shape of (1, 100) with 100 classes, instead of trying to reshape the inputs and outputs to remove the -1 entirely.

So - issue resolved. Thanks for looking into it!

huanzhang12 commented 2 years ago

So - issue resolved. Thanks for looking into it!

Glad to know you solved the issue! Since it is already the final stage of the competition, we will keep the current models.

huanzhang12 commented 2 years ago

@mnmueller @stanleybak @ChristopherBrix I also feel that saving large counterexamples directly in the result csv might not be a good idea; it basically makes the csv unreadable and very slow to work with. Given that some benchmarks this year have huge input dimensions, it would be better to save counterexamples to a separate file, with a name specified by the run script.

stanleybak commented 2 years ago

The logs are saved, right? I would suggest stripping the counterexamples when populating the csv file and just printing them to the log in case we want to look them up later.

mnmueller commented 2 years ago

@stanleybak Writing large inputs in a human-readable format at double precision will make them hundreds of MB and hard to look at for debugging. Also, printing hundreds of thousands of lines might actually be quite slow.

stanleybak commented 2 years ago

Hmm, maybe we just drop them from the csv file and then redo the computation if discrepancies come up. @ChristopherBrix in the scripts I noticed there was an easy way to just run the first instance from each benchmark... maybe we could add a way to pick the specific instance number we want, to support this.

ChristopherBrix commented 2 years ago

I agree, that could be a good solution. I also currently only save a limited amount of logs (1MB?) per benchmark; I didn't realize the files would become that big. They're saved as a string in a database, which is probably not the best format for such huge files. Fixing that is now on my todo list, but I'd rather not commit to fixing it for this year's competition.

Running it without logging, and redoing the computation if a conflict is detected seems way easier. And yes, I can adapt my scripts to run a specific instance.
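
Roughly, assuming instances.csv keeps its onnx,vnnlib,timeout columns, picking out one specific row could look like this (run_one_instance.sh is a placeholder for whatever wrapper ends up being used):

BENCHMARK=$1   # e.g. mnist_fc
ROW=$2         # 1-based instance number
LINE=$(sed -n "${ROW}p" "./benchmarks/${BENCHMARK}/instances.csv")
ONNX=$(echo "$LINE" | cut -d',' -f1)
VNNLIB=$(echo "$LINE" | cut -d',' -f2)
TIMEOUT=$(echo "$LINE" | cut -d',' -f3)
./run_one_instance.sh "$BENCHMARK" "$ONNX" "$VNNLIB" "$TIMEOUT"   # placeholder wrapper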

GuantingPan commented 2 years ago

Hi all,

Based on the rules, each tool's group is allowed to select two benchmarks to be used for scoring. May I know where we can specify the benchmarks we want to use for scoring? Is logging in to this link (https://vnncomp.christopher-brix.de/) the correct way to specify the two benchmarks? It seems we can select more than two benchmarks when submitting our tool through that link.

Thank you so much for reading this post! :)

haithamkhedr commented 2 years ago

Hi all,

I was wondering if anyone has successfully installed a Gurobi license on an AWS instance. I'm getting an error that the host is not recognized as an academic domain. Did anyone face this issue and manage to solve it? Thanks

pat676 commented 2 years ago

Hi @ChristopherBrix,

The test runs of the benchmarks all end with a run of "test_nano.vnnlib"; I'm assuming this is to measure the overhead? Just FYI, the test_nano files are currently not in the benchmark folder, and the last run of the original benchmark (e.g. mnist_fc,./benchmarks/mnist_fc/onnx/mnist-net_256x6.onnx,./benchmarks/mnist_fc/vnnlib/prop_14_0.05.vnnlib) currently seems to be replaced by the nano run.

stanleybak commented 2 years ago

May I know where we can specify the benchmarks we want to use for scoring?

@GuantingPan The organizers will manually contact teams after they finalize their submission to figure out which benchmarks will be scored. Based on the current number of teams that we expect to submit and the number of benchmarks, we expect them all to be scored. We may also include cifar2020 and acasxu from last year, which won't be scored, just to get a year-to-year measurement of improvement.

mnmueller commented 2 years ago

@stanleybak I think the (super)set of benchmarks should be fixed before the final submission, as some teams might have to include configs per benchmark, which would mean running last year's benchmarks without prior knowledge will be impossible.

stanleybak commented 2 years ago

@mnmueller last year's benchmarks won't be scored, so if tools don't support reasonable defaults it's okay. This probably affects tool usability somewhat (how would a user know what a good configuration file should be), but they won't be penalized for this in VNNCOMP in terms of scoring.

mnmueller commented 2 years ago

@ChristopherBrix Could we add an option to choose to not run the install_tool.sh as sudo? Setting up conda environments from a script that is run as sudo seems to be tricky.

piyush-J commented 2 years ago

The scoring part in the doc says the following -

Each instance is scored as follows:
Correct unsat: 10 points
Correct sat: 10 points
Incorrect result: -100 points

So, does it mean that if the output is either timeout, error, or unknown, then we would receive a score of 0?

stanleybak commented 2 years ago

does it mean that if the output is either timeout, error, or unknown, then we would receive a score of 0?

@piyush-J yes

pat676 commented 2 years ago

Hi, have there been any changes to the submission system since this morning? Our test suddenly seems to get stuck at "Post-Installation: Running"

vishnuteja97 commented 2 years ago

Hi, I made three test submissions this morning. The first one executed properly. The second and third ones got stuck at "Post-Installation: Running"

pat676 commented 2 years ago

We currently have one that is stuck at "Shutting down instance: Running.". The abort button is disabled at this stage, could one of the organisers please shut it down?

ChristopherBrix commented 2 years ago

I think it's solved now, I'm currently running some tests. Give me a few minutes, I'll post an update soon!

ChristopherBrix commented 2 years ago

@ChristopherBrix Could we add an option to choose to not run the install_tool.sh as sudo? Setting up conda environments from a script that is run as sudo seems to be tricky.

@mnmueller I can do that, but then you cannot install anything that does require root privileges. Maybe you could use su ubuntu -c "whoami; ls /some/dir" in your install script to temporarily switch to a non-root user?
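
A rough sketch of that idea, with placeholder paths and environment name (the wget URL is the generic latest Miniconda installer):

# root-only steps stay at the top of install_tool.sh
apt-get update && apt-get install -y wget
# everything conda-related runs as the unprivileged ubuntu user
su ubuntu -c '
  wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /home/ubuntu/miniconda.sh
  bash /home/ubuntu/miniconda.sh -b -p /home/ubuntu/miniconda3
  /home/ubuntu/miniconda3/bin/conda create -y -n mytool python=3.8
'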

ChristopherBrix commented 2 years ago

The post-install bug should be fixed now, please try again and let me know if it still gets stuck.

If you want to use Gurobi and need the academic license, here's how to get that done:

  1. When submitting your toolkit, select the option to include a pause after the install script was run
  2. In your install script, include grbprobe
  3. Connect your private PC to the universities network
  4. Get a fresh academic license key
  5. Wait until grbprobe was run and note down the output
  6. Navigate your browser to the URL https://apps.gurobi.com/keyserver?id=LICENSE_KEY&hostname=HOSTNAME&hostid=HOSTID&username=ubuntu&os=linux&sockets=SOCKETS&localdate=YYYY-MM-DD&version=9 after substituting the placeholders (if you get an error message, double check that you're connected to your university network)
  7. Format the content shown to you: you need to add line breaks (or open the website's source code, where they are included) and remove the first two lines. It should look like this:
    # DO NOT EDIT THIS FILE except as noted
    #
    # License ID X
    # Gurobi license for Your University
    ORGANIZATION=Your University
    TYPE=ACADEMIC
    VERSION=9
    HOSTNAME=ip-123456789
    HOSTID=1a2b3c
    SOCKETS=2
    USERNAME=ubuntu
    EXPIRATION=yyyy-mm-dd
    KEY=XYZ
    CKEY=XYZ
  8. Update your post-install script so that it writes this content to a file. Your license file must either be in one of the default locations (e.g. /opt/gurobi/gurobi.lic; note that /opt/gurobi does not exist by default, so you first need to create that directory) or you need to set the environment variable GRB_LICENSE_FILE (I recommend using the default path). A minimal sketch is shown after this list.
  9. Let the pipeline proceed
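
A minimal sketch of a post-install script for step 8 (paste your own formatted license from step 7 in place of the comment):

mkdir -p /opt/gurobi
cat > /opt/gurobi/gurobi.lic << 'EOF'
# DO NOT EDIT THIS FILE except as noted
# ... rest of the formatted license content from step 7 ...
EOF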

As always, please let me know if this doesn't work for you.