stanleybak opened 1 year ago
I have a concern about the current meta-solver definition.
Many tools leverage bounding packages like auto_lirpa. This tool does network analysis, and is ~20k lines of Python:
~/auto_LiRPA$ find . -type f -name "*.py" | xargs wc -l | grep total
21223 total
If a tool imports auto_lirpa and is less than 20k lines of code, will it be classified as a meta-solver?
Which tools use auto_lirpa? Alpha-beta crown surely does but any others?
Hello :)
As far as I understand from reading the rules, meta-solvers will be graded on the same criteria as non-meta-solvers (number of solved instances, correctness).
We may have different expectations as to what a meta-solver should do, and the rules may not reflect that. In our team, we develop a platform that calls other tools to check specifications; in particular, we have a language (and interpreter) that is more expressive than SMTLIB. With a bit more effort on our side, we could for instance rephrase complex properties into several simpler ones and send those to provers that are not tailored for them yet (one of the use cases we consider is fairness, for instance).
Allowing tools to tackle a wider array of properties is, in our opinion, a worthy contribution that is currently not reflected in the rules. It could be a specific evaluation criterion for the meta-solvers category. As far as I understand tools like DNNV / Goose, this criterion could also apply to them.
In the same spirit, I am also curious about the two-benchmark limitation. Is this to allow a fair comparison between tools?
more expressive than SMTLIB. With a bit more effort on our side, we could for instance rephrase complex properties into several simpler ones and send those to provers that are not tailored for them yet (one of the use cases we consider is fairness, for instance).
To me this sounds like VNNLIB may not be expressive enough for these types of properties. Is that the correct understanding? This does sound neat, although if only your tool can handle such properties, there isn't too much to compare against, right?
The two-benchmark limitation was to prevent a tool from abusing the way the scoring works to get an advantage. For example, say everyone else proposes a total of 10 benchmarks, and my team was allowed to propose 20 (on which my team's tool happened to perform really well). The way the scoring is done, each benchmark is weighted equally, so my team would be the winner. I don't think anyone actually tried to do this, but that was the motivation behind the rule.
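To make that concrete, here's a toy sketch (made-up numbers, not the actual scoring script) of how equal per-benchmark weighting plays out:

```python
# Toy illustration of equal per-benchmark weighting (made-up numbers, not the real scoring code).
def overall_score(per_benchmark_scores):
    """Average the (already normalized) per-benchmark scores, each with equal weight."""
    return sum(per_benchmark_scores.values()) / len(per_benchmark_scores)

# 10 benchmarks proposed by the other teams, where my tool is mediocre...
others = {f"others_bench_{i}": 20.0 for i in range(10)}
# ...and 20 benchmarks proposed by my team, where my tool does very well.
mine = {f"my_bench_{i}": 100.0 for i in range(20)}

print(overall_score({**others, **mine}))  # ~73.3 - the self-proposed benchmarks dominate the total
```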
@GirardR1006
Allowing tools to tackle a wider array of properties is, in our opinion, a worthy contribution that is currently not reflected in the rules. It could be a specific evaluation criterion for the meta-solvers category. As far as I understand tools like DNNV / Goose, this criterion could also apply to them.
So, we as organizers certainly agree with broadening properties, categories, etc., but an issue is that, for the purpose of this being a competition, who else will be able to support such benchmarks, especially for scoring? This has come up in the past as well with e.g. global robustness properties, synthesis of networks that are guaranteed to satisfy properties instead of checking pre-trained networks, etc. All of these things can be done, and I think we as organizers will be supportive of them, but someone will have to do some work to make it possible for others to analyze these things easily. If a proposer is volunteering to make it easy for others to do such things (e.g., by supporting the ONNX/VNN-LIB formats through a parser/translator, or modifications to the runtime/batch execution scripts if needed), then that certainly is encouraged. If a spec requires checking multiple properties, then we can look into whether the existing VNN-LIB restrictions can support it to some degree, or whether to generalize the runtime specs.
For broader discussions on benchmarking, I will also encourage folks to consider this event, where we have more flexibility on the content:
Would there be any pushback to modify the meta-solver definition to use "neural network verification solvers" instead of "network analysis"?
This would encourage people to use auto_lirpa and dnnv in their tools and not have to worry about getting reclassified.
Would there be any pushback to modify the meta-solver definition to use "neural network verification solvers" instead of "network analysis"?
This would encourage people to use auto_lirpa and dnnv in their tools and not have to worry about getting reclassified.
I think auto_LiRPA itself is also an (incomplete) verification solver.
Would there be any pushback to modify the meta-solver definition to use "neural network verification solvers" instead of "network analysis"? This would encourage people to use auto_lirpa and dnnv in their tools and not have to worry about getting reclassified.
I think auto_LiRPA itself is also an (incomplete) verification solver.
Yeah, that is fair. It seems unlikely we will be able to participate then.
@j29scott
Yeah, that is fair. It seems unlikely we will be able to participate then.
So, we're certainly not trying to discourage people from participating, and we as organizers made a rather arbitrary choice in metrics, etc., and have been corresponding about it amongst ourselves. Again, the primary goal of this event is to help foster the community, with secondary goals around understanding state-of-the-art, enabling interchange of benchmarks, etc.
What definition would be reasonable to allow for participation for you?
The current definition is as follows, and it does not prohibit you from participating - it only classifies certain tools as meta-solvers...
"Meta-solvers: We define a meta-solver as any tool that, on any benchmark instance, executes another team’s tool (at least 50% code overlap) on the parts related to the NN analysis. For example, if a tool implements a heuristic to choose between five different analysis methods, one of which is an existing tool participating in the competition and four of which are new code, the tool is still considered a meta-solver, since for some specific instances it uses an existing tool’s analysis code. Meta solvers are allowed to participate and will be indicated as meta-solvers in the report. These will be scored as any other tool, but prizes for “best tool overall” or per category prizes will not be applicable to tools deemed meta solvers. If more than one meta-solver participates, we will have a meta-solver prize category."
I gather your concern is that you don't plan to participate if your tool is going to be classified as a meta-solver given it would not be able to win things?
Things of course can still be incentivized for meta-solvers, e.g., we can have a "best meta-solver" award too.
Everyone: if there are any final suggestions, proposals, or comments on rules, please make them soon, as we need to finalize things this week. We are already behind the schedule we have posted on the website here:
@j29scott As Taylor mentioned, meta-solvers are certainly allowed to participate. If the outcome you want is that there's a list of tools with scores and yours on top (assuming you win), then that's easy to have in the report. The distinction is only made in terms of awards.
What definition would be reasonable to allow for participation for you?
Hi all, sorry for my very late reply and poor phrasing.
I used some poor phrasing in my reply to @shizhouxing. We will be participating in SOME capacity this cycle. Our group would prefer to compete in the main track if at all possible, even if that means making substantial revisions to our tool to meet the definition.
From a main track standpoint, it does seem unlikely we will be able to participate. However, we will submit our solver to the meta-track in that case.
Our team would like to submit a motion to use the following AWS instance: m5.16xlarge (96 threads); this would benefit all teams with approaches that use CPU parallelism.
These seem to be more expensive ($4.06 per hour, https://instances.vantage.sh/aws/ec2/m5.24xlarge) than the other instances - any ideas how we can ensure that the comparison to e.g. GPU instances is still fair?
The motion process is for the other rules; for this one, I think we can simply form a consensus here to pick the instances.
The two limiting factors for the instances are: (1) fairness - in previous years this meant a CPU and a GPU instance that cost roughly the same per hour; (2) availability - although Amazon offers many nice-looking instances, they sometimes do not let us actually spawn them... we've reached out before to gain access, but this process was slow. If we decide / propose which instances we want or may want, we can test whether we can actually use them and report back.
Do the GPU-using teams have specific instances they want to try?
Yes, I agree - relative error makes a lot more sense for cases like this.
Hello, could you please inform us about the version of Gurobi Optimizer being used for the competition? Is it the most recent version? Thank you!
You have complete control over the server and can install anything you want. So any version is fine.
Gurobi may require you to set up a license - to this end you can use the post-install script: during the installation, log the necessary details, then activate the license manually and enter it into that post-install script so it can be set up on the server.
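As a rough sketch of that step (the key is a placeholder noted down during manual activation, and the invocation simply mirrors the grbgetkey command discussed later in this thread; your script layout may differ):

```python
# Minimal sketch of a non-interactive Gurobi activation step for the post-install script
# (placeholder key - do not commit a real one; assumes grbgetkey is available in the
# tool's pipenv environment, as in the command shown later in this thread).
import subprocess

GUROBI_LICENSE_KEY = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"  # placeholder

subprocess.run(["pipenv", "run", "grbgetkey", GUROBI_LICENSE_KEY], check=True)
```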
You have complete control over the server and can install anything you want. So any version is fine.
Gurobi may require you to set up a license - to this end you can use the post-install script: during the installation, log the necessary details, then activate the license manually and enter it into that post-install script so it can be set up on the server.
I see. Thank you for the clarification!
@ChristopherBrix
When I run the following command in the post-install script
pipenv run grbgetkey XXXXXXXXXXXXXXXX
I got the following:
Loading .env environment variables...
info : grbgetkey version 9.1.2, build v9.1.2rc0
info : Contacting Gurobi license server...
error : ERROR 303: Ip: 52.37.41.96 host ec2-52-37-41-96.us-west-2.compute.amazonaws.com is not recognized as belonging to an academic domain
info : Is your computer connected to the university network?
info : Your computer must be connected to the university network for the grbgetkey
info : command to work. You can connect your computer either directly or via a VPN
AWS is not connected to the university network, and I don't know how to connect it via a VPN. Could you please tell me how you activate an academic license of Gurobi on the server?
Does last year's solution still work? https://github.com/stanleybak/vnncomp2022/issues/3#issuecomment-1183730205
Does last year's solution still work? stanleybak/vnncomp2022#3 (comment)
No, it doesn't work for me. I constructed the URL as follows:
https://portal.gurobi.com/keyserver?id=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX&hostname=ip-172-31-20-116&hostid=4a8bc533&username=root&os=linux64&cores=1&sockets=2&localdate=2023-06-26&version=9.1.2
and I got the following response:
HOSTNAME=ip-172-31-20-116
USERNAME=root
RESULT=502
LICENSEID=2390555
It's worth mentioning that I also added cores=1 to the query part of the URL, because without it I at first got ERRORMSG=Cores are not detected by grbgetkey command. I also filled in sockets=2 myself, because there is no corresponding field in the response of the grbprobe command.
Just to make sure I followed the procedure right: I substituted the LICENSE_KEY for 'id' after obtaining a "Named-User Academic" license from the Gurobi website. Is this right?
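For reference, here is a small sketch of that manual keyserver workaround (unofficial and may break; the parameter values mirror the URL above, the license id is a placeholder, and the host details come from grbprobe):

```python
# Sketch of the manual keyserver workaround described above (unofficial, may break).
# Host details come from `grbprobe`; `cores` and `sockets` may need to be set by hand.
from urllib.parse import urlencode
from urllib.request import urlopen

params = {
    "id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",  # placeholder license key
    "hostname": "ip-172-31-20-116",
    "hostid": "4a8bc533",
    "username": "ubuntu",  # "root" reportedly fails (see the reply below); "ubuntu" works
    "os": "linux64",
    "cores": "1",
    "sockets": "2",
    "localdate": "2023-06-26",
    "version": "9.1.2",
}

url = "https://portal.gurobi.com/keyserver?" + urlencode(params)
print(urlopen(url).read().decode())  # prints HOSTNAME=..., RESULT=..., LICENSEID=...
```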
I just tried it myself - apparently using "root" as the username causes this error. If I use "ubuntu", it works. Are you able to run your tool without root privileges?
Thanks, it works now.
The tentative rules for this year are here. Changes are highlighted in yellow.
Please discuss here any comments or motions about the rules for the competition.
The specific AWS instances are still up in the air, so if you want any changes, please discuss them here.