lascavana / rl2branch


reproduce results #2

Closed: dmitrySorokin closed this issue 4 weeks ago

dmitrySorokin commented 1 year ago

Hi @lascavana! Thanks for the interesting work. I am trying to reproduce results from your paper. I generated instances for the following tasks:

Running the pretrained models on the generated instances, I get significantly different numbers (geometric mean ± std):

| Model | Comb. Auct. | Max. Ind. Set |
|---|---|---|
| internal:relpscost | 15.71 ± 139.55 | 17.65 ± 163.68 |
| gcnn:il | 75.43 ± 144.44 | 35.76 ± 129.91 |
| gcnn:mdp | 123.71 ± 320.29 | 86.84 ± 355.19 |
| gcnn:tmdp+DFS | 122.15 ± 311.66 | 82.01 ± 350.86 |
| gcnn:tmdp+ObjLim | 123.00 ± 314.95 | 82.12 ± 317.59 |

(Screenshot attached.)

Could you please help me understand the reason? I am using SCIP version 8.0.3 and Ecole 0.8.1.
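
For context, a minimal sketch of the kind of per-instance node-count measurement involved here, assuming Ecole's public `Branching` environment API. The instance path and the trivial branching rule are placeholders, not the repository's evaluation code:

```python
import ecole

# Rough sketch (not the repository's evaluation script) of counting the
# branch-and-bound nodes needed to solve one instance with Ecole's Branching
# environment. The instance path is a placeholder, and the "first candidate"
# rule below stands in for the learned GNN policy.
env = ecole.environment.Branching(
    observation_function=ecole.observation.NodeBipartite(),
    information_function={"nb_nodes": ecole.reward.NNodes().cumsum()},
    scip_params={"limits/time": 3600},
)
env.seed(0)

instance = ecole.scip.Model.from_file("instance_1.lp")  # placeholder path
obs, action_set, _, done, info = env.reset(instance)
while not done:
    action = action_set[0]  # placeholder rule; the paper's policies score candidates with a GNN
    obs, action_set, _, done, info = env.step(action)

print("nodes:", info["nb_nodes"])
```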

lascavana commented 1 year ago

Hi @dmitrySorokin, I see two reasons for this. One is that for our experiments we used SCIP version 7.0.3. The second is that the test instances you are using are very likely not exactly the same as ours, which would be due to differences in Ecole versions. We used a custom version of Ecole, and I still need to update the installation instructions to reflect this. I apologise for the delay in doing so.

dmitrySorokin commented 1 year ago

Thanks for the answer!

I tried generating different sets of 40 instances for the combinatorial auction task using your code with different seeds (0, 123, 456) and got the following evaluation results (one table per seed):

| name | tot | geomean | std |
|---|---|---|---|
| gcnn:il | 200 | 61.04 | 14 |
| gcnn:tmdp+ObjLim | 200 | 99.39 | 17 |
| internal:relpscost | 200 | 8.87 | 26 |

| name | tot | geomean | std |
|---|---|---|---|
| gcnn:il | 200 | 63.06 | 12 |
| gcnn:tmdp+ObjLim | 200 | 110.74 | 18 |
| internal:relpscost | 200 | 10.08 | 35 |

| name | tot | geomean | std |
|---|---|---|---|
| gcnn:il | 200 | 56.46 | 14 |
| gcnn:tmdp+ObjLim | 200 | 91.89 | 17 |
| internal:relpscost | 200 | 6.56 | 32 |

The stds are calculated following the description in the Gasse et al. paper "Exact Combinatorial Optimization with Graph Convolutional Neural Networks": "Note that we also report the average per-instance standard deviation, so “64 ± 13.6% nodes” means it took on average 64 nodes to solve an instance, and when solving one of those instances the number of nodes varied by 13.6% on average."
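
To make the two statistics concrete, here is a small sketch of how a shifted geometric mean and the quoted per-instance std could be computed. The 1-shift and the toy numbers are assumptions for illustration, not values from the paper or the repository:

```python
import numpy as np

# Minimal sketch, assuming nodes[i, j] holds the node count for instance i
# solved with solver seed j (toy numbers below).
def shifted_geometric_mean(values, shift=1.0):
    # 1-shifted geometric mean, a common convention in the MIP literature.
    values = np.asarray(values, dtype=float)
    return np.exp(np.mean(np.log(values + shift))) - shift

def avg_per_instance_std(nodes):
    # Relative std of node counts across seeds for each instance (in %),
    # averaged over instances, as in the "64 ± 13.6%" notation quoted above.
    nodes = np.asarray(nodes, dtype=float)
    return float(np.mean(nodes.std(axis=1) / nodes.mean(axis=1) * 100.0))

nodes = np.array([[60, 70, 64], [120, 100, 110]])  # 2 instances x 3 seeds (toy data)
print(shifted_geometric_mean(nodes.ravel()), avg_per_instance_std(nodes))
```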

Maybe I am getting different results due to the small set of evaluation instances.

lascavana commented 1 year ago

It is normal to get very different results with different instances. Even if the "difficulty" is the same (as measured by instance size), there is still high variance in how many nodes are needed to solve a MIP. However, the relative performance of the methods should be roughly the same, and that is the most important metric.
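
As a toy illustration of looking at relative rather than absolute performance, using the numbers from the first table above (a sketch, not part of the evaluation code):

```python
# Compare methods by their node geomean relative to SCIP's internal
# relpscost baseline rather than by absolute node counts, which vary
# strongly across instance sets (numbers from the first table above).
geomeans = {"internal:relpscost": 8.87, "gcnn:il": 61.04, "gcnn:tmdp+ObjLim": 99.39}
baseline = geomeans["internal:relpscost"]
for name, value in geomeans.items():
    print(f"{name}: {value / baseline:.2f}x baseline nodes")
```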

dmitrySorokin commented 1 year ago

Thanks!

dmitrySorokin commented 1 year ago

Can you share the test and transfer instances used for evaluation?

lascavana commented 1 year ago

I just realized that we did not use the Ecole generators for this project, so the Ecole version should be irrelevant to 01_generate_instances.py. We used the default seed. I am attaching the first test instance of indset (instance_1.txt); can you check that you obtain the same one?
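
One quick way to check this is to compare the file contents directly; a minimal sketch with placeholder paths (not paths from the repository):

```python
import hashlib

def file_md5(path):
    # Hash the raw file contents; equal hashes mean byte-identical instances.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Placeholder paths: the locally generated instance vs. the attached one.
print(file_md5("indset/test/instance_1.lp") == file_md5("instance_1.txt"))
```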

dmitrySorokin commented 1 year ago

Yes, the instance is the same, so the difference is probably due to the different SCIP versions.