lascavana / rl2branch


reproduce results #2

Closed: dmitrySorokin closed this issue 4 weeks ago

dmitrySorokin commented 1 year ago

Hi @lascavana! Thanks for the interesting work. I am trying to reproduce results from your paper. I generated instances for the following tasks:

Running the pretrained models on the generated instances, I get significantly different numbers (geometric mean ± std):

| Model | Comb. Auct. | Max. Ind. Set |
|---|---|---|
| internal:relpscost | 15.71 ± 139.55 | 17.65 ± 163.68 |
| gcnn:il | 75.43 ± 144.44 | 35.76 ± 129.91 |
| gcnn:mdp | 123.71 ± 320.29 | 86.84 ± 355.19 |
| gcnn:tmdp+DFS | 122.15 ± 311.66 | 82.01 ± 350.86 |
| gcnn:tmdp+ObjLim | 123.00 ± 314.95 | 82.12 ± 317.59 |

(Screenshot attached.)

Could you please help me understand the reason? I am using SCIP version 8.0.3 and Ecole 0.8.1.
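
For context, a minimal sketch of the kind of per-instance node-count measurement involved here, assuming Ecole's public `Branching` environment API. The instance path and the trivial branching rule are placeholders, not the repository's evaluation code:

```python
import ecole

# Rough sketch (not the repository's evaluation script) of counting the
# branch-and-bound nodes needed to solve one instance with Ecole's Branching
# environment. The instance path is a placeholder, and the "first candidate"
# rule below stands in for the learned GNN policy.
env = ecole.environment.Branching(
    observation_function=ecole.observation.NodeBipartite(),
    information_function={"nb_nodes": ecole.reward.NNodes().cumsum()},
    scip_params={"limits/time": 3600},
)
env.seed(0)

instance = ecole.scip.Model.from_file("instance_1.lp")  # placeholder path
obs, action_set, _, done, info = env.reset(instance)
while not done:
    action = action_set[0]  # placeholder rule; the paper's policies score candidates with a GNN
    obs, action_set, _, done, info = env.step(action)

print("nodes:", info["nb_nodes"])
```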

lascavana commented 1 year ago

Hi @dmitrySorokin, I see two reasons for this. One is that for our experiments we used SCIP version 7.0.3. The second is that the test instances you are using are very likely not exactly the same as ours, which would be due to differences in Ecole versions. We used a custom version of Ecole, and I still need to update the installation instructions to reflect this. I apologise for the delay in doing so.

dmitrySorokin commented 1 year ago

Thanks for the answer!

I tried generating different sets of 40 instances for the combinatorial auction task using your code with different seeds (0, 123, 456) and got the following evaluation results (one table per seed):

| name | tot | geomean | std |
|---|---|---|---|
| gcnn:il | 200 | 61.04 | 14 |
| gcnn:tmdp+ObjLim | 200 | 99.39 | 17 |
| internal:relpscost | 200 | 8.87 | 26 |

| name | tot | geomean | std |
|---|---|---|---|
| gcnn:il | 200 | 63.06 | 12 |
| gcnn:tmdp+ObjLim | 200 | 110.74 | 18 |
| internal:relpscost | 200 | 10.08 | 35 |

| name | tot | geomean | std |
|---|---|---|---|
| gcnn:il | 200 | 56.46 | 14 |
| gcnn:tmdp+ObjLim | 200 | 91.89 | 17 |
| internal:relpscost | 200 | 6.56 | 32 |

The stds are calculated following the description in the Gasse et al. paper "Exact Combinatorial Optimization with Graph Convolutional Neural Networks": "Note that we also report the average per-instance standard deviation, so “64 ± 13.6% nodes” means it took on average 64 nodes to solve an instance, and when solving one of those instances the number of nodes varied by 13.6% on average."
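
To make the two statistics concrete, here is a small sketch of how a shifted geometric mean and the quoted per-instance std could be computed. The 1-shift and the toy numbers are assumptions for illustration, not values from the paper or the repository:

```python
import numpy as np

# Minimal sketch, assuming nodes[i, j] holds the node count for instance i
# solved with solver seed j (toy numbers below).
def shifted_geometric_mean(values, shift=1.0):
    # 1-shifted geometric mean, a common convention in the MIP literature.
    values = np.asarray(values, dtype=float)
    return np.exp(np.mean(np.log(values + shift))) - shift

def avg_per_instance_std(nodes):
    # Relative std of node counts across seeds for each instance (in %),
    # averaged over instances, as in the "64 ± 13.6%" notation quoted above.
    nodes = np.asarray(nodes, dtype=float)
    return float(np.mean(nodes.std(axis=1) / nodes.mean(axis=1) * 100.0))

nodes = np.array([[60, 70, 64], [120, 100, 110]])  # 2 instances x 3 seeds (toy data)
print(shifted_geometric_mean(nodes.ravel()), avg_per_instance_std(nodes))
```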

Maybe I am getting different results due to the small set of evaluation instances.

lascavana commented 1 year ago

It is normal to get very different results with different instances. Even if the "difficulty" is the same (as measured by instance size), there is still high variance in how many nodes are needed to solve a MIP. However, the relative performance of the methods should be roughly the same, and that is the most important metric.
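
As a toy illustration of looking at relative rather than absolute performance, using the numbers from the first table above (a sketch, not part of the evaluation code):

```python
# Compare methods by their node geomean relative to SCIP's internal
# relpscost baseline rather than by absolute node counts, which vary
# strongly across instance sets (numbers from the first table above).
geomeans = {"internal:relpscost": 8.87, "gcnn:il": 61.04, "gcnn:tmdp+ObjLim": 99.39}
baseline = geomeans["internal:relpscost"]
for name, value in geomeans.items():
    print(f"{name}: {value / baseline:.2f}x baseline nodes")
```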

dmitrySorokin commented 1 year ago

Thanks!

dmitrySorokin commented 1 year ago

Can you share the test and transfer instances used for evaluation?

lascavana commented 1 year ago

I just realized that we did not use the Ecole generators for this project, so the Ecole version should be irrelevant to 01_generate_instances.py. We used the default seed. I am attaching the first test instance of indset (instance_1.txt); can you check that you obtain the same one?
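
One quick way to check this is to compare the file contents directly; a minimal sketch with placeholder paths (not paths from the repository):

```python
import hashlib

def file_md5(path):
    # Hash the raw file contents; equal hashes mean byte-identical instances.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Placeholder paths: the locally generated instance vs. the attached one.
print(file_md5("indset/test/instance_1.lp") == file_md5("instance_1.txt"))
```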

dmitrySorokin commented 1 year ago

Yes, the instance is the same, so the difference is probably due to the different SCIP versions.