numbbo / coco

Numerical Black-Box Optimization Benchmarking Framework
https://numbbo.github.io/coco

Feature Request: Table with Instance Numbers for each Algorithm #1993

Open brockho opened 3 years ago

brockho commented 3 years ago

Right now, the output of the postprocessing does not show which instances are used for the different algorithms (only the number of different instances). It would therefore be nice to have a table where, for each displayed algorithm, we see the instances the algorithm was run on.

This issue came up during a discussion with @ttusar, @MLopez-Ibanez, and @nikohansen today.

lassefschmidt commented 1 year ago

Just to recap, this issue relates to the postprocessing output (e.g. bbob 2019 in the data archives). If you go to the section "Tables for selected targets", there is one table displaying the performance of each algorithm for each dimension/function combination. Within each row (each algorithm/dimension/function combination), there is a column "#succ" showing the number of trials that reached the (final) target. But it is not shown on which actual instances these trials were run.

Why is this relevant? In coco, every function (across all suites) is parametrized by the dimension and the instance number (see section 3.1 of the benchmarking guidelines introduction). Right now, the postprocessing output does not report the instance numbers on which an algorithm was evaluated.

To fetch the instancenumbers used within a given benchmark (grouped by algorithm), I have written this small function:

def get_instancenumbers(data):
  """ Returns a dictionary with algorithm names as keys and, as values, the set
      of instancenumbers on which the respective algorithm has been evaluated.

      This method is meant to be used with an input argument which is a
      :py:class:`DictAlg` with algorithm names as keys and lists of
      :py:class:`DataSet` instances as values.
  """
  result = {}

  for algo in data.keys():
    # collect the union of instance numbers over all data sets of this algorithm
    result[algo] = set()
    for dataset in data[algo]:
      result[algo] = result[algo].union(dataset.instancenumbers)

  return result

This function should be called on a cocopp.pproc.DictAlg class instance. For example:

In:

import cocopp               # see https://pypi.org/project/cocopp
pp_data = cocopp.main('bbob-noisy/2010/*') # will take several minutes to process
get_instancenumbers(pp_data)

Out:

{
     ('1komma2_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('1komma2mir_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('1komma2mirser_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('1komma2ser_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('1komma4_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('1komma4mir_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('1komma4mirser_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('1komma4ser_brockhoff_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('AVGNEWUOA_ros_noisy', '') : {1, 2, 3, 4, 5}
     ('CMAEGS_finck_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('IPOP-ACTCMA-ES_ros_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('IPOP-CMA-ES_ros_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('MOS_torre_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('NEWUOA_ros_noisy', '') : {1, 2, 3, 4, 5}
     ('RCGA_tran_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
     ('SPSA_finck_noisy', '') : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
}

As (1) instance numbers are typically fixed for a given year, and (2) we selected data from a specific year, it is not surprising that in the above output nearly all algorithms were evaluated on the same instance numbers.

Remaining questions:

  1. If this code should be integrated, where should it be put? (e.g. pproc.py, the file in which the class definitions of DictAlg and DataSet also live)
  2. How should the corresponding instance number tables be created in the postprocessing output? (see the plain-text sketch below for one possible rendering)
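
For illustration, one way the returned dictionary could be rendered as a plain-text table (a sketch only, not tied to cocopp's existing table machinery; the helper name format_instance_table is hypothetical):

def format_instance_table(instancenumbers_by_algo):
  """ Render the output of get_instancenumbers as a plain-text table with
      one row per algorithm (hypothetical helper, for illustration only).
  """
  lines = ["{:<40s} {:s}".format("algorithm", "instance numbers")]
  for algo, instances in sorted(instancenumbers_by_algo.items()):
    name = algo[0] if isinstance(algo, tuple) else str(algo)
    lines.append("{:<40s} {:s}".format(
        name, ", ".join(str(i) for i in sorted(instances))))
  return "\n".join(lines)

print(format_instance_table(get_instancenumbers(pp_data)))
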
nikohansen commented 1 year ago

If I am not mistaken, this implementation loses some information which could be relevant: a DataSet may contain several runs on the same instance number. To fix this, you could use the built-in Counter class, which can represent a multiset.

re. question 1: The function should probably be a method or property of the DictAlg class.
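
A minimal sketch of the difference (the instance numbers below are made up): collections.Counter keeps the multiplicity of repeated runs that a plain set would discard:

import collections

runs = [1, 2, 3, 1, 2, 3, 1, 2, 3]   # hypothetical instancenumbers of one DataSet with repeated runs
print(set(runs))                      # {1, 2, 3} -- the repetitions are lost
print(collections.Counter(runs))      # Counter({1: 3, 2: 3, 3: 3}) -- the repetitions are kept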

lassefschmidt commented 1 year ago

Thank you for the quick feedback! We updated the function accordingly, see below:

import collections

def get_instancenumbers(data):
  """ Returns a dictionary with algorithm names as keys and, as values, a
      dictionary of instancenumbers in which we report the average number of
      occurrences of each instancenumber per problem.

      This method is meant to be used with an input argument which is a
      :py:class:`DictAlg` with algorithm names as keys and lists of
      :py:class:`DataSet` instances as values.
  """
  result = {}

  for algo in data.keys():
    c = collections.Counter() # initialise counter
    for dataset in data[algo]: # increment counter based on instancenumbers within each dataset
      c.update(dataset.instancenumbers)
    # only return the average count of each instancenumber per problem, not the overall count
    c = {instance: round(c[instance] / len(data[algo]), 3) for instance in sorted(c.keys())}
    # save the averaged dict
    result[algo] = c

  return result

To account for the possibility that, in the future, the instance numbers a given algorithm was evaluated on might differ across dimension/function combinations, we iterate over all DataSet instances of each algorithm. As a consequence, we count the occurrences of each instancenumber across all algorithm/dimension/function combinations. Since these raw counts depend on the number of chosen dimensions and functions, we report their average per problem instead.
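
As a toy example with made-up numbers: if an algorithm comes with 120 DataSet instances (say 24 functions times 5 dimensions) and instance 3 appears three times in every one of them, the overall count is 360 and the reported value is 360 / 120 = 3.0; if instance 3 were missing in 6 of those data sets, the value would drop to 342 / 120 = 2.85.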

If we run this code on the example in my previous comment, we get the following:

In:

import cocopp               # see https://pypi.org/project/cocopp
pp_data = cocopp.main('bbob-noisy/2010/*') # will take several minutes to process
get_instancenumbers(pp_data)

Out:

{
     ('1komma2_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('1komma2mir_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('1komma2mirser_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('1komma2ser_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('1komma4_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('1komma4mir_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('1komma4mirser_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('1komma4ser_brockhoff_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('AVGNEWUOA_ros_noisy', '') : {1: 3.0, 2: 3.0, 3: 3.0, 4: 3.0, 5: 3.0}
     ('CMAEGS_finck_noisy', '') : {1: 0.989, 2: 0.989, 3: 0.989, 4: 0.983, 5: 0.989, 6: 0.983, 7: 0.994, 8: 0.983, 9: 0.983, 10: 1.0, 11: 1.0, 12: 1.0, 13: 0.983, 14: 0.989, 15: 0.983}
     ('IPOP-ACTCMA-ES_ros_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('IPOP-CMA-ES_ros_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('MOS_torre_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('NEWUOA_ros_noisy', '') : {1: 3.0, 2: 3.0, 3: 3.0, 4: 3.0, 5: 3.0}
     ('RCGA_tran_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
     ('SPSA_finck_noisy', '') : {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0}
}

Two important things to note:

  1. As @nikohansen requested, we now capture the case where a DataSet contains several runs on the same instance number (compare the value of the key ('AVGNEWUOA_ros_noisy', '')).
  2. Interestingly, one algorithm was not always evaluated on the same instance numbers across all dimension/function combinations (see the value of the key ('CMAEGS_finck_noisy', ''), where the average occurrence of some instance numbers is slightly less than 1). Maybe that's simply because the specific instance was not compatible with the corresponding dimension/function combination, but it definitely surprised me.

nikohansen commented 1 year ago

This suggests "storing" the "raw" results in a class, an instance of which is the return value for each algorithm, and making the averages accessible via a property of this class or its __repr__ method?
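
A minimal sketch of what such a class could look like; the name InstanceCounts and all of its details are hypothetical, not an existing cocopp API:

import collections

class InstanceCounts:
  """ Hypothetical container for the raw instancenumber counts of one algorithm.

      Stores the overall multiset of instance numbers (a Counter) together with
      the number of data sets it was collected from, and exposes the per-problem
      averages as a property.
  """
  def __init__(self, counter, number_of_datasets):
    self.counter = counter                        # raw counts over all data sets
    self.number_of_datasets = number_of_datasets

  @property
  def averages(self):
    """ average count of each instance number per problem """
    return {i: round(self.counter[i] / self.number_of_datasets, 3)
            for i in sorted(self.counter)}

  def __repr__(self):
    return repr(self.averages)

def get_instancenumbers(data):
  """ Variant of the function above that returns an InstanceCounts per algorithm. """
  result = {}
  for algo in data.keys():
    c = collections.Counter()
    for dataset in data[algo]:
      c.update(dataset.instancenumbers)
    result[algo] = InstanceCounts(c, len(data[algo]))
  return result

With this, printing get_instancenumbers(pp_data) would still show the averaged view (via __repr__), while the raw counts remain accessible as result[algo].counter.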