Suggestion for saving front history in experiment

saviosampaio commented 4 years ago

Hello,

I congratulate everyone involved in this interesting project.

I would like to make a suggestion.

We were trying to save the history of the approximation front so that we could generate graphs of the convergence of the metrics (like Hypervolume, for example).

We saw that there is an Observer that records the front of each generation, so we created a JobHist class by copying the Job class from "https://github.com/jMetal/jMetalPy/blob/master/jmetal/lab/experiment.py", as follows:

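import os

# Imports assumed from the modules used by jmetal/lab/experiment.py
from jmetal.core.algorithm import Algorithm
from jmetal.util.observer import WriteFrontToFileObserver
from jmetal.util.solution import print_function_values_to_file, print_variables_to_file


class JobHist: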

    def __init__(self, algorithm: Algorithm, algorithm_tag: str, problem_tag: str, run: int):
        self.algorithm = algorithm
        self.algorithm_tag = algorithm_tag
        self.problem_tag = problem_tag
        self.run_tag = run

    def execute(self, output_path: str = ''):

        if output_path:
            # One HIST.<run> folder per run, next to the FUN/VAR/TIME files
            hist_dir = os.path.join(output_path, 'HIST.{}'.format(self.run_tag))
            save_front_history = WriteFrontToFileObserver(output_directory=hist_dir)
            self.algorithm.observable.register(save_front_history)

        self.algorithm.run()

        if output_path:
            file_name = os.path.join(output_path, 'FUN.{}.tsv'.format(self.run_tag))
            print_function_values_to_file(self.algorithm.get_result(), filename=file_name)

            file_name = os.path.join(output_path, 'VAR.{}.tsv'.format(self.run_tag))
            print_variables_to_file(self.algorithm.get_result(), filename=file_name)

            file_name = os.path.join(output_path, 'TIME.{}'.format(self.run_tag))
            with open(file_name, 'w+') as of:
                of.write(str(self.algorithm.total_computing_time))

To make this easier to maintain, the Job class could take a parameter that controls whether the history of each generation is stored, or whether it is stored only every N generations (N = 10, 20, 30, etc.), among other options.
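A minimal sketch of the "every N generations" option, written as a standalone observer so that the Job class itself would not need to change. It assumes that the algorithm notifies its observers once per generation through `update(*args, **kwargs)`, that the current front arrives under a `SOLUTIONS` key, and that each solution exposes an `objectives` list. The class name `PeriodicFrontWriter` and its `every` parameter are hypothetical.

```python
import os


class PeriodicFrontWriter:
    """Hypothetical observer: writes the current front once every `every` notifications."""

    def __init__(self, output_directory: str, every: int = 10):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.directory = output_directory
        self.every = every
        self.calls = 0    # notifications received so far
        self.written = 0  # snapshot files written so far
        os.makedirs(self.directory, exist_ok=True)

    def update(self, *args, **kwargs):
        # Assumption: the algorithm passes the current solutions as SOLUTIONS.
        solutions = kwargs.get("SOLUTIONS")
        self.calls += 1
        # Write on the 1st, (every + 1)-th, (2 * every + 1)-th, ... notification.
        if solutions is None or (self.calls - 1) % self.every != 0:
            return
        file_name = os.path.join(self.directory, "FUN.{}".format(self.written))
        with open(file_name, "w") as of:
            for solution in solutions:
                of.write(" ".join(str(v) for v in solution.objectives) + "\n")
        self.written += 1
```

In JobHist.execute it could be registered in place of WriteFrontToFileObserver, for example `self.algorithm.observable.register(PeriodicFrontWriter(hist_dir, every=10))`, where `hist_dir` is the HIST.{RUN} folder of the run.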

However, we noticed that the function "generate_summary_from_experiment" expects the folders to follow a specific structure: it takes the last folder in the path as the name of the problem, and the penultimate folder as the name of the algorithm.

When we used our JobHist class, the function "generate_summary_from_experiment" treated the folder "HIST.0" as the name of a problem and ZDT1 as the name of the algorithm. It then failed because it could not find the reference PF file for the "problem" "HIST.0".

Since the algorithm and problem folders are created as the first and second folders below the experiment's "output_dir", we found that it would be enough to change the slice in "generate_summary_from_experiment" from [-2:] to [1:3]:

    for dirname, _, filenames in os.walk(input_dir):
        for filename in filenames:
            try:
                # Linux filesystem
                #algorithm, problem = dirname.split('/')[-2:]
                algorithm, problem = dirname.split('/')[1:3]
            except ValueError:
                # Windows filesystem
                #algorithm, problem = dirname.split('\\')[-2:]
                algorithm, problem = dirname.split('\\')[1:3]

            if 'HIST.' not in dirname: # <------------ needed
                if 'TIME' in filename:
                    run_tag = [s for s in filename.split('.') if s.isdigit()].pop()

                    with open(os.path.join(dirname, filename), 'r') as content_file:
                        content = content_file.read()

                    with open('QualityIndicatorSummary.csv', 'a+') as of:
                        of.write(','.join([algorithm, problem, run_tag, 'Time', str(content)]))
                        of.write('\n')

                if 'FUN' in filename:

This way we can keep a folder "HIST.{RUN}" inside each "algorithm / problem" folder of an experiment:

    * <base_dir>/
      * algorithm_a/
        * problem_a/
          * HIST.0/ (history of RUN 0)
            * FUN.0 (first generation non-dominated front of execution 0)
            * FUN.1 (second generation non-dominated front of execution 0)
            * ...
          * FUN.0.tsv (final non-dominated front of execution 0)
          * HIST.1/ (history of RUN 1)
            * FUN.0 (first generation non-dominated front of execution 1)
            * FUN.1 (second generation non-dominated front of execution 1)
            * ...
          * FUN.1.tsv (final non-dominated front of execution 1)
          * VAR.0.tsv
          * VAR.1.tsv
          * ...
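With this layout, the per-generation files of one run can be read back in generation order. The sketch below assumes the files inside a HIST.{RUN} folder are named `FUN.<generation>` as in the tree above, and that each line of a file holds the whitespace-separated objective values of one solution. The helper names `list_history_files` and `read_front` are hypothetical.

```python
import os
import re


def list_history_files(hist_dir: str) -> list:
    """Return the FUN.<n> files of one HIST.<run> folder, sorted by generation number."""
    pattern = re.compile(r"^FUN\.(\d+)$")
    found = []
    for name in os.listdir(hist_dir):
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), os.path.join(hist_dir, name)))
    # Numeric sort, so that FUN.10 comes after FUN.9 and not after FUN.1.
    return [path for _, path in sorted(found)]


def read_front(file_name: str) -> list:
    """Read one front: one solution per line, objectives separated by whitespace (assumed)."""
    with open(file_name) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]
```

Sorting on the parsed integer matters because a plain string sort would place FUN.10 before FUN.2.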

A further need, once the history is saved, would be to generate the summary of indicators (metrics) for each RUN, so that we can plot the convergence of these indicators, showing the median and the 1st and 3rd quartiles.
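As a rough illustration of that aggregation, the sketch below takes indicator values that have already been computed per run and per generation (for example, Hypervolume of each saved front) and returns the 1st quartile, median and 3rd quartile for each generation across runs. The function name `convergence_summary` and the input layout are assumptions made for this example, and the "inclusive" quartile method is only one of several possible conventions.

```python
from statistics import median, quantiles


def convergence_summary(history_by_run: dict) -> list:
    """Summarise indicator histories across runs.

    history_by_run maps a run tag to a list of indicator values, one per generation.
    Returns one (q1, median, q3) tuple per generation.
    """
    # Use only the generations that every run has reached.
    n_generations = min(len(values) for values in history_by_run.values())
    summary = []
    for g in range(n_generations):
        column = [values[g] for values in history_by_run.values()]
        if len(column) < 2:
            q1 = q3 = column[0]  # a single run has no spread
        else:
            q1, _, q3 = quantiles(column, n=4, method="inclusive")
        summary.append((q1, median(column), q3))
    return summary
```

For example, `convergence_summary({0: [1.0, 2.0], 1: [2.0, 4.0], 2: [3.0, 6.0]})` returns `[(1.5, 2.0, 2.5), (3.0, 4.0, 5.0)]`, which can be plotted as a median line with a shaded band between the quartiles.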

With this history of the approximation front of each RUN, we will also be able to generate animations to verify their evolution over the generations.

It is a suggestion.

Thank you very much.

Regards.

saviosampaio commented 4 years ago

I now understand better why the last two parts of the directory name are examined: the experiment's output folder is not always given relative to the directory in which you are working. For example, you can specify a folder with a full path, such as "/content/data".

In this case, a regular expression could be used to strip the base folder of the experiment, so that the first and second remaining folders contain the names of the algorithm and the problem.

For example:

import re

(...)

    for dirname, _, filenames in os.walk(input_dir):
      if 'HIST.' not in dirname:
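        # re.escape makes input_dir match literally; [/\\] accepts either path separator
        dirname2 = re.sub('^' + re.escape(input_dir) + r'[/\\]', '', dirname)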
        for filename in filenames:
            try:
                # Linux filesystem
                #algorithm, problem = dirname.split('/')[-2:]
                algorithm, problem = dirname2.split('/')[0:2]
            except ValueError:
                # Windows filesystem
                #algorithm, problem = dirname.split('\\')[-2:]
                algorithm, problem = dirname2.split('\\')[0:2]

            if 'TIME' in filename:
(...)
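The same idea can probably be expressed without a regular expression by letting `pathlib` compute the path relative to the base folder, which also avoids handling the two path separators by hand. This is only a sketch; the helper name `algorithm_and_problem` is hypothetical and is not part of jMetalPy.

```python
from pathlib import Path


def algorithm_and_problem(input_dir: str, dirname: str):
    """Return (algorithm, problem) from the first two folders below input_dir, or None."""
    try:
        parts = Path(dirname).relative_to(input_dir).parts
    except ValueError:
        return None  # dirname is not inside input_dir
    if len(parts) < 2:
        return None  # input_dir itself or an algorithm folder: nothing to summarise
    return parts[0], parts[1]
```

Called with "/content/data" and "/content/data/NSGAII/ZDT1/HIST.0", it returns ("NSGAII", "ZDT1"), so files below a HIST.{RUN} folder would still be attributed to the right algorithm and problem if they were not skipped.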

benhid commented 4 years ago

Hello Sávio,

Thank you for your suggestion. We can extend the Experiment class to save the history of the approximation front (although it can already be implemented using the existing Observers).

However, the generate_summary_from_experiment() function expects a very “peculiar” folder structure, as you noticed. I think it could be easier if we modify the function to ignore every unexpected folder/file (such as HIST.*) inside the input directory, e.g., using glob.
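One possible shape for the glob-based approach, as a sketch only: match files exactly two levels below the input directory, so that anything nested deeper (such as the contents of HIST.*) is never visited. The function name `find_result_files` and its return format are hypothetical; the run tag is extracted the same way as in the snippet quoted above.

```python
import glob
import os


def find_result_files(input_dir: str, prefix: str = "FUN") -> list:
    """Return (algorithm, problem, run_tag, path) for files exactly two levels below input_dir."""
    results = []
    # glob.escape protects special characters that may appear in input_dir itself.
    pattern = os.path.join(glob.escape(input_dir), "*", "*", prefix + ".*")
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # ignore folders that happen to match the pattern
        problem_dir, file_name = os.path.split(path)
        algorithm_dir, problem = os.path.split(problem_dir)
        algorithm = os.path.basename(algorithm_dir)
        digits = [s for s in file_name.split(".") if s.isdigit()]
        if digits:
            results.append((algorithm, problem, digits[-1], path))
    return results
```

Because the pattern has a fixed depth, a file such as algorithm_a/problem_a/HIST.0/FUN.0 does not match and needs no explicit exclusion.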

Let me think about it. I will update this issue as soon as possible!

Thank you for your time, Antonio.

saviosampaio commented 4 years ago

Thank you very much, Antônio. You're welcome.

I am also doing some experiments using jMetalPy. It will be a pleasure to share anything new we implement with your project.

Once again, congratulations and thank you for the beautiful work.

Regards.

Sávio

jMetal / jMetalPy

Suggestion for saving front history in experiment #84