mp very slow with conv layers

bjuergens commented 3 years ago

Using MP (multiprocessing) together with Torch CNNs is significantly slower than using Dask.

bjuergens commented 3 years ago

Here are four experiments with a CNN on Procgen: two run with Dask and two with MP. The Dask runs finished in about 7.5 seconds each; the MP runs took about 120 seconds each.

$ python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp; python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp 
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000    ctrnn_size: 1337    cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0           0           0       
Time elapsed: 7.656079053878784
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-00-55
Done
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000    ctrnn_size: 1337    cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0           0           0       
Time elapsed: 7.405250549316406
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-01-07
Done
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000    ctrnn_size: 1337    cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0.446667    0.510381    2       
Time elapsed: 122.9413092136383
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-01-19
Done
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000    ctrnn_size: 1337    cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0.446667    0.510381    2       
Time elapsed: 119.80066466331482
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-03-24
Done

config:

{
  "environment": "procgen:procgen-heist-v0",
  "random_seed": 123,
  "number_generations": 2,
  "optimizer": {
    "type": "MU_ES",
    "hof_size": 10,
    "checkpoint_frequency": 0,
    "initial_gene_range": 2,
    "tournsize": 0,
    "mu": 50,
    "extra_from_hof": 1,
    "lambda_": 100,
    "mutpb": 0.8,
    "efficiency_weight": 0.0,
    "fix_seed_for_generation": true,
    "strategy_parameter_per_gene": false
  },
  "brain": {
    "type": "CNN_CTRNN",
    "normalize_input": false,
    "normalize_input_target": 0.0,
    "use_bias": false,
    "cnn_conf": {
      "type": "CNN",
      "normalize_input": false,
      "normalize_input_target": 0.0,
      "use_bias": false,
      "conv_size1": 5,
      "conv_feat1": 5,
      "maxp_size1": 4,
      "maxp_stride1": 4,
      "conv_size2": 5,
      "conv_feat2": 5,
      "maxp_size2": 4,
      "maxp_stride2": 1
    },
    "ctrnn_conf": {
      "type": "CTRNN",
      "number_neurons": 15,
      "neuron_activation": "relu",
      "neuron_activation_inplace": false,
      "use_bias": true,
      "delta_t": 0.05,
      "normalize_input": false,
      "normalize_input_target": 2,
      "optimize_state_boundaries": "global",
      "clipping_range_max": 1.0,
      "clipping_range_min": -1.0,
      "optimize_y0": false,
      "set_principle_diagonal_elements_of_W_negative": false,
      "parameter_perturbations": 0.0,
      "w_mask": "logarithmic",
      "w_mask_param": 8,
      "v_mask": "logarithmic",
      "v_mask_param": 8,
      "t_mask": "logarithmic",
      "t_mask_param": 4
    }
  },
  "episode_runner": {
    "number_fitness_runs": 1,
    "reuse_env": true,
    "max_steps_per_run": 10,
    "keep_env_seed_fixed_during_generation": true,
    "environment_attributes": {
      "type": "ProcGenAttr"
    }
  }
}
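The cnn_size: 1000 and cnn_output (8, 8, 5) values in the log lines can be reproduced by hand from cnn_conf. The sketch below assumes unpadded stride-1 convolutions without bias terms (use_bias is false) and unpadded max pooling. The CNN class itself is not shown in this thread, so these are assumptions that happen to match the logged values, and the helper names are hypothetical.

```python
def valid_out(size: int, window: int, stride: int = 1) -> int:
    """Output length along one axis of an unpadded convolution or pooling window."""
    return (size - window) // stride + 1


def cnn_shape_and_params(height: int, width: int, channels: int, conf: dict):
    """Hypothetical helper: (conv -> max-pool) twice, as described by cnn_conf.

    Assumes stride-1 unpadded convolutions without bias and unpadded pooling.
    Returns ((height, width, channels), number_of_conv_weights).
    """
    h, w, c = height, width, channels
    params = 0
    for i in (1, 2):
        kernel, features = conf[f"conv_size{i}"], conf[f"conv_feat{i}"]
        params += kernel * kernel * c * features  # no bias terms
        h, w, c = valid_out(h, kernel), valid_out(w, kernel), features
        pool, stride = conf[f"maxp_size{i}"], conf[f"maxp_stride{i}"]
        h, w = valid_out(h, pool, stride), valid_out(w, pool, stride)
    return (h, w, c), params


# The relevant entries of cnn_conf from the config above.
cnn_conf = {
    "conv_size1": 5, "conv_feat1": 5, "maxp_size1": 4, "maxp_stride1": 4,
    "conv_size2": 5, "conv_feat2": 5, "maxp_size2": 4, "maxp_stride2": 1,
}

if __name__ == "__main__":
    shape, cnn_size = cnn_shape_and_params(64, 64, 3, cnn_conf)
    print(shape, cnn_size)   # (8, 8, 5) 1000
    print(cnn_size + 1337)   # 2337, the logged individual size
```

Spatially, 64 shrinks to 60 after the first 5x5 convolution, 15 after pooling with window 4 and stride 4, 11 after the second convolution, and 8 after pooling with stride 1. The first convolution contributes 5*5*3*5 = 375 weights and the second 5*5*5*5 = 625, for 1000 in total; adding ctrnn_size 1337 gives the logged individual size of 2337.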

bjuergens commented 3 years ago

Here are four more experiments, this time without the CNN. The Dask runs finished in about 6.5 seconds each; the MP runs finished in about 2.5 seconds each.

$ python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp; python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0.34        0.473709    1       
Time elapsed: 6.655607461929321
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-08
Done
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0.00666667  0.081377    1       
Time elapsed: 6.674955368041992
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-19
Done
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0           0           0       
Time elapsed: 2.4488930702209473
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-30
Done
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       150     1650        0           0           0           0       
1       150     1650        0           0           0           0       
Time elapsed: 2.4085118770599365
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-35
Done

{
  "environment": "procgen:procgen-heist-v0",
  "random_seed": 123,
  "number_generations": 2,
  "optimizer": {
    "type": "MU_ES",
    "hof_size": 10,
    "checkpoint_frequency": 0,
    "initial_gene_range": 2,
    "tournsize": 0,
    "mu": 50,
    "extra_from_hof": 1,
    "lambda_": 100,
    "mutpb": 0.8,
    "efficiency_weight": 0.0,
    "fix_seed_for_generation": true,
    "strategy_parameter_per_gene": false
  },
  "brain": {
    "type": "CTRNN",
    "number_neurons": 30,
    "neuron_activation": "relu",
    "neuron_activation_inplace": false,
    "use_bias": true,
    "delta_t": 0.05,
    "normalize_input": false,
    "normalize_input_target": 2,
    "optimize_state_boundaries": "global",
    "clipping_range_max": 1.0,
    "clipping_range_min": -1.0,
    "optimize_y0": false,
    "set_principle_diagonal_elements_of_W_negative": false,
    "parameter_perturbations": 0.0,
    "w_mask": "logarithmic",
    "w_mask_param": 32,
    "v_mask": "logarithmic",
    "v_mask_param": 8,
    "t_mask": "logarithmic",
    "t_mask_param": 4
  },
  "episode_runner": {
    "number_fitness_runs": 1,
    "reuse_env": true,
    "max_steps_per_run": 10,
    "keep_env_seed_fixed_during_generation": true,
    "environment_attributes": {
      "type": "ProcGenAttr",
      "screen_size": 16
    }
  }
}

pdeubel commented 3 years ago

OK, thanks for the info. I will investigate 👍🏽

pdeubel commented 3 years ago

Okay, I think I found the problem: when running with -p mp n 1, i.e., multiprocessing with one worker, it is even faster than Dask. I also printed out

print("Torch Num Threads {} Torch Interop Threads {}".format(torch.get_num_threads(), torch.get_num_interop_threads()))

in the CNN constructor. Both values were 4, so it seems that the setting that limits Torch threads and interop threads to 1 is not applied inside the worker processes. A simple fix is to set it outside of the CNN class. According to the documentation, the number of interop threads can only be set once, so I would not do it inside the constructor. I will push a quick fix to the procgen2 branch.
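With 4 Torch threads in each of several workers, the workers likely oversubscribe the CPU cores, which fits the observation that a single MP worker was faster than Dask. The fix described above moves the thread settings out of the CNN constructor (later comments indicate they now run at module import time). An alternative with the same intent is a multiprocessing.Pool initializer, which runs once in every worker process. This is a minimal sketch assuming workers are created with multiprocessing.Pool; the project's actual MP setup is not shown in this thread, and limit_torch_threads, report_threads and thread_counts_per_worker are hypothetical names. The "fork" context keeps the sketch self-contained on Linux/macOS but is not available on Windows.

```python
import multiprocessing as mp


def limit_torch_threads() -> None:
    """Hypothetical Pool initializer: runs once in every worker process."""
    try:
        import torch
    except ImportError:  # lets the sketch run on machines without Torch
        return
    torch.set_num_threads(1)
    try:
        # May only be called once per process, before any inter-op work starts.
        torch.set_num_interop_threads(1)
    except RuntimeError:
        pass


def report_threads(_):
    """Return (intra-op, inter-op) thread counts as seen inside the worker."""
    try:
        import torch
    except ImportError:
        return None
    return torch.get_num_threads(), torch.get_num_interop_threads()


def thread_counts_per_worker(n_workers: int = 2):
    # "fork" avoids re-importing __main__ in the workers; with "spawn" these
    # functions would need to live in an importable module.
    ctx = mp.get_context("fork")
    with ctx.Pool(n_workers, initializer=limit_torch_threads) as pool:
        return pool.map(report_threads, range(n_workers))


if __name__ == "__main__":
    # Expect (1, 1) per task if Torch is installed, None otherwise.
    print(thread_counts_per_worker())
```

Compared with module-level calls, an initializer keeps the thread policy next to the code that creates the pool instead of relying on a side effect of importing the CNN module.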

bjuergens commented 3 years ago

Thank you :heart:

bjuergens commented 3 years ago

It works. I just ran the following test:

console:

(neuro) chef on dekstop in ~/dev/project/NeuroEvolution-CTRNN_new(3h3m|procgen2)
$ python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp; python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       110     55550       0           0.0527273   0.0921551   0.4     
1       110     55550       0           0.232727    0.213247    0.8     
2       110     55550       0           0.0909091   0.144314    0.4     
3       110     55550       0           0.185455    0.239255    0.6     
4       110     55550       0           0.0327273   0.0739891   0.2     
5       110     55550       0           0.292727    0.186211    0.6     
6       110     55550       0           0.278182    0.19277     0.6     
7       110     55550       0           0.105455    0.172066    0.4     
8       110     55550       0           0.150909    0.0901972   0.4     
9       110     55550       0           0.0290909   0.0705117   0.2     
Time elapsed: 128.69662761688232
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_15-22-15
Done
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
                                                    fitness                         
                            --------------------------------------------------------
gen     nevals  steps       min         avg         std         max     
0       110     55550       0           0.0745455   0.100396    0.4     
1       110     55550       0           0.450909    0.255038    0.8     
2       110     55550       0           0.209091    0.109167    0.6     
3       110     55550       0           0.109091    0.103173    0.4     
4       110     55550       0           0.123636    0.134791    0.6     
5       110     55550       0           0.209091    0.229444    1       
6       110     55550       0           0.403636    0.342166    1.4     
7       110     55550       0           0.203636    0.0971665   0.6     
8       110     55550       0           0.0872727   0.197304    0.8     
9       110     55550       0           0.321818    0.286795    1.2     
Time elapsed: 138.29078269004822
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_15-24-26
Done

config:

{
  "environment": "procgen:procgen-heist-v0",
  "random_seed": -1,
  "number_generations": 10,
  "optimizer": {
    "type": "MU_ES",
    "hof_size": 10,
    "checkpoint_frequency": 0,
    "initial_gene_range": 2,
    "tournsize": 0,
    "mu": 10,
    "extra_from_hof": 1,
    "lambda_": 100,
    "mutpb": 0.8,
    "efficiency_weight": 0.0,
    "fix_seed_for_generation": true,
    "strategy_parameter_per_gene": false
  },
  "brain": {
    "type": "CNN_CTRNN",
    "normalize_input": false,
    "normalize_input_target": 0.0,
    "use_bias": false,
    "cnn_conf": {
      "type": "CNN",
      "normalize_input": false,
      "normalize_input_target": 0.0,
      "use_bias": false,
      "conv_size1": 5,
      "conv_feat1": 5,
      "maxp_size1": 4,
      "maxp_stride1": 4,
      "conv_size2": 5,
      "conv_feat2": 5,
      "maxp_size2": 4,
      "maxp_stride2": 1
    },
    "ctrnn_conf": {
      "type": "CTRNN",
      "number_neurons": 15,
      "neuron_activation": "relu",
      "neuron_activation_inplace": false,
      "use_bias": true,
      "delta_t": 0.05,
      "normalize_input": false,
      "normalize_input_target": 2,
      "optimize_state_boundaries": "global",
      "clipping_range_max": 1.0,
      "clipping_range_min": -1.0,
      "optimize_y0": false,
      "set_principle_diagonal_elements_of_W_negative": false,
      "parameter_perturbations": 0.0,
      "w_mask": "logarithmic",
      "w_mask_param": 8,
      "v_mask": "logarithmic",
      "v_mask_param": 8,
      "t_mask": "logarithmic",
      "t_mask_param": 4
    }
  },
  "episode_runner": {
    "number_fitness_runs": 5,
    "max_steps_per_run": 100,
    "reuse_env": false,
    "keep_env_seed_fixed_during_generation": true,
    "environment_attributes": {
      "type": "ProcGenAttr"
    }
  }
}

bjuergens commented 3 years ago

I just found a silly bug:

The logging.debug("Setting number of Torch threads and interop threads to 1.") call breaks all logging, lol. Because it is called at module import time, it runs before logging.basicConfig is called for the first time, and that apparently breaks everything.

A short example to try yourself:

$ python -c 'import logging; logging.info(123);logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO); logging.info(123)'
--> No output

$ python -c 'import logging; logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO); logging.info(123)'
--> Output: INFO: 123

The difference between the two lines is that the first one contains an additional logging.info call, which apparently prevents the later logging.info from producing any output.
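This behaviour follows from the documented semantics of the logging module. The module-level helpers logging.debug(), logging.info() and friends call logging.basicConfig() with default arguments when the root logger has no handlers, and basicConfig() does nothing when the root logger already has handlers unless force=True (Python 3.8+) is passed. The implicit configuration leaves the root level at WARNING, so later INFO messages are dropped. The stdlib sketch below reproduces this and shows two alternatives to deleting the message: logging through a named logger, or passing force=True. The helper names and the "cnn" logger name are hypothetical.

```python
import logging


def reset_root_logger() -> None:
    """Remove all root handlers and restore the default WARNING level."""
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()
    root.setLevel(logging.WARNING)


def root_level_after(early_call, force: bool = False) -> int:
    """Run `early_call` (simulating import-time code), then configure logging
    like the example command above, and return the resulting root level."""
    reset_root_logger()
    early_call()
    logging.basicConfig(format="%(levelname)s: %(message)s",
                        level=logging.INFO, force=force)
    return logging.getLogger().level


def root_debug_call() -> None:
    # Root-level helper: implicitly runs basicConfig() with defaults,
    # because the root logger has no handlers yet.
    logging.debug("Setting number of Torch threads and interop threads to 1.")


def named_logger_debug_call() -> None:
    # A named logger never installs handlers on the root logger by itself.
    logging.getLogger("cnn").debug(
        "Setting number of Torch threads and interop threads to 1.")


if __name__ == "__main__":
    # WARNING: the later basicConfig() call was a no-op.
    print(logging.getLevelName(root_level_after(root_debug_call)))
    # INFO: basicConfig() configured the root logger as intended.
    print(logging.getLevelName(root_level_after(named_logger_debug_call)))
    # INFO: force=True replaces the implicitly installed handler.
    print(logging.getLevelName(root_level_after(root_debug_call, force=True)))
```

Using a module logger such as logging.getLogger(__name__) for import-time messages is the usual way to avoid configuring the root logger as a side effect of importing a module.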

bjuergens commented 3 years ago

I've now fixed it simply by removing the logging call.

neuroevolution-ai / NeuroEvolution-CTRNN_new

mp very slow with conv layers #50