Closed bjuergens closed 3 years ago
Here are 4 experiments with CNN and procgen. Two were run with Dask and two with MP. The Dask runs each finished in about 7.5 seconds; the two MP runs each took about 120 seconds.
$ python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp; python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000 ctrnn_size: 1337 cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0 0 0
Time elapsed: 7.656079053878784
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-00-55
Done
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000 ctrnn_size: 1337 cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0 0 0
Time elapsed: 7.405250549316406
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-01-07
Done
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000 ctrnn_size: 1337 cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0.446667 0.510381 2
Time elapsed: 122.9413092136383
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-01-19
Done
INFO: Input space: Box(0.0, 1.0, (64, 64, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 2337
INFO: cnn_size: 1000 ctrnn_size: 1337 cnn_output: Box(-1.0, 1.0, (8, 8, 5), float32)
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0.446667 0.510381 2
Time elapsed: 119.80066466331482
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-03-24
Done
config:
{
"environment": "procgen:procgen-heist-v0",
"random_seed": 123,
"number_generations": 2,
"optimizer": {
"type": "MU_ES",
"hof_size": 10,
"checkpoint_frequency": 0,
"initial_gene_range": 2,
"tournsize": 0,
"mu": 50,
"extra_from_hof": 1,
"lambda_": 100,
"mutpb": 0.8,
"efficiency_weight": 0.0,
"fix_seed_for_generation": true,
"strategy_parameter_per_gene": false
},
"brain": {
"type": "CNN_CTRNN",
"normalize_input": false,
"normalize_input_target": 0.0,
"use_bias": false,
"cnn_conf": {
"type": "CNN",
"normalize_input": false,
"normalize_input_target": 0.0,
"use_bias": false,
"conv_size1": 5,
"conv_feat1": 5,
"maxp_size1": 4,
"maxp_stride1": 4,
"conv_size2": 5,
"conv_feat2": 5,
"maxp_size2": 4,
"maxp_stride2": 1
},
"ctrnn_conf": {
"type": "CTRNN",
"number_neurons": 15,
"neuron_activation": "relu",
"neuron_activation_inplace": false,
"use_bias": true,
"delta_t": 0.05,
"normalize_input": false,
"normalize_input_target": 2,
"optimize_state_boundaries": "global",
"clipping_range_max": 1.0,
"clipping_range_min": -1.0,
"optimize_y0": false,
"set_principle_diagonal_elements_of_W_negative": false,
"parameter_perturbations": 0.0,
"w_mask": "logarithmic",
"w_mask_param": 8,
"v_mask": "logarithmic",
"v_mask_param": 8,
"t_mask": "logarithmic",
"t_mask_param": 4
}
},
"episode_runner": {
"number_fitness_runs": 1,
"reuse_env": true,
"max_steps_per_run": 10,
"keep_env_seed_fixed_during_generation": true,
"environment_attributes": {
"type": "ProcGenAttr"
}
}
}
Here are four more experiments, this time without the CNN. The Dask runs each finished in about 6.7 seconds; the MP runs each took about 2.4 seconds.
$ python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask ;python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp; python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0.34 0.473709 1
Time elapsed: 6.655607461929321
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-08
Done
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
INFO: Dask dashboard available at port: 8787
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0.00666667 0.081377 1
Time elapsed: 6.674955368041992
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-19
Done
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0 0 0
Time elapsed: 2.4488930702209473
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-30
Done
INFO: Input space: Box(0.0, 1.0, (16, 16, 3), float16)
INFO: Output space: Discrete(15)
INFO: Individual size for this experiment: 4532
INFO: writing checkpoints to: /home/chef/dev/project/NeuroEvolution-CTRNN_new/checkpoints
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 150 1650 0 0 0 0
1 150 1650 0 0 0 0
Time elapsed: 2.4085118770599365
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_10-24-35
Done
{
"environment": "procgen:procgen-heist-v0",
"random_seed": 123,
"number_generations": 2,
"optimizer": {
"type": "MU_ES",
"hof_size": 10,
"checkpoint_frequency": 0,
"initial_gene_range": 2,
"tournsize": 0,
"mu": 50,
"extra_from_hof": 1,
"lambda_": 100,
"mutpb": 0.8,
"efficiency_weight": 0.0,
"fix_seed_for_generation": true,
"strategy_parameter_per_gene": false
}, "brain": {
"type": "CTRNN",
"number_neurons": 30,
"neuron_activation": "relu",
"neuron_activation_inplace": false,
"use_bias": true,
"delta_t": 0.05,
"normalize_input": false,
"normalize_input_target": 2,
"optimize_state_boundaries": "global",
"clipping_range_max": 1.0,
"clipping_range_min": -1.0,
"optimize_y0": false,
"set_principle_diagonal_elements_of_W_negative": false,
"parameter_perturbations": 0.0,
"w_mask": "logarithmic",
"w_mask_param": 32,
"v_mask": "logarithmic",
"v_mask_param": 8,
"t_mask": "logarithmic",
"t_mask_param": 4
},
"episode_runner": {
"number_fitness_runs": 1,
"reuse_env": true,
"max_steps_per_run": 10,
"keep_env_seed_fixed_during_generation": true,
"environment_attributes": {
"type": "ProcGenAttr",
"screen_size": 16
}
}
}
OK, thanks for the info. I will investigate 👍🏽
Okay, I think I found the problem: when running with -p mp n 1,
i.e. multiprocessing with one worker, it is even faster than Dask. I also printed
print("Torch Num Threads {} Torch Interop Threads {}".format(torch.get_num_threads(), torch.get_num_interop_threads()))
in the CNN constructor. Both values were 4. It seems the code that sets the torch threads and interop threads to 1 is not executed inside the workers. A simple fix is to do it outside of the CNN class. According to the documentation, the interop threads can only be set once, so I would not do it inside the constructor. I will push a quick fix to the procgen2 branch.
Thank you :heart:
Works. I just ran the following test:
console:
(neuro) chef on dekstop in ~/dev/project/NeuroEvolution-CTRNN_new(3h3m|procgen2)
$ python neuro_evolution_ctrnn/train.py configurations/temp.json -p mp; python neuro_evolution_ctrnn/train.py configurations/temp.json -p dask
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 110 55550 0 0.0527273 0.0921551 0.4
1 110 55550 0 0.232727 0.213247 0.8
2 110 55550 0 0.0909091 0.144314 0.4
3 110 55550 0 0.185455 0.239255 0.6
4 110 55550 0 0.0327273 0.0739891 0.2
5 110 55550 0 0.292727 0.186211 0.6
6 110 55550 0 0.278182 0.19277 0.6
7 110 55550 0 0.105455 0.172066 0.4
8 110 55550 0 0.150909 0.0901972 0.4
9 110 55550 0 0.0290909 0.0705117 0.2
Time elapsed: 128.69662761688232
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_15-22-15
Done
bokeh.server.util - WARNING - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
fitness
--------------------------------------------------------
gen nevals steps min avg std max
0 110 55550 0 0.0745455 0.100396 0.4
1 110 55550 0 0.450909 0.255038 0.8
2 110 55550 0 0.209091 0.109167 0.6
3 110 55550 0 0.109091 0.103173 0.4
4 110 55550 0 0.123636 0.134791 0.6
5 110 55550 0 0.209091 0.229444 1
6 110 55550 0 0.403636 0.342166 1.4
7 110 55550 0 0.203636 0.0971665 0.6
8 110 55550 0 0.0872727 0.197304 0.8
9 110 55550 0 0.321818 0.286795 1.2
Time elapsed: 138.29078269004822
Output directory: /home/chef/dev/project/CTRNN_Simulation_Results/data/2021-01-21_15-24-26
Done
config:
{
"environment": "procgen:procgen-heist-v0",
"random_seed": -1,
"number_generations": 10,
"optimizer": {
"type": "MU_ES",
"hof_size": 10,
"checkpoint_frequency": 0,
"initial_gene_range": 2,
"tournsize": 0,
"mu": 10,
"extra_from_hof": 1,
"lambda_": 100,
"mutpb": 0.8,
"efficiency_weight": 0.0,
"fix_seed_for_generation": true,
"strategy_parameter_per_gene": false
},
"brain": {
"type": "CNN_CTRNN",
"normalize_input": false,
"normalize_input_target": 0.0,
"use_bias": false,
"cnn_conf": {
"type": "CNN",
"normalize_input": false,
"normalize_input_target": 0.0,
"use_bias": false,
"conv_size1": 5,
"conv_feat1": 5,
"maxp_size1": 4,
"maxp_stride1": 4,
"conv_size2": 5,
"conv_feat2": 5,
"maxp_size2": 4,
"maxp_stride2": 1
},
"ctrnn_conf": {
"type": "CTRNN",
"number_neurons": 15,
"neuron_activation": "relu",
"neuron_activation_inplace": false,
"use_bias": true,
"delta_t": 0.05,
"normalize_input": false,
"normalize_input_target": 2,
"optimize_state_boundaries": "global",
"clipping_range_max": 1.0,
"clipping_range_min": -1.0,
"optimize_y0": false,
"set_principle_diagonal_elements_of_W_negative": false,
"parameter_perturbations": 0.0,
"w_mask": "logarithmic",
"w_mask_param": 8,
"v_mask": "logarithmic",
"v_mask_param": 8,
"t_mask": "logarithmic",
"t_mask_param": 4
}
},
"episode_runner": {
"number_fitness_runs": 5,
"max_steps_per_run": 100,
"reuse_env": false,
"keep_env_seed_fixed_during_generation": true,
"environment_attributes": {
"type": "ProcGenAttr"
}
}
}
I just found a silly bug:
the logging.debug("Setting number of Torch threads and interop threads to 1.")
call breaks all logging, lol. Because it is called at module import, it runs before logging.basicConfig
is called for the first time, and that apparently breaks everything.
Short example to test it yourself:
$ python -c 'import logging; logging.info(123);logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO); logging.info(123)'
--> No output
$ python -c 'import logging; logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO); logging.info(123)'
--> Output: INFO: 123
The difference between the two lines is that the first one contains an additional logging.info
call, which apparently prevents the other logging.info
from producing any output.
I have now simply fixed it by removing the logging call.
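Some background on why this happens, with a small sketch to reproduce it in one script: the module-level functions such as logging.info and logging.debug implicitly call logging.basicConfig() with default settings when the root logger has no handlers yet. After that, a later explicit logging.basicConfig(...) finds an existing handler and does nothing, so the root logger stays at its default WARNING level. Besides removing the early log call, passing force=True (available since Python 3.8) would also work. The helper names reset_root_logger and run_demo below are made up for illustration.

```python
import io
import logging


def reset_root_logger():
    """Remove all root handlers so each demo starts from a clean state."""
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
    root.setLevel(logging.WARNING)


def run_demo(log_before_config, force=False):
    """Return what a logging.info() call writes after basicConfig()."""
    reset_root_logger()
    stream = io.StringIO()
    if log_before_config:
        # No handlers exist yet, so this call implicitly runs basicConfig()
        # and installs a default handler; the root level stays at WARNING.
        logging.info("early message")
    # Without force=True this is a no-op if a handler is already installed.
    logging.basicConfig(
        stream=stream,
        format="%(levelname)s: %(message)s",
        level=logging.INFO,
        force=force,
    )
    logging.info(123)
    return stream.getvalue()


if __name__ == "__main__":
    print(repr(run_demo(log_before_config=False)))
    print(repr(run_demo(log_before_config=True)))
    print(repr(run_demo(log_before_config=True, force=True)))
```

The first and third calls capture the INFO line, while the second captures nothing, which matches the two shell commands above.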
Using MP together with Torch CNNs is significantly slower than with Dask