jolibrain / wheatley

Next-generation scheduling problem solver based on GNNs and Reinforcement Learning

Pretrain: Visdom plots and evaluation metrics #52

Closed pierrot-lc closed 1 year ago

pierrot-lc commented 1 year ago

Use OR-Tools to generate a training set of solutions that the policy imitates during pretraining. OR-Tools solves each problem using the averagistic strategy.
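To make the idea of imitating a solver concrete, here is a minimal sketch of imitation pretraining (behavior cloning) in plain Python. It is not the code from this PR: the linear softmax policy, the toy one-hot states, and the names `policy_probs`, `imitation_loss` and `pretrain` are all hypothetical, and the expert actions below are synthetic stand-ins for the decisions OR-Tools would provide.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights, features):
    # One logit per action: dot product of that action's weights with the state features.
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)


def imitation_loss(weights, dataset):
    # Mean negative log-likelihood of the expert's actions under the policy.
    total = sum(math.log(policy_probs(weights, s)[a]) for s, a in dataset)
    return -total / len(dataset)


def pretrain(dataset, n_actions, n_features, epochs=20, lr=0.5):
    # Plain SGD on the cross-entropy between policy output and expert action.
    weights = [[0.0] * n_features for _ in range(n_actions)]
    for _ in range(epochs):
        for state, expert_action in dataset:
            probs = policy_probs(weights, state)
            for a in range(n_actions):
                # Gradient of the loss w.r.t. logit a is p_a - 1[a == expert_action].
                grad_logit = probs[a] - (1.0 if a == expert_action else 0.0)
                for j in range(n_features):
                    weights[a][j] -= lr * grad_logit * state[j]
    return weights


# Synthetic "expert" dataset (assumption): one-hot states, and the expert
# always picks the action whose index matches the active feature.
rng = random.Random(0)
dataset = []
for _ in range(64):
    k = rng.randrange(3)
    dataset.append(([1.0 if i == k else 0.0 for i in range(3)], k))

initial_loss = imitation_loss([[0.0] * 3 for _ in range(3)], dataset)
weights = pretrain(dataset, n_actions=3, n_features=3)
final_loss = imitation_loss(weights, dataset)
```

In the real setting the states would be scheduling graphs and the policy a GNN, but the training signal is the same: push up the probability of the action the solver chose.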

Example of args to test:

python3 train.py \
  --n_j 4 --n_m 4 --max_n_j 20 --max_n_m 20 \
  --total_timesteps 100000000000 --n_validation_env 50 --fixed_validation \
  --n_steps_episode 160 --n_workers 10 --batch_size 160 --lr 1e-4 \
  --exp_name_appendix pretrain --seed 1 --optimizer adam \
  --target_kl 0.04 --ent_coef 0.05 --n_epochs 20 --device cuda:0 \
  --fe_type dgl --residual_gnn --graph_has_relu --graph_pooling learn \
  --hidden_dim_features_extractor 32 --n_layers_features_extractor 5 \
  --mlp_act gelu --layer_pooling last \
  --n_mlp_layers_features_extractor 1 --n_mlp_layers_actor 1 --n_mlp_layers_critic 1 \
  --hidden_dim_actor 16 --hidden_dim_critic 16 \
  --pretrain --pretrain_num_envs 100 --pretrain_num_eval_envs 10 \
  --pretrain_dataset_generation online --pretrain_prob 0.9 \
  --pretrain_epochs 100 --pretrain_batch_size 128 --pretrain_lr 1e-5 --pretrain_weight_decay 1e-1