Closed kyungeuuun closed 5 years ago
Hi kyungeuuun, as mentioned in the README, there is a chance that the training loss will explode. The temporary workaround is to restart training from the last model saved before the explosion, or to decrease the learning rate earlier in the learning rate schedule.
Thanks! By the way, are the model parameters set correctly?
The default settings in the config file should be okay.
Thank you for your kindness. I'll try again.
Hello, Li. Although I ran 'dcrnn-train.py' with the parameter setup you mentioned in the paper, I failed to reproduce the best performance. Could you point out my mistake, or explain the options in detail?
2019-02-22 16:58:40,796 - INFO - Log directory: data/model
2019-02-22 16:58:40,797 - INFO - {'data': {'val_batch_size': 64, 'test_batch_size': 64, 'batch_size': 64, 'graph_pkl_filename': 'data/sensor_graph/dcrnn/adj_mx.pkl', 'dataset_dir': 'data/METR-LA'}, 'model': {'cl_decay_steps': 2000, 'input_dim': 2, 'l1_decay': 0, 'num_rnn_layers': 2, 'num_nodes': 207, 'filter_type': 'dual_random_walk', 'horizon': 12, 'use_curriculum_learning': True, 'seq_len': 12, 'rnn_units': 64, 'output_dim': 1, 'max_diffusion_step': 3}, 'train': {'optimizer': 'adam', 'epsilon': 0.001, 'dropout': 0, 'model_filename': None, 'epochs': 100, 'patience': 50, 'base_lr': 0.01, 'max_grad_norm': 5, 'min_learning_rate': 2e-06, 'global_step': 0, 'max_to_keep': 100, 'lr_decay_ratio': 0.1, 'epoch': 0, 'test_every_n_epochs': 10, 'steps': [20, 30, 40, 50], 'log_dir': 'data/model'}, 'log_level': 'INFO', 'base_dir': 'data/model'}
2019-02-22 16:58:49,720 - INFO - ('x_val', (3425, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('x_train', (23974, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('x_test', (6850, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('y_val', (3425, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('y_train', (23974, 12, 207, 2))
2019-02-22 16:58:49,720 - INFO - ('y_test', (6850, 12, 207, 2))
2019-02-22 16:59:06,917 - INFO - Total number of trainable parameters: 520960
2019-02-22 16:59:09,019 - INFO - Start training ...
...
2019-02-23 04:12:31,358 - INFO - Epoch [89/100] (0) train_mae: 9.8364, val_mae: 12.8458 lr:0.000002 431.2s
2019-02-23 04:13:29,147 - INFO - Horizon 01, MAE: 13.55, MAPE: 0.3397, RMSE: 16.15
2019-02-23 04:13:29,213 - INFO - Horizon 02, MAE: 12.81, MAPE: 0.3336, RMSE: 15.54
2019-02-23 04:13:29,277 - INFO - Horizon 03, MAE: 12.34, MAPE: 0.3307, RMSE: 15.23
2019-02-23 04:13:29,340 - INFO - Horizon 04, MAE: 12.15, MAPE: 0.3311, RMSE: 15.21
2019-02-23 04:13:29,405 - INFO - Horizon 05, MAE: 12.20, MAPE: 0.3341, RMSE: 15.41
2019-02-23 04:13:29,467 - INFO - Horizon 06, MAE: 12.41, MAPE: 0.3385, RMSE: 15.74
2019-02-23 04:13:29,529 - INFO - Horizon 07, MAE: 12.70, MAPE: 0.3432, RMSE: 16.11
2019-02-23 04:13:29,591 - INFO - Horizon 08, MAE: 13.00, MAPE: 0.3476, RMSE: 16.47
2019-02-23 04:13:29,652 - INFO - Horizon 09, MAE: 13.28, MAPE: 0.3512, RMSE: 16.78
2019-02-23 04:13:29,714 - INFO - Horizon 10, MAE: 13.53, MAPE: 0.3540, RMSE: 17.06
2019-02-23 04:13:29,775 - INFO - Horizon 11, MAE: 13.75, MAPE: 0.3562, RMSE: 17.30
2019-02-23 04:13:29,837 - INFO - Horizon 12, MAE: 13.95, MAPE: 0.3582, RMSE: 17.54
2019-02-23 04:20:40,879 - INFO - Epoch [90/100] (0) train_mae: 9.9064, val_mae: 10.6131 lr:0.000002 431.0s
2019-02-23 04:20:40,879 - WARNING - Early stopping at epoch: 90
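For comparing runs against the numbers reported in the paper, a small sketch (the names are mine, not from the repo) that extracts the per-horizon metrics from evaluation log lines like those above:

```python
import re

# Matches lines such as:
#   "... - INFO - Horizon 01, MAE: 13.55, MAPE: 0.3397, RMSE: 16.15"
HORIZON_RE = re.compile(
    r"Horizon (\d+), MAE: ([\d.]+), MAPE: ([\d.]+), RMSE: ([\d.]+)")

def parse_horizon_metrics(log_text):
    # Returns {horizon: {"mae": ..., "mape": ..., "rmse": ...}}.
    metrics = {}
    for m in HORIZON_RE.finditer(log_text):
        metrics[int(m.group(1))] = {
            "mae": float(m.group(2)),
            "mape": float(m.group(3)),
            "rmse": float(m.group(4)),
        }
    return metrics

sample = ("2019-02-23 04:13:29,147 - INFO - "
          "Horizon 01, MAE: 13.55, MAPE: 0.3397, RMSE: 16.15")
print(parse_horizon_metrics(sample))
```

Feeding the whole training log through this makes it easy to spot that the MAEs here (12-14) are far above the paper's reported results, i.e. the run did not converge properly.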