qazwsx74269 closed this issue 5 years ago.
Hi, dev mode is simply train mode with rendering of the environment and a check that the network parameters are actually being updated. Because these two operations are expensive, dev mode also runs slower, and it is meant only for development.
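To make the distinction concrete, here is a minimal sketch of what "train mode plus rendering plus a parameter update check" could look like. This is not SLM-Lab's code: the class TinyNet, the function train_step, the mode strings, and the render callback are all hypothetical stand-ins. The point is that dev mode still updates parameters exactly as train mode does; it only adds two expensive extras around the update.

```python
import copy


class TinyNet:
    """Hypothetical toy network: parameters stored as plain lists of floats."""

    def __init__(self):
        self.params = {"w": [0.5, -0.3], "b": [0.1]}

    def sgd_step(self, grads, lr=0.1):
        # Plain gradient descent: p <- p - lr * g for every parameter.
        for name, g in grads.items():
            self.params[name] = [p - lr * gi for p, gi in zip(self.params[name], g)]


def train_step(net, grads, mode="train", render=lambda: None):
    """One training step. In the hypothetical 'dev' mode we also render and
    verify that every parameter group changed; 'train' mode skips both extras."""
    if mode == "dev":
        render()  # expensive extra #1: draw the environment
        before = copy.deepcopy(net.params)  # snapshot to compare after the update
    net.sgd_step(grads)  # the update happens in BOTH modes
    if mode == "dev":
        # expensive extra #2: fail loudly if some parameter group did not move
        for name, old in before.items():
            if old == net.params[name]:
                raise AssertionError(f"parameter {name!r} was not updated")
    return net
```

For example, a zero gradient on "b" passes silently in train mode but raises in dev mode, which is the kind of bug (a detached layer, a missing optimizer entry) such a check is meant to catch early.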
So doesn't RL involve a validation procedure? Does it find the best policy during training, after which we simply run that policy on a test dataset to evaluate its generalization ability?
RL validation/evaluation is different from supervised learning. It is done online as the policy/network/agent iterates, so evaluation is run at checkpoints at regular intervals during training. See https://github.com/kengz/SLM-Lab/blob/master/slm_lab/experiment/control.py#L69-L82 for reference. Also note that in RL the training data is also the test data; this is still an area of research.
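As an illustration of "evaluation at checkpoints at regular intervals", here is a small self-contained sketch. It is not the linked SLM-Lab control loop: the two-armed Bernoulli bandit, epsilon-greedy agent, and the names max_frame, eval_frequency, and eval_episodes are assumptions chosen to keep the example short. Training and evaluation interleave in one loop; at each checkpoint the current greedy policy is run with no exploration and no parameter updates, on the same environment it is trained on (so the "training data is also the test data").

```python
import random


def train_with_periodic_eval(max_frame=1000, eval_frequency=200, eval_episodes=50, seed=0):
    """Train an epsilon-greedy agent online on a hypothetical 2-armed bandit and
    evaluate its greedy policy every `eval_frequency` frames.

    Returns a list of (frame, mean_eval_return) tuples, one per checkpoint.
    """
    rng = random.Random(seed)
    arm_probs = [0.3, 0.7]  # hypothetical environment: arm 1 pays out more often
    q = [0.0, 0.0]          # value estimates, i.e. the learned "policy parameters"
    counts = [0, 0]
    epsilon = 0.1
    checkpoints = []

    def pull(arm):
        return 1.0 if rng.random() < arm_probs[arm] else 0.0

    def greedy():
        return max(range(2), key=lambda a: q[a])

    for frame in range(1, max_frame + 1):
        # Training step: explore or exploit, act, then update the estimate.
        arm = rng.randrange(2) if rng.random() < epsilon else greedy()
        reward = pull(arm)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]  # incremental mean update

        if frame % eval_frequency == 0:
            # Evaluation checkpoint: greedy actions, no exploration, q is NOT updated.
            eval_arm = greedy()
            mean_return = sum(pull(eval_arm) for _ in range(eval_episodes)) / eval_episodes
            checkpoints.append((frame, mean_return))

    return checkpoints
```

With the defaults this yields five checkpoints (frames 200, 400, 600, 800, 1000). Real RL frameworks typically evaluate over full episodes in a separate environment instance, but the shape of the loop, training with periodic frozen-policy evaluation, is the same idea.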
What on earth is the difference between train mode and dev mode? According to your API, dev mode is just train mode with shorter episodes. But from what I have learned, dev mode doesn't involve updating the model parameters. I'm quite confused about this and hope someone can help me with it.