longmalongma closed this issue 3 years ago
Looking at your code, I was a little confused about why you didn't use top_k and km during the training phase. But top_k and km are used in the evaluation phase, right? Is it bad to use top_k and km in training?
By the way, can the network used for evaluation be inconsistent with the network used for training?
- It is difficult to optimize with top-k/kernel memory due to sparse gradients.
- Top-k isn't necessary for training (three-frame propagation). Check our motivation in the paper.
- What do you mean by inconsistent? Sure, they are not the same (they can hardly be unless you have tons of training resources and data).
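To make the first two points concrete, here is a minimal NumPy sketch of a memory read with and without top-k filtering. It is not the repository's code: `softmax`, `topk_softmax`, `read_memory`, and the array shapes are hypothetical names and layouts chosen for illustration. It assumes top-k is a parameter-free filter applied to the affinity before the softmax, so switching it on at evaluation time changes only the attention weights, not the shape of the result.

```python
import numpy as np

def softmax(affinity):
    # Softmax over the memory axis (axis 0), shifted for numerical stability.
    shifted = affinity - affinity.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def topk_softmax(affinity, k):
    # Keep the k largest affinities per query (column) and mask the rest.
    # Masked entries get zero weight, which is why gradients through this
    # path are sparse. Exact ties at the threshold would keep more than k.
    n_mem = affinity.shape[0]
    k = min(k, n_mem)
    thresh = np.partition(affinity, n_mem - k, axis=0)[n_mem - k]
    masked = np.where(affinity >= thresh, affinity, -np.inf)
    return softmax(masked)

def read_memory(affinity, values, top_k=None):
    # affinity: (n_memory, n_query); values: (channels, n_memory).
    # top_k=None mimics a training-style full softmax; an integer mimics
    # an evaluation-style filtered read. No learnable weights are involved.
    weights = softmax(affinity) if top_k is None else topk_softmax(affinity, top_k)
    return values @ weights

rng = np.random.default_rng(0)
affinity = rng.normal(size=(6, 4))   # 6 memory locations, 4 query locations
values = rng.normal(size=(3, 6))     # 3 channels per memory location

out_train = read_memory(affinity, values)           # full softmax
out_eval = read_memory(affinity, values, top_k=2)   # top-2 per query
print(out_train.shape, out_eval.shape)
```

Under these assumptions the two reads differ only in which memory locations receive non-zero weight, so the same trained weights can feed either path.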
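Thank you for your reply. So is your training model exactly the same as STM? By "inconsistent" I mean that the training network and the evaluation network are not the same network. If they are not the same network, won't loading the model trained with the training network at evaluation time raise an error because the models don't match?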