Hi, I'm curious why you implemented a sampling procedure (line 30 of decision-transformer-master/atari/mingpt/utils.py) instead of directly taking the argmax of the predicted probabilities. If I understand correctly, in the continuous case in gym the predicted action is used directly during evaluation, without sampling. Is that correct?
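To make the question concrete, here is a minimal sketch of the two options I'm comparing. It is plain Python, not the repository's code: the function names, the logits, and the seed are all made up for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits):
    """Argmax: always return the index of the most probable action."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def sampled_action(logits, rng):
    """Sampling: draw an action index in proportion to its probability."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits over three discrete actions.
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)  # fixed seed so the draws are repeatable

greedy = greedy_action(logits)
samples = [sampled_action(logits, rng) for _ in range(1000)]
```

With argmax the same action is chosen every time for the same input, while sampling occasionally picks the less probable actions as well. My question is why the second behavior is preferred at evaluation time.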
Looking forward to your reply! Thank you!