Hello, I have a question. In previous meta-reinforcement learning papers, the MDPs of different tasks differ in their state transition probabilities (P) and reward functions (R). In this article, however, the definition of the state implies that the states (S) of different MDPs are also different. I do not understand this point. Could you please explain? Thank you.