Closed: Hororohoruru closed this issue 1 year ago
> I had a look and saw that all trials have a much higher count for particle reinvigoration at the last time step compared to others
Having high particle reinvigoration could be totally normal. It's a different issue from particle deprivation. If the number keeps growing, that suggests the true observation is somehow less predictable from the current particles. That might be a property of the problem itself.
> Since trials are independent (i.e., I am creating a new instance of the problem for every trial and resetting the belief), I wonder if it makes sense to update the belief when the trial ends at all.
Since trials are independent, you only need to make sure the belief is correct at the start of a trial.
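To make this concrete, here is a minimal sketch of what "making sure the belief is correct at the start of a trial" could look like: re-drawing the particle set from the uniform initial distribution that the question describes. The flat integer state representation, `N_STATES`, and `reset_belief` are illustrative assumptions, not the actual problem code.

```python
import random

# Illustrative assumptions: states are indices 0..N_STATES-1 and the belief
# is a plain list of particle states (not the actual problem code).
N_STATES = 4
N_PARTICLES = 1000

def reset_belief(n_particles=N_PARTICLES, n_states=N_STATES, rng=random):
    """Return a fresh particle set drawn uniformly over states, matching the
    uniform transition into a new trial described in the question."""
    return [rng.randrange(n_states) for _ in range(n_particles)]

belief = reset_belief()
# All particles are valid states and every state is represented, so the
# belief at the start of the trial matches the uniform initial distribution.
```

Skipping the final belief update and calling something like this instead sidesteps the deprivation at the last time step entirely, since the end-of-trial observation is never filtered against the particle set.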
> That might be a property of the problem itself
If that were the case, what can I do to avoid this occasional particle deprivation? I can circumvent it by just avoiding the final update, but I am curious about potential solutions.
> Since trials are independent, you only need to make sure the belief is correct at the start of a trial.
Great, I will do that then, thank you!
Hello! While running a POMCP problem, I'm sometimes getting particle deprivation at the end of a trial. It is the same problem as in #32 and #27, so it is a finite-horizon problem that has N states and observations, with N+1 possible actions (i.e. one action per state and the 'wait' action).
Since real observations are coming from data, the observation model is constructed by predicting from this data with a decoder to obtain $p(o|s')$ (the action does not change the observation function as of now). The trial ends when any action other than 'wait' is taken, or when the last time step of a trial is reached without any action. When this happens, a new trial starts, and the transition probability is uniform over all possible states.
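For reference, an observation model of this shape could be sketched as below. The decoder itself is stubbed out with a made-up lookup table (`decoder_probs`), and the class name and `N` are hypothetical; the only point carried over from the setup above is that $p(o|s')$ comes from decoder predictions and ignores the action.

```python
import random

# Hypothetical stand-in for the real decoder: decoder_probs[s'] is the
# predicted distribution p(o | s'). The numbers here are made up.
N = 3
decoder_probs = {
    0: [0.8, 0.1, 0.1],
    1: [0.1, 0.8, 0.1],
    2: [0.1, 0.1, 0.8],
}

class DecoderObservationModel:
    """p(o | s') taken from decoder predictions; independent of the action."""

    def probability(self, observation, next_state, action=None):
        # Look up the decoder's predicted probability of this observation.
        return decoder_probs[next_state][observation]

    def sample(self, next_state, action=None, rng=random):
        # Draw an observation according to the decoder's distribution.
        return rng.choices(range(N), weights=decoder_probs[next_state])[0]

model = DecoderObservationModel()
```

Each row of `decoder_probs` sums to 1, so `probability` and `sample` are consistent with each other, which is what the planner's simulated observations rely on.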
I noticed that I sometimes get particle deprivation right after the trial ends. After looking at the code, I saw that particle deprivation can happen if there is an observation that was never anticipated in the tree. I had a look and saw that all trials have a much higher count for particle reinvigoration at the last time step compared to other time steps (n_particles = 1000), for example:
This makes sense to me, as the code is sampling an observation at random instead of getting it from data when the trial is going to end. Given this, I can see why it usually gets an observation that has not been simulated much in the tree. I get particle deprivation sometimes, so I guess it depends on how the simulation goes and which observations are sampled.
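The mechanism described here can be illustrated with a toy POMCP-style belief update (this is not the library's actual code): successor particles are kept only if their simulated observation matches the real one, so an observation that almost no particle predicts leaves the filter empty and forces reinvigoration. The generative model `G` below is an invented toy assumption.

```python
import random

# Toy illustration of particle deprivation (not the library's implementation).
N_STATES = 3

def G(state, action, rng):
    """Toy black-box simulator: the state persists; the observation equals
    the state 90% of the time and is uniform noise otherwise."""
    next_state = state
    obs = next_state if rng.random() < 0.9 else rng.randrange(N_STATES)
    return next_state, obs

def update_belief(particles, action, real_obs, rng=random):
    """Keep successor particles whose simulated observation matches real_obs."""
    new_particles = []
    for s in particles:
        s_next, o = G(s, action, rng)
        if o == real_obs:
            new_particles.append(s_next)
    if not new_particles:
        # Particle deprivation: no particle predicted real_obs.
        # Reinvigoration injects fresh (here, uniformly drawn) particles.
        new_particles = [rng.randrange(N_STATES) for _ in range(len(particles))]
    return new_particles
```

With `real_obs` drawn from the same model as the particles, most particles survive; with an observation the particles essentially never predict, the first pass comes back empty and the reinvigoration branch fires, which matches the high reinvigoration counts seen at the last time step.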
Since trials are independent (i.e., I am creating a new instance of the problem for every trial and resetting the belief), I wonder if it makes sense to update the belief when the trial ends at all. Not doing it would avoid this issue with particle deprivation, if I understood correctly. Even so, I am wondering whether this method of providing observations during the last time step is correct. To clarify, the observation model currently looks like this: