Open harshita1804 opened 4 years ago
They should mean the same thing - i.e., the bitrate from the previous action. We were just using different names at different points during the project development. Sorry for the confusion.
My question was about the MPC server. I see that both of these values are used there to compute smoothness_diffs, and they would be equal, right?
They should be equal. You can also try logging the values of these variables just to make sure.
I also have an implementation question. I am trying to implement robust MPC, but my MPC always chooses either the highest or the lowest bitrate. Any ideas on how to solve this? I have been stuck on it for the last 2 days. Any suggestions would be highly appreciated.
We also provide a robust MPC implementation at https://github.com/hongzimao/pensieve/blob/master/rl_server/robust_mpc_server.py#L200. Perhaps you can check the output from your implementation and compare it with ours?
I did, actually. The problem is that my algorithm keeps choosing the highest bitrate. I suspect something is wrong with my reward function.
That's likely, and I think you have a few viable ideas to try out. Why don't you log the reward from our implementation and compare it with yours? robustMPC should be deterministic, so all results should be reproducible. Tracking this down might be tedious, but doing it right can be interesting.
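One way to do that comparison is to dump the per-chunk reward from both implementations and look for the first chunk where they disagree. The snippet below is only a rough sketch: `first_divergence` is a hypothetical helper, not something in the repo, and the reward lists are made-up numbers standing in for whatever you log.

```python
import math

def first_divergence(reference, candidate, rel_tol=1e-6):
    """Return (step, ref, cand) for the first step whose rewards differ,
    or None if the two logs agree. Hypothetical helper for illustration."""
    for step, (ref, cand) in enumerate(zip(reference, candidate)):
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=1e-9):
            return step, ref, cand
    if len(reference) != len(candidate):
        # One log is longer: report the first step missing from the shorter one.
        return min(len(reference), len(candidate)), None, None
    return None

# Made-up per-chunk rewards, only to show the call.
reference_rewards = [1.2, 1.2, 0.75, 1.85]
my_rewards = [1.2, 1.2, 4.3, 4.3]

result = first_divergence(reference_rewards, my_rewards)
print(result)
```

Since the controller is deterministic for a given trace, the first step that differs is usually the right place to start inspecting the inputs to the reward.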
Yeah, it worked, actually. I was using the linear reward but overlooked the fact that my bitrates were in bits/sec. Those values were huge, so the bitrate term dominated the reward. Converting to Mbps helped. Thank you for your time!
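For anyone hitting the same thing, here is a small sketch of why the units matter. It assumes a linear reward of the form bitrate minus a rebuffering penalty minus a smoothness penalty, summed over a fixed-bitrate plan; the penalty weights, the horizon, and the predicted rebuffering times are illustrative assumptions, not values taken from the thread.

```python
REBUF_PENALTY = 4.3   # assumed weight per second of rebuffering
SMOOTH_PENALTY = 1.0  # assumed weight on bitrate changes
HORIZON = 5           # assumed number of future chunks in the plan

def plan_reward(bitrate, last_bitrate, total_rebuf_s):
    """Linear reward for holding `bitrate` over the whole horizon."""
    return (HORIZON * bitrate
            - REBUF_PENALTY * total_rebuf_s
            - SMOOTH_PENALTY * abs(bitrate - last_bitrate))

def best_index(bitrates, last_index, rebuf_by_choice):
    """Index of the bitrate whose plan has the highest reward."""
    rewards = [plan_reward(b, bitrates[last_index], r)
               for b, r in zip(bitrates, rebuf_by_choice)]
    return max(range(len(bitrates)), key=lambda i: rewards[i])

bitrates_kbps = [300, 750, 1200, 1850, 2850, 4300]
# Hypothetical predicted rebuffering (seconds) over the horizon per choice.
predicted_rebuf_s = [0.0, 0.0, 0.0, 1.0, 3.0, 6.0]
last_index = 2

bitrates_bps = [b * 1000 for b in bitrates_kbps]
bitrates_mbps = [b / 1000 for b in bitrates_kbps]

choice_bps = best_index(bitrates_bps, last_index, predicted_rebuf_s)
choice_mbps = best_index(bitrates_mbps, last_index, predicted_rebuf_s)
print(choice_bps, choice_mbps)
```

With bitrates in bits/sec the bitrate term is in the millions while the rebuffering penalty is a few tens at most, so the highest bitrate wins regardless of stalls. In Mbps the terms are on a comparable scale and the rebuffering penalty can actually change the decision.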
What is the difference between the two? Isn't the last quality proportional to the last bitrate?