hongzimao / pensieve

Neural Adaptive Video Streaming with Pensieve (SIGCOMM '17)
http://web.mit.edu/pensieve/
MIT License

Some questions on DRL algorithm selection #69

Closed: cindyli2012 closed this issue 4 years ago

cindyli2012 commented 5 years ago

I see you compared against the offline optimal and other optimal control algorithms. I'm wondering how close this DRL algorithm could get to the optimum with more tuning.

Have you tried OpenAI or other RL algorithms? Is there any reason you picked A3C for this particular problem?

hongzimao commented 5 years ago

We didn't tune our model extensively before the submission deadline. Since our paper, a number of works have pushed performance further by using better features or by tuning the A3C algorithm better.

Trying other algorithms is definitely interesting. If you get a chance to try methods like TRPO or PPO, please let us know how they perform. We used A3C (and policy gradient methods in general) mostly because the underlying model and implementation are clean (for details, see §4.2, "Choice of algorithm").
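As a rough illustration of the "policy gradient" family mentioned above, here is a minimal sketch of a single advantage actor-critic update, the step that A3C runs asynchronously across parallel workers. It is not Pensieve's implementation, which trains neural-network actor and critic models. Here the policy is just a list of logits over a few discrete actions (bitrate levels, assumed for illustration), the critic values are given as plain numbers, and all function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one rollout."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def advantages(rewards: Sequence[float], values: Sequence[float], gamma: float) -> List[float]:
    """A_t = G_t - V(s_t): the critic's value estimate acts as a baseline."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]


def logit_gradient(logits: Sequence[float], action: int, advantage: float) -> List[float]:
    """Gradient of advantage * log pi(action) with respect to the logits.

    For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi(i).
    """
    probs = softmax(logits)
    return [advantage * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(probs)]


def policy_gradient_step(
    logits: Sequence[float], action: int, advantage: float, lr: float
) -> List[float]:
    """One gradient-ascent step on the actor objective for a single sample."""
    grad = logit_gradient(logits, action, advantage)
    return [x + lr * g for x, g in zip(logits, grad)]


if __name__ == "__main__":
    # Hypothetical rollout: three decisions, their rewards, and critic estimates V(s_t).
    rewards = [1.0, 0.0, 2.0]
    values = [1.5, 1.0, 1.0]
    adv = advantages(rewards, values, gamma=0.9)

    # Hypothetical logits over three bitrate levels at the first step; action 1 was chosen.
    logits = [0.1, 0.2, 0.3]
    updated = policy_gradient_step(logits, action=1, advantage=adv[0], lr=0.1)
    print("advantages:", adv)
    print("p(action 1) before/after:", softmax(logits)[1], softmax(updated)[1])
```

A positive advantage raises the probability of the chosen action, and a negative one lowers it. Swapping in PPO would keep the same return and advantage bookkeeping but replace the plain `advantage * log pi(a)` objective with a clipped probability-ratio surrogate, while TRPO instead constrains each update with a KL-divergence trust region.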