Open ahmad-hl opened 2 years ago
The multi-video sim is for agents that can generalize to videos with different numbers and different levels of bitrate encodings.
From what you wrote above, your understanding of the code and the paper looks correct to me.
I have upgraded the code to work on Python 3.8 and used cooked_traces to train the multiagent RL model in the sim dir.
Given that I'm using a computer with 2 GPUs and TensorBoard for monitoring, how long does the model take to converge?
How do you know if the model converged?
Can you also explain the main components in the objective function?
# Compute the objective (log action_vector and entropy)
self.obj = tf.reduce_sum(tf.multiply(
    tf.log(tf.reduce_sum(tf.multiply(self.out, self.acts), axis=1, keepdims=True)),
    -self.act_grad_weights)) \
    + ENTROPY_WEIGHT * tf.reduce_sum(tf.multiply(self.out, tf.log(self.out + ENTROPY_EPS)))
Thanks again for upgrading the codebase. The training wall time really depends on your hardware. You can monitor the learning curve and see when the performance on the validation set stabilizes. To decide whether the model has converged, you can use a heuristic such as checking that the relative performance has not improved much over the past xxx iterations. At the time, we just eyeballed it.
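A minimal sketch of the kind of stopping heuristic described above, assuming validation rewards are logged once per evaluation. The function name, the window of 50 evaluations and the 1% threshold are illustrative choices, not values from the Pensieve code.

```python
def has_converged(val_rewards, window=50, rel_tol=0.01):
    """Return True when the mean validation reward of the latest `window`
    evaluations improved by less than `rel_tol` (relative) over the
    `window` evaluations before it. All parameters are hypothetical."""
    if len(val_rewards) < 2 * window:
        return False  # not enough history to compare two windows
    prev = sum(val_rewards[-2 * window:-window]) / window
    last = sum(val_rewards[-window:]) / window
    denom = max(abs(prev), 1e-8)  # avoid division by zero
    return (last - prev) / denom < rel_tol


# Usage: append one validation reward per evaluation and check the flag.
rising = [float(i) for i in range(100)]   # still improving
plateau = rising + [99.0] * 100           # flat for the last 100 evaluations
print(has_converged(rising), has_converged(plateau))
```

A check like this only automates the eyeballing. It can still trigger too early on a noisy curve, so a larger window is the safer choice.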
The main objective is just the policy gradient expression (the expression after the gradient operator). It is basically log pi_t * (R_t - baseline_t) plus an entropy regularizer, summed over the training batch.
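To make the two terms concrete, here is a plain-Python sketch of the same expression, assuming the objective is minimized by the optimizer (hence the minus sign on the advantage). The argument names mirror the snippet in the question. The values of ENTROPY_WEIGHT and ENTROPY_EPS are assumptions for illustration.

```python
import math

ENTROPY_WEIGHT = 0.5  # assumed value, for illustration only
ENTROPY_EPS = 1e-6    # assumed value, keeps log() away from zero


def actor_objective(out, acts, act_grad_weights):
    """out: per-sample action probabilities pi(.|s_t)
    acts: per-sample one-hot vectors of the chosen action
    act_grad_weights: per-sample advantage R_t - baseline_t"""
    pg_term = 0.0
    neg_entropy = 0.0
    for probs, onehot, adv in zip(out, acts, act_grad_weights):
        # Probability the policy assigned to the action actually taken.
        pi_a = sum(p * a for p, a in zip(probs, onehot))
        # Policy gradient term: -log pi_t * (R_t - baseline_t).
        pg_term += math.log(pi_a) * -adv
        # sum(p * log p) is the negative entropy of the policy.
        neg_entropy += sum(p * math.log(p + ENTROPY_EPS) for p in probs)
    return pg_term + ENTROPY_WEIGHT * neg_entropy


uniform = actor_objective([[0.5, 0.5]], [[1, 0]], [2.0])
confident = actor_objective([[0.9, 0.1]], [[1, 0]], [2.0])
print(uniform, confident)
```

With a positive advantage, putting more probability on the chosen action lowers the first term, while the entropy term pushes back against the policy collapsing onto a single action.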
Hope these help.
Dear Hongzi,
I was trying to figure out the matching between the RL agent's state s(t) in the code and the input info in the paper.
Input: After the download of each chunk t, Pensieve's learning agent takes state inputs s_t = (x_t, τ_t, n_t, b_t, c_t, l_t) to its neural networks. x_t is the network throughput measurements for the past k video chunks; τ_t is the download time of the past k video chunks; n_t is a vector of m available sizes for the next video chunk; b_t is the current buffer level; c_t is the number of chunks remaining in the video; and l_t is the bitrate at which the last chunk was downloaded.
First of all, which code package do we need to look at, multi-video_sim or sim? When I look at sim, I see the input state being built in def agent. Could you please illustrate the matching, and the actor & critic networks (figure 5) if possible?
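For reference while comparing against the code, the six inputs quoted from the paper can be sketched as a small container. This is only an illustration of the paper's definition and does not claim to match the layout used in sim. The class name, field names and the values of K and M are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

K = 8  # hypothetical history length k
M = 6  # hypothetical number of available bitrate levels m


def _history() -> Deque[float]:
    # Fixed-length history: appending drops the oldest entry.
    return deque([0.0] * K, maxlen=K)


@dataclass
class AgentState:
    throughput: Deque[float] = field(default_factory=_history)      # x_t
    download_time: Deque[float] = field(default_factory=_history)   # tau_t
    next_chunk_sizes: List[float] = field(default_factory=lambda: [0.0] * M)  # n_t
    buffer_level: float = 0.0    # b_t
    chunks_remaining: int = 0    # c_t
    last_bitrate: float = 0.0    # l_t

    def update(self, chunk_size, delay, next_sizes, buffer_level,
               chunks_remaining, bitrate):
        """Record the observations made after one chunk download.
        Assumes delay > 0."""
        self.throughput.append(chunk_size / delay)
        self.download_time.append(delay)
        self.next_chunk_sizes = list(next_sizes)
        self.buffer_level = buffer_level
        self.chunks_remaining = chunks_remaining
        self.last_bitrate = bitrate


state = AgentState()
state.update(chunk_size=1000.0, delay=2.0, next_sizes=[1.0] * M,
             buffer_level=4.0, chunks_remaining=47, bitrate=750.0)
print(state.throughput[-1], len(state.throughput))
```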