Closed shenyueshi closed 7 years ago
1.1. Yes, the two networks have separate architectures. I think you may be able to train the critic network first and then alternate with the actor network, but keep in mind that the value prediction depends on the actor network (i.e., the policy), so the training should really be concurrent.
1.2. I think the critic network should be "easier" to converge, because its training is essentially supervised. I think you can use its loss on a held-out testing dataset to verify the training quality.
2.1. Mahimahi emulates the real network so we can evaluate the policy using a real video client (e.g., a video player in Google Chrome). env.py is just a simulated environment.
2.2. The simulator is a very simple chunk-level simulator. As you observed, it doesn’t explicitly model TCP. Its purpose is to provide a fast training environment so that the learning agent can experience a diverse set of network scenarios quickly during reinforcement learning. We think the simulator doesn’t need high fidelity because, from a statistical learning perspective, training on a more diverse dataset can help the learning agent generalize. Nonetheless, I think a more accurate simulation could reduce the bias introduced by simulator artifacts, which we haven’t thoroughly investigated yet.
2.3. It samples an action from the probability distribution (act_prob) over all actions.
2.4. In this example, TRAIN_SEQ_LEN is not triggered. Notice that end_of_video happens (the video has 48 chunks in total) before the counter reaches 100. The purpose of TRAIN_SEQ_LEN is to allow more frequent training on long videos (e.g., a 3-hour video with thousands of chunks).
3.1. You are right. The training is done explicitly through gradient updates.
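For reference, the sampling step described in 2.3 can be sketched as follows. This is a minimal sketch modeled on the agent.py snippet quoted in this thread, not the repository's code: the function name sample_action and the value of RAND_RANGE are assumptions made for illustration, and a NumPy Generator replaces the global np.random call.

```python
import numpy as np

RAND_RANGE = 1000  # assumed value; sets the resolution of the random threshold


def sample_action(action_prob, rng):
    """Sample an action index from a probability vector via its cumulative sum."""
    # Cumulative sum turns [p0, p1, p2] into [p0, p0+p1, p0+p1+p2].
    action_cumsum = np.cumsum(action_prob)
    # Threshold drawn uniformly from {1, ..., RAND_RANGE - 1} / RAND_RANGE.
    threshold = rng.integers(1, RAND_RANGE) / float(RAND_RANGE)
    # argmax on a boolean array returns the first index that is True, i.e. the
    # first action whose cumulative probability exceeds the threshold.
    return int((action_cumsum > threshold).argmax())


rng = np.random.default_rng(0)
action_prob = np.array([0.1, 0.2, 0.7])  # hypothetical actor output
print(sample_action(action_prob, rng))
```

Each action owns a slice of the interval (0, 1) whose width equals its probability, so a uniformly drawn threshold lands in action i's slice with probability approximately action_prob[i].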
Hope this helps!
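To illustrate the point in 2.4, the interplay of the two conditions can be counted with a small simulation. This is only a sketch of the two if-conditions quoted in this thread: the function simulate_training_schedule is hypothetical, and the real agent.py may handle its batches differently in detail.

```python
def simulate_training_schedule(num_videos, chunks_per_video,
                               train_seq_len=100, gradient_batch_size=16):
    """Count gradient computations and parameter-update steps.

    A gradient is computed when the chunk counter reaches train_seq_len or the
    video ends; the accumulated gradients are applied once there are
    gradient_batch_size of them.
    """
    batch_len = 0          # stands in for len(r_batch)
    pending_gradients = 0  # stands in for len(actor_gradient_batch)
    gradients = 0
    updates = 0
    for _ in range(num_videos):
        for chunk in range(chunks_per_video):
            batch_len += 1
            end_of_video = chunk == chunks_per_video - 1
            if batch_len >= train_seq_len or end_of_video:
                gradients += 1
                pending_gradients += 1
                batch_len = 0
                if pending_gradients >= gradient_batch_size:
                    updates += 1
                    pending_gradients = 0
    return gradients, updates


# 48-chunk videos: end_of_video always fires before the counter reaches 100.
print(simulate_training_schedule(num_videos=32, chunks_per_video=48))
# One long video: the sequence-length condition fires in the middle of the video.
print(simulate_training_schedule(num_videos=1, chunks_per_video=250))
```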
Thank you very much, Hongzi. May I confirm with you about one thing?
Regarding your answer:
"Mahimahi emulates the real network so we can evaluate the policy using a real video client (e.g., a video player in Google Chrome). env.py is just a simulated environment."
to my question:
"In env.py at line 26-27, self.all_cooked_time = all_cooked_time self.all_cooked_bw = all_cooked_bw The input is a long array of network bandwidth at certain time interval. If this is already the data structure in the FCC or 3G/HSDPA dataset, then what role does Mahimahi play in the training?"
Are all_cooked_time and all_cooked_bw the output of Mahimahi given the FCC/3G dataset as input, are they already the data in the FCC/3G dataset, or are they simulated data you created yourself?
I think I need to understand how Mahimahi works. One question I have is how the download time of a video chunk can be calculated from the original trace log. Is that Mahimahi's responsibility, or does Mahimahi just generate all_cooked_time/all_cooked_bw while your env.py calculates the download time?
You spotted the right functionality of env.py. However, the sim/ folder is entirely for simulation; it is for training purposes. We did our actual evaluation in run_exp/ and real_exp/.
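As a rough illustration of what a chunk-level simulator does with all_cooked_time and all_cooked_bw, the download time of one chunk can be computed by walking through the trace intervals. This is a simplified sketch under stated assumptions, not the code in env.py: the function chunk_download_time is hypothetical, timestamps are assumed to be in seconds and bandwidth in Mbps, and any per-packet overhead, RTT, or noise terms in the actual file are omitted.

```python
BITS_PER_BYTE = 8.0
BITS_PER_MBIT = 1_000_000.0


def chunk_download_time(cooked_time, cooked_bw, chunk_size_bytes, start_ptr=1):
    """Return the seconds needed to download one chunk from a bandwidth trace.

    cooked_time[i] is a timestamp in seconds (assumed) and cooked_bw[i] is the
    throughput in Mbps (assumed) over the interval ending at cooked_time[i].
    """
    if chunk_size_bytes <= 0 or max(cooked_bw[1:]) <= 0:
        raise ValueError("need a positive chunk size and some positive bandwidth")
    ptr = start_ptr
    sent = 0.0   # bytes delivered so far
    delay = 0.0  # seconds elapsed so far
    while True:
        throughput = cooked_bw[ptr] * BITS_PER_MBIT / BITS_PER_BYTE  # bytes/s
        duration = cooked_time[ptr] - cooked_time[ptr - 1]
        payload = throughput * duration
        if sent + payload >= chunk_size_bytes:
            # The chunk finishes partway through this interval.
            delay += (chunk_size_bytes - sent) / throughput
            return delay
        sent += payload
        delay += duration
        ptr += 1
        if ptr >= len(cooked_bw):
            ptr = 1  # loop back to the start of the trace


# Hypothetical trace: 8 Mbps (1 MB/s) for one second, then 16 Mbps (2 MB/s).
print(chunk_download_time([0, 1, 2, 3], [8, 8, 16, 8], 2_000_000))
```

No TCP behavior is modeled here; the trace throughput is simply integrated over time until the chunk's bytes are delivered.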
More detailed information about Mahimahi: http://mahimahi.mit.edu
At a very high level, Mahimahi is a network emulator. The network communication (both throughput and delay) into and out of the Mahimahi shell can be emulated by loading network traces. The Mahimahi network trace format is based on "delivery opportunities": each entry gives the time, in milliseconds, at which the next network packet can be sent.
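To make the trace format more concrete, here is a sketch of how a bandwidth trace could be converted into delivery opportunities. It assumes that each trace entry is a millisecond timestamp at which one 1500-byte packet may be delivered; that reading of the format, the function name to_delivery_opportunities, and the input layout (interval durations in seconds, bandwidth in Mbps) are assumptions for illustration and are not taken from this thread.

```python
PACKET_BYTES = 1500  # assumed size of the packet each opportunity delivers


def to_delivery_opportunities(durations_s, bw_mbps):
    """Convert (duration, bandwidth) pairs into millisecond timestamps.

    One timestamp is emitted for every packet the link could deliver, so
    higher bandwidth produces more entries per millisecond.
    """
    timestamps = []
    credit = 0.0  # fractional packets carried over between milliseconds
    t_ms = 0
    for duration, bw in zip(durations_s, bw_mbps):
        packets_per_ms = bw * 1e6 / 8 / PACKET_BYTES / 1000
        for _ in range(int(round(duration * 1000))):
            t_ms += 1
            credit += packets_per_ms
            while credit >= 1.0:
                timestamps.append(t_ms)
                credit -= 1.0
    return timestamps


# 12 Mbps corresponds to one 1500-byte packet per millisecond.
trace = to_delivery_opportunities([1.0], [12.0])
print(len(trace), trace[:3])
```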
Hi, Hongzi
Thanks for your reply.
I am still a little confused about how Mahimahi works. From its website and paper, Mahimahi looks like a tool that can record the TCP traffic of an HTTP request and can play out the same TCP traffic for the same HTTP request.
If my understanding above is correct, how does Mahimahi simulate the TCP traffic if the client is not making the exact HTTP request as the one that was recorded? Since a new ABR algorithm is about making different quality decisions, is Mahimahi ReplayShell still a valid tool for network emulation in ABR development?
Mahimahi doesn't just work with HTTP. It "replays" the network throughput at the transport layer. We run applications (in our case, an ABR client in the Chrome browser) inside the Mahimahi shell. Any network traffic crossing the shell is modulated by Mahimahi, which paces the network packets according to the network trace.
I don't know exactly what you mean by "how does Mahimahi simulate the TCP traffic if the client is not making the exact HTTP request as the one that was recorded", but I think I vaguely understand your concern. Mahimahi won't reflect how the real network would respond to the ABR decisions; it only "replays" the network throughput, regardless of the application.
To evaluate performance when interacting with a real network, we conducted some real experiments. The related code is in real_exp/.
Thanks, Hongzi. I think we are on the same page.
Hi, Hongzi
If you don't mind, may I ask you a few more questions?
Firstly, I have two general questions:
1) Is the data flow of critic network completely separate from the one of the actor network (I mean the actor network takes the output from the critic network, but the critic network is not affected by the actor network)? If so, would it be okay to train the critic network first, then use the matured critic network to train the actor network?
2) Is the critic network more difficult to converge than the actor network (assuming there's a stable A function existing for the actor network)? Intuitively, it might be hard to predict what will happen in the future based on the current player buffer and the download stats of the past several segments (but I guess that's one of the biggest contributions of your paper). Is there any way to verify the quality of a trained critic model?
I also have a few questions about your source code. I might have missed some explanation in your paper, so please pardon me if some of my questions below are silly.
1) In env.py at lines 26-27,
    self.all_cooked_time = all_cooked_time
    self.all_cooked_bw = all_cooked_bw
the input is a long array of network bandwidth at certain time intervals. If this is already the data structure in the FCC or 3G/HSDPA dataset, then what role does Mahimahi play in the training?
2) In env.py at line 60,
    while True:  # download video chunk over mahimahi
the while loop calculates the time needed to download a video chunk, based on the network bandwidth at discrete time points. To my knowledge, bandwidth fluctuation can have multiple causes, and one inevitable cause is the browser terminating the TCP connection after a few HTTP requests. Sometimes the connection time can be long (a few hundred milliseconds or even longer), and the bandwidth drops significantly for the first segment after the reconnection. Therefore, when the calculation in the while loop at line 60 encounters a low bandwidth value in cooked_bw, could it have been caused by a new TCP connection when the trace log was originally collected? Could this way of calculating chunk download time fail to reflect the actual network condition (on the other hand, I don't know what a better way would be)?
3) In agent.py at lines 127-129,
    action_prob = actor.predict(np.reshape(state, (1, S_INFO, S_LEN)))
    action_cumsum = np.cumsum(action_prob)
    bit_rate = (action_cumsum > np.random.randint(1, RAND_RANGE) / float(RAND_RANGE)).argmax()
this part I don't understand too well. What is the structure of action_prob? Is it an array of the probabilities of all bitrates? I don't quite follow what action_cumsum is and how bit_rate is calculated from it with some randomness.
4) In agent.py at line 145 (TRAIN_SEQ_LEN is 100),
    if len(r_batch) >= TRAIN_SEQ_LEN or end_of_video:  # do training once
and at line 173 (GRADIENT_BATCH_SIZE is 16),
    if len(actor_gradient_batch) >= GRADIENT_BATCH_SIZE:
this is another part I don't quite understand. Do we only compute gradients every 100 chunks and update the neural networks 16 times in one go? Why do we want to do that: to save computing power or to help with convergence?
Also a minor question:
1) In ac3.py at line 83,
    def train(self, inputs, acts, act_grad_weights):
this function is not used anywhere. Is that right?