sandman opened this issue 6 years ago

Hi, how long does the training process take? I am running the TensorFlow CPU version on an i7-4720HQ CPU @ 2.60GHz. The training has been running for a couple of hours now.

~Sandip
@sandman what's the size of your training data?
@Dreamer-hxs I am using all the available training traces from Hongzimao's original link (127 files in total).
@sandman In fact, the author said in his paper: "Training a single algorithm required approximately 50,000 iterations, where each iteration took 300 ms and corresponded to 16 agents updating their parameters in parallel". I trained with 127 files myself and found it took around 240 ms per iteration (300,000 iterations → about 25 hours). It seems that the data size has little impact on the time per iteration; I don't know the training set size the author used. I also don't know whether overfitting will be a problem here; I have just taken it for granted that more iterations mean more accuracy.
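For reference, a back-of-the-envelope check of the figures quoted above (the note about overhead is my guess, not something stated in the posts):

```python
# Pure update time implied by the quoted per-iteration costs.
paper_hours = 50_000 * 0.300 / 3600    # paper: 50k iterations at 300 ms -> ~4.2 h
local_hours = 300_000 * 0.240 / 3600   # 300k iterations at 240 ms -> ~20 h; the
                                       # ~25 h reported above may include logging
                                       # and checkpointing overhead
print(f"paper: {paper_hours:.1f} h, local: {local_hours:.1f} h")
```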
The training time is roughly on the same scale as @Dreamer-hxs's calculation. Two caveats: 1. The paper uses more data than what is in the repo. The small dataset is meant for others to quickly reproduce the first-order result and get a sense of the learning approach. 2. Throughout training, the entropy weight needs to be decayed. You can see explicitly how decaying the entropy improves the policy, as we did not automate this process.
How can we decay the entropy during training? By simply manually adjusting the parameter in a3c.py over time?
Yes, and remember to load the previously trained model to bootstrap.
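A minimal TF1-style sketch of what loading the previous model to bootstrap looks like, in the spirit of the saver logic in multi_agent.py; the variable and the checkpoint path here are made up for illustration:

```python
import tensorflow as tf

# Stand-in variable so the Saver has something to restore; in the repo this
# would be the actor/critic network parameters.
w = tf.Variable(tf.zeros([4]), name="w")
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    nn_model = "./results/nn_model_ep_20000.ckpt"  # hypothetical checkpoint path
    if nn_model is not None:
        saver.restore(sess, nn_model)  # resume from the previously trained weights
```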
We are attempting to train on a different video, so we cannot load the previous model. Do we have to stop the training in order to adjust the parameter in a3c.py, or is that file reloaded over time?
If your different video has a different number of bitrates, the previous model won't fit. To train a single model that works across multiple videos, please refer to Section 4.3 of the paper on multi-video training. The code for doing it is in https://github.com/hongzimao/pensieve/tree/master/multi_video_sim.
For training using the original video, I am still a little unclear on how to adjust the parameter for entropy. Is simply modifying the ENTROPY_WEIGHT parameter while the training is running sufficient, or do I need to stop the training, modify the parameter, and then somehow resume the training?
The latter. As for how to "somehow resume the training", you can modify https://github.com/hongzimao/pensieve/blob/master/sim/multi_agent.py#L34-L35
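Assuming the referenced lines define the NN_MODEL constant (worth double-checking against the repo), resuming would look something like:

```python
# In sim/multi_agent.py: point NN_MODEL at the checkpoint saved before the
# entropy weight was changed. The path is an example, not a shipped file.
NN_MODEL = './results/nn_model_ep_100000.ckpt'
```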
Hi, I have a query. What do we call an iteration here? Is it one pass over the network traces, one pass (download) over the 48 chunks of the video, or a single chunk download? Thanks.
One pass over the 48 chunks. You can check env.py for the condition under which the episode ends.
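Paraphrasing that condition (identifier names here are approximate, not the exact ones in env.py):

```python
TOTAL_VIDEO_CHUNKS = 48  # chunks per video in the simulator

def episode_finished(video_chunk_counter):
    # One "iteration" (episode) ends after all chunks have been downloaded.
    return video_chunk_counter >= TOTAL_VIDEO_CHUNKS
```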
Hi, I have two questions. 1. Does one loop in the central_agent function (in multi_agent.py) correspond to 16 iterations or 1 iteration? 2. Is it feasible to adjust the entropy weight automatically, e.g., entropy_weight = original_weight - decay_factor * iteration? Thanks!
Thanks for your questions. 1. I think it means 1 iteration. 2. Linear decay of the entropy weight looks good; it's what we've been doing in related projects too. Example: https://github.com/hongzimao/decima-sim/blob/master/train.py#L397-L398 and https://github.com/hongzimao/decima-sim/blob/master/utils.py#L39-L44
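A minimal sketch of such a linear decay schedule; the start/end values and horizon below are placeholders, not the settings used in the paper or in Decima:

```python
def entropy_weight(iteration, init_weight=5.0, min_weight=0.1,
                   decay_iterations=100_000):
    """Linearly anneal the entropy weight, then hold it at min_weight."""
    step = (init_weight - min_weight) / decay_iterations
    return max(min_weight, init_weight - step * iteration)
```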
Hi, I have tried linear decay of the entropy weight (3 for the first iteration, decaying linearly to 0.01 after 1e5 iterations). After 120,000 iterations I got a result better than MPC, but it is still worse than the model you provided. So I want to know: is this because the training data is not enough (I downloaded the traces you provided on Dropbox, and I notice it is a subset), or the training is not enough, or is there some other reason?
More training data can help improve performance. Our paper used a larger dataset to train the agent; this repo initially used a smaller subset of the data so that others could quickly generate results. You can follow the instructions in https://github.com/hongzimao/pensieve/blob/master/traces/README.md to generate more data (even more than we used in the paper, if you want). However, just using the data provided on Dropbox should give you a model comparable to the one we provided. Many others have reproduced, or even surpassed, our results with just the existing data.
@hongzimao Thanks for replying. Your reply helps me a lot and I will retry with different parameters. But I have another question: how do I judge that the network has converged? As in #76, I find that td_loss (the policy-gradient loss) has large variance, so it is difficult to judge convergence from this value. Maybe check the test results?
Checking validation reward is a sensible way to check convergence. We also find entropy (you can normalize it with -log(n), where n is the number of actions) a good metric for checking convergence. Hope this helps!
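One way to read the normalization: the entropy of a policy over n actions is at most log(n) (the uniform policy), so dividing by log(n) maps it to [0, 1]. A small sketch under that reading:

```python
import numpy as np

def normalized_entropy(action_probs):
    """Policy entropy scaled to [0, 1]; values near 0 suggest convergence."""
    p = np.asarray(action_probs, dtype=np.float64)
    h = -np.sum(p * np.log(p + 1e-8))  # epsilon guards against log(0)
    return h / np.log(len(p))
```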