sandman opened this issue 6 years ago

Hi, how long does the training process take? I am running the TensorFlow CPU version on an i7-4720HQ CPU @ 2.60GHz. The training has been running for a couple of hours now.

~Sandip
@sandman what's the size of your training data?
@Dreamer-hxs I am using all the available training traces from Hongzimao's original link (127 files in total).
@sandman In fact, the author said in his paper: "Training a single algorithm required approximately 50,000 iterations, where each iteration took 300 ms and corresponded to 16 agents updating their parameters in parallel". I trained with 127 files myself and found it took around 240 ms per iteration (300,000 iterations → about 25 hours). It seems that the data size has little impact on the time per iteration; I don't know the training set size the author used. I also don't know whether overfitting will be a problem here; I have just taken it for granted that more iterations mean more accuracy.
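For reference, a back-of-the-envelope check of the figures quoted above (the note about overhead is my guess, not something stated in the posts):

```python
# Pure update time implied by the quoted per-iteration costs.
paper_hours = 50_000 * 0.300 / 3600    # paper: 50k iterations at 300 ms -> ~4.2 h
local_hours = 300_000 * 0.240 / 3600   # 300k iterations at 240 ms -> ~20 h; the
                                       # ~25 h reported above may include logging
                                       # and checkpointing overhead
print(f"paper: {paper_hours:.1f} h, local: {local_hours:.1f} h")
```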
The training time is roughly on the same scale as @Dreamer-hxs's calculation. Two caveats: 1. The paper uses more data than what is in the repo. The small dataset is meant for others to quickly reproduce the first-order result and get a sense of the learning approach. 2. Throughout training, the entropy weight needs to be decayed. You can see explicitly how decaying the entropy improves the policy, as we did not automate this process.
How can we decay the entropy during training? By simply manually adjusting the parameter in a3c.py over time?
Yes, and remember to load the previously trained model to bootstrap.
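A minimal TF1-style sketch of what loading the previous model to bootstrap looks like, in the spirit of the saver logic in multi_agent.py; the variable and the checkpoint path here are made up for illustration:

```python
import tensorflow as tf

# Stand-in variable so the Saver has something to restore; in the repo this
# would be the actor/critic network parameters.
w = tf.Variable(tf.zeros([4]), name="w")
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    nn_model = "./results/nn_model_ep_20000.ckpt"  # hypothetical checkpoint path
    if nn_model is not None:
        saver.restore(sess, nn_model)  # resume from the previously trained weights
```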
We are attempting to train on a different video, so we cannot load the previous model. Do we have to stop the training in order to adjust the parameter in a3c.py, or is that file reloaded over time?
If your different video has a different number of bitrates, the previous model won't fit. To train a single model that works across multiple videos, please refer to Section 4.3 of the paper on multi-video training. The code for doing it is in https://github.com/hongzimao/pensieve/tree/master/multi_video_sim.
For training using the original video, I am still a little unclear on how to adjust the parameter for entropy. Is simply modifying the ENTROPY_WEIGHT parameter while the training is running sufficient, or do I need to stop the training, modify the parameter, and then somehow resume the training?
The latter. As for how to "somehow resume the training", you can modify https://github.com/hongzimao/pensieve/blob/master/sim/multi_agent.py#L34-L35
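Assuming the referenced lines define the NN_MODEL constant (worth double-checking against the repo), resuming would look something like:

```python
# In sim/multi_agent.py: point NN_MODEL at the checkpoint saved before the
# entropy weight was changed. The path is an example, not a shipped file.
NN_MODEL = './results/nn_model_ep_100000.ckpt'
```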
Hi, I have a query. What do we call an iteration here? Is it one pass over the network traces, one pass (download) over the 48 chunks of the video, or a single chunk download? Thanks.
One pass over the 48 chunks. You can check env.py for the condition under which the episode ends.
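Paraphrasing that condition (identifier names here are approximate, not the exact ones in env.py):

```python
TOTAL_VIDEO_CHUNKS = 48  # chunks per video in the simulator

def episode_finished(video_chunk_counter):
    # One "iteration" (episode) ends after all chunks have been downloaded.
    return video_chunk_counter >= TOTAL_VIDEO_CHUNKS
```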
Hi, I have two questions. 1. Does one loop in the central_agent function (in multi_agent.py) correspond to 16 iterations or 1 iteration? 2. Is it feasible to adjust the entropy weight automatically, e.g., entropy_weight = original_weight - decay_factor * iteration? Thanks!
Thanks for your questions. 1. I think it means 1 iteration. 2. Linear decay of the entropy weight looks good; it's what we've been doing in related projects too. Example: https://github.com/hongzimao/decima-sim/blob/master/train.py#L397-L398 and https://github.com/hongzimao/decima-sim/blob/master/utils.py#L39-L44
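A minimal sketch of such a linear decay schedule; the start/end values and horizon below are placeholders, not the settings used in the paper or in Decima:

```python
def entropy_weight(iteration, init_weight=5.0, min_weight=0.1,
                   decay_iterations=100_000):
    """Linearly anneal the entropy weight, then hold it at min_weight."""
    step = (init_weight - min_weight) / decay_iterations
    return max(min_weight, init_weight - step * iteration)
```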
Hi, I have tried linear decay of the entropy weight (3 for the first iteration, decaying linearly to 0.01 after 1e5 iterations). After 120,000 iterations I got a result better than MPC, but it is still worse than the model you provided. So I want to know: is this because the training data is not enough (I downloaded the traces you provided on Dropbox, and I notice it is a subset), or the training is not enough, or is there some other reason?
More training data can help improve performance. Our paper used a larger dataset to train the agent; this repo initially used a smaller subset of the data so that others could quickly generate results. You can follow the instructions in https://github.com/hongzimao/pensieve/blob/master/traces/README.md to generate more data (even more than we used in the paper, if you want). However, just using the data provided on Dropbox should give you a model comparable to the one we provided. Many others have reproduced, or even surpassed, our results with just the existing data.
@hongzimao Thanks for replying. Your reply helps me a lot and I will retry with different parameters. But I have another question: how do I judge that the network has converged? As in #76, I find that td_loss (the policy-gradient loss) has large variance, so it is difficult to judge convergence from this value. Maybe check the test results?
Checking validation reward is a sensible way to check convergence. We also find entropy (you can normalize it with -log(n), where n is the number of actions) a good metric for checking convergence. Hope this helps!
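One way to read the normalization: the entropy of a policy over n actions is at most log(n) (the uniform policy), so dividing by log(n) maps it to [0, 1]. A small sketch under that reading:

```python
import numpy as np

def normalized_entropy(action_probs):
    """Policy entropy scaled to [0, 1]; values near 0 suggest convergence."""
    p = np.asarray(action_probs, dtype=np.float64)
    h = -np.sum(p * np.log(p + 1e-8))  # epsilon guards against log(0)
    return h / np.log(len(p))
```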