linnaeushuang / pensieve-pytorch

MIT License

Pensieve-pytorch

An implementation of Pensieve in PyTorch.

The original Pensieve authors implemented it in TensorFlow.

Dependencies:

Notes

While implementing it in PyTorch, I found a difference between the TensorFlow implementation and the paper.

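In hongzimao/pensieve/sim/a3c.py, the author computes the critic loss as the mean squared error between R_batch and criticNetwork_output (the value function; line 243 of a3c.py). However, R_batch is the cumulative reward over a particular episode. According to the original paper (equation 3), Pensieve should instead minimize the mean squared error between $r_t + \gamma V(s_{t+1})$ and $V(s_t)$, i.e. the squared TD error $\left(r_t + \gamma V(s_{t+1}) - V(s_t)\right)^2$.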
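To make the difference concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere) of the two critic targets for a single batch. The function names, `gamma`, and the numbers are hypothetical: `mc_returns` treats R_batch as a discounted cumulative reward computed backwards from the end of the batch, and `td_targets` builds the one-step target $r_t + \gamma V(s_{t+1})$ from equation 3.

```python
def mc_returns(rewards, gamma, bootstrap=0.0):
    """Discounted cumulative reward R_t, computed backwards (an R_batch-style target)."""
    returns = [0.0] * len(rewards)
    running = bootstrap  # value after the last step; 0.0 for a terminal state
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_targets(rewards, values, gamma, next_value=0.0):
    """One-step targets r_t + gamma * V(s_{t+1}), as in equation 3 of the paper."""
    next_values = values[1:] + [next_value]
    return [r + gamma * v for r, v in zip(rewards, next_values)]


def mse(targets, values):
    """Mean squared error between the critic targets and the critic outputs V(s_t)."""
    return sum((y - v) ** 2 for y, v in zip(targets, values)) / len(values)


rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.5, 0.5]  # hypothetical critic outputs V(s_0), V(s_1), V(s_2)
gamma = 0.9

print(mse(mc_returns(rewards, gamma), values))         # return-based critic loss
print(mse(td_targets(rewards, values, gamma), values))  # TD-based critic loss
```

In a PyTorch version, the targets would typically be detached before being passed to `torch.nn.functional.mse_loss`, so that gradients flow only through $V(s_t)$.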

I was not sure how much this difference matters, so I implemented three models to compare them:

Train

Example training run:

```
python pensieve_torch.py --model_type=1
```

Results

Figure 1 (panels 1a and 1b): testing average QoE per 100 episodes.

During training, I run the testing function every 100 episodes to measure the average QoE, and I train each model 6 times with seeds 42, 142, 242, 342, 442 and 542.
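The evaluation schedule described above can be sketched as follows. `train_one_episode` and `evaluate` are hypothetical stand-ins for the repository's training step and testing function, and averaging the curves across seeds is only one possible way to summarize the six runs, not necessarily how Figure 1 was produced.

```python
import random


def run_seed(seed, num_episodes, train_one_episode, evaluate, eval_interval=100):
    """Train one run and record the test QoE every `eval_interval` episodes."""
    random.seed(seed)  # a PyTorch version would also call torch.manual_seed(seed)
    curve = []
    for episode in range(1, num_episodes + 1):
        train_one_episode()
        if episode % eval_interval == 0:
            curve.append(evaluate())
    return curve


def mean_curve(curves):
    """Average the per-checkpoint QoE across runs (all curves must have the same length)."""
    return [sum(points) / len(points) for points in zip(*curves)]


SEEDS = [42, 142, 242, 342, 442, 542]

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs without the simulator.
    curves = [
        run_seed(s, 500, train_one_episode=lambda: None, evaluate=random.random)
        for s in SEEDS
    ]
    print(mean_curve(curves))
```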

Interestingly, I get similar results even without a critic network (see model_0 and model_2).
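For context, without a critic the actor's policy-gradient term can still be weighted by the return itself (plain REINFORCE), whereas with a critic it is weighted by the advantage $R_t - V(s_t)$. The sketch below assumes this is what separates the critic and no-critic variants; the README does not spell out the three models, and `advantages` and `policy_loss` are hypothetical names.

```python
def advantages(returns, values=None):
    """Weights for the actor's log-probability term.

    With a critic, use the advantage R_t - V(s_t); without one (values is None),
    fall back to the raw return R_t, as in plain REINFORCE (an assumption here).
    """
    if values is None:
        return list(returns)
    return [r - v for r, v in zip(returns, values)]


def policy_loss(log_probs, weights):
    """Negative weighted log-likelihood, averaged over the batch."""
    return -sum(lp * w for lp, w in zip(log_probs, weights)) / len(log_probs)


returns = [2.0, 1.0]
values = [1.5, 1.5]       # hypothetical critic outputs
log_probs = [-0.5, -1.0]  # hypothetical log pi(a_t | s_t)
print(policy_loss(log_probs, advantages(returns)))          # no critic
print(policy_loss(log_probs, advantages(returns, values)))  # with critic
```

In PyTorch the weights would normally be detached, so the actor loss does not push gradients into the critic.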
