higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL

2.99k stars · 587 forks

Updated 1.dqn for compatibility with PyTorch 0.4 and 1.0 #24

Open joleeson opened 5 years ago

joleeson commented 5 years ago
  1. Updated for compatibility with the latest PyTorch versions (a more thorough update than the recommendations in #20):

    • no longer uses the deprecated "Variable" class
    • uses appropriate dtypes
    • CPU/GPU-agnostic code
    • uses tensor.item() to convert 0-dimensional tensors to ordinary Python numbers
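As an illustration of these points (a minimal sketch, not the PR's exact diff), the pre-0.4 pattern and its modern equivalent look roughly like:

```python
import torch

# CPU/GPU-agnostic setup: the same code runs unchanged on either device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Old (pre-0.4) style wrapped tensors in the now-deprecated Variable class:
#   state = Variable(torch.FloatTensor(state), volatile=True)
# Modern style builds tensors with an explicit dtype directly on the device:
state = torch.tensor([0.1, 0.2, 0.3, 0.4], dtype=torch.float32, device=device)

with torch.no_grad():       # replaces the old volatile=True flag
    q_values = state * 2.0  # stand-in for a network forward pass

# .item() converts a 0-dimensional tensor to an ordinary Python number.
loss = q_values.sum()
print(loss.item())
```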
  2. Made changes such that the algorithm more closely matches that in Mnih et al. (2015) and other DQN literature:

    • linear epsilon decay
    • frame stacking
    • the network is now trained once every 4 environment steps for Atari environments
    • option of using Huber loss instead of MSE loss in compute_td_loss()
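The linear epsilon schedule and the Huber loss can be sketched as follows (a hedged illustration; the constants are the Mnih et al. (2015) values and the function names are hypothetical, not necessarily those in the PR):

```python
def epsilon_by_frame(frame_idx, eps_start=1.0, eps_final=0.1,
                     decay_frames=1_000_000):
    """Linearly anneal epsilon from eps_start to eps_final over
    decay_frames environment steps, then hold it constant
    (the schedule used in Mnih et al., 2015)."""
    fraction = min(frame_idx / decay_frames, 1.0)
    return eps_final + (eps_start - eps_final) * (1.0 - fraction)

def huber(td_error, k=1.0):
    """Huber loss on a scalar TD error: quadratic near zero, linear in
    the tails, so outlier transitions do not produce the huge gradients
    a squared (MSE) loss would."""
    a = abs(td_error)
    return 0.5 * a * a if a <= k else k * (a - 0.5 * k)
```

In PyTorch the built-in equivalent of `huber` is `F.smooth_l1_loss`, which is what a `compute_td_loss()` implementation would typically call.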
  3. Borrowed the monitoring wrapper from OpenAI Baselines to log training progress.

  4. Modified the wrappers so that they now accommodate stacked frames (#9) and output them as a LazyFrames object. The data axes are swapped appropriately for PyTorch, i.e. (channels) x (height) x (width).
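A minimal sketch of the LazyFrames idea for four stacked Atari frames (class shape modeled on Baselines; details here are illustrative, not the PR's exact wrapper code):

```python
import numpy as np

class LazyFrames:
    """Hold a list of frames without copying them until an array is needed.

    Baselines uses this trick so that consecutive observations in the
    replay buffer can share memory for their overlapping frames.
    """
    def __init__(self, frames):
        self._frames = frames

    def __array__(self, dtype=None):
        # Stack only when someone actually asks for an ndarray; stacking
        # on axis 0 yields the channels-first layout PyTorch expects.
        out = np.stack(self._frames, axis=0)
        if dtype is not None:
            out = out.astype(dtype)
        return out

# Four stacked 84x84 grayscale Atari frames.
frames = [np.zeros((84, 84), dtype=np.uint8) for _ in range(4)]
obs = np.asarray(LazyFrames(frames))
print(obs.shape)  # (channels, height, width) = (4, 84, 84)
```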