Feature/navigation 1d

Implementation of Navigation1DEnv with "irrelevant information":

Additional state variables that change randomly but do not affect the state variables that determine the reward.
There are n+1 state variables: n are zero-mean random walks that do not affect the reward, and 1 evolves linearly (perhaps with noise) and alone determines the reward.
The transition model is a neural network with one hidden layer containing a single hidden unit (perhaps with linear activation for simplicity): n+1 fan-in weights + 1 bias into the 1D hidden layer, then n+1 fan-out weights + n+1 biases, for 3n+4 parameters in total.
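The description above could be sketched roughly as follows in NumPy. This is only an illustration under stated assumptions, not the code in this branch: the class names `Navigation1DSketch` and `OneUnitTransitionModel`, the constant `drift`, the `goal` position, the negative-distance reward, and the noise scales are all hypothetical, and actions are left out because the description does not mention them.

```python
import numpy as np


class Navigation1DSketch:
    """Hypothetical sketch: 1 relevant variable plus n irrelevant random walks."""

    def __init__(self, n_irrelevant=3, drift=0.1, goal=1.0,
                 relevant_noise=0.0, walk_scale=0.05, seed=0):
        self.n = n_irrelevant
        self.drift = drift                    # assumed constant per-step change
        self.goal = goal                      # assumed target position
        self.relevant_noise = relevant_noise  # std of optional noise on the relevant variable
        self.walk_scale = walk_scale          # std of each random-walk increment
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(self.n + 1)

    def reset(self):
        self.state = np.zeros(self.n + 1)
        return self.state.copy()

    def step(self):
        # Index 0: the only reward-relevant variable; linear evolution plus optional noise.
        pos = self.state[0] + self.drift
        pos += self.relevant_noise * self.rng.standard_normal()
        # Indices 1..n: zero-mean random walks that never enter the reward.
        walks = self.state[1:] + self.walk_scale * self.rng.standard_normal(self.n)
        self.state = np.concatenate(([pos], walks))
        # Assumed reward: negative distance of the relevant variable to the goal.
        reward = -abs(self.goal - pos)
        return self.state.copy(), reward


class OneUnitTransitionModel:
    """Hypothetical sketch: one hidden layer, one hidden unit, linear activation."""

    def __init__(self, n_irrelevant, seed=0):
        d = n_irrelevant + 1
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(scale=0.1, size=d)   # n+1 fan-in weights
        self.b_in = 0.0                             # 1 hidden bias
        self.w_out = rng.normal(scale=0.1, size=d)  # n+1 fan-out weights
        self.b_out = np.zeros(d)                    # n+1 output biases

    def predict(self, state):
        hidden = float(self.w_in @ state + self.b_in)  # scalar bottleneck
        return self.w_out * hidden + self.b_out        # predicted next state

    def num_parameters(self):
        return self.w_in.size + 1 + self.w_out.size + self.b_out.size


env = Navigation1DSketch(n_irrelevant=3)
state = env.reset()
next_state, reward = env.step()
model = OneUnitTransitionModel(n_irrelevant=3)
prediction = model.predict(state)
```

With a single hidden unit, the model has to compress all n+1 inputs into one scalar, so it cannot track the relevant variable and the random walks at the same time; with n = 3 the parameter count is 3n+4 = 13.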

thiagopbueno / model-aware-policy-optimization