madras-simulator / MADRaS

Multi-Agent DRiving Simulator
GNU Affero General Public License v3.0

The experiment is not starting properly. #11

Closed rudrasohan closed 5 years ago

rudrasohan commented 5 years ago

I have tried and tested the MadrasEnv in both rllab and baselines. The integration seems fine, as I currently see no errors, but I am experiencing similar problems with both. The agent moves forward, i.e. the network is providing some actions, but judging from the output in the terminal it is unlikely that the algorithm is making any progress.
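
Before suspecting the algorithm, it may be worth ruling out the environment itself. Below is a minimal sanity check (my sketch, not code from the repo) that rolls out a random policy on Madras-v0 and prints the reward stream; it assumes Madras-v0 is already registered with gym, which must hold anyway for GymEnv("Madras-v0") in the script below to work. If the rewards are all zero or the episode never terminates, TRPO has no learning signal regardless of the implementation:

import gym

env = gym.make("Madras-v0")  # assumes the MADRaS package has registered this id with gym
obs = env.reset()
total_reward, steps, done = 0.0, 0, False
while not done and steps < 1000:
    # step with random actions; any learning algorithm needs non-constant rewards here
    obs, reward, done, _ = env.step(env.action_space.sample())
    total_reward += reward
    steps += 1
print("steps:", steps, "return:", total_reward, "terminated:", done)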

For trying it out with baselines, refer to #9. For rllab, I have created a file which implements TRPO:

from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.gym_env import GymEnv
from rllab.envs.normalized_env import normalize
from rllab.misc.instrument import run_experiment_lite
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

def run_task(*_):
    # Please note that different environments with different action spaces may
    # require different policies. For example with a Discrete action space, a
    # CategoricalMLPPolicy works, but for a Box action space you may need to use
    # a GaussianMLPPolicy (see the trpo_gym_pendulum.py example).
    env = normalize(GymEnv("Madras-v0"))

    #policy = CategoricalMLPPolicy(
    #    env_spec=env.spec,
        # The neural network policy should have two hidden layers, each with 32 hidden units.
    #    hidden_sizes=(32, 32)
    #)

    policy = GaussianMLPPolicy(
        env_spec=env.spec,
        # The neural network policy should have two hidden layers, each with 32 hidden units.
        hidden_sizes=(32, 32),
    )

    baseline = LinearFeatureBaseline(env_spec=env.spec)

    algo = TRPO(
        env=env,
        policy=policy,
        baseline=baseline,
        batch_size=4000,
        max_path_length=env.horizon,
        n_itr=50,
        discount=0.99,
        step_size=0.01,
        # Uncomment both lines (this and the plot parameter below) to enable plotting
        # plot=True,
    )
    algo.train()

run_experiment_lite(
    run_task,
    # Number of parallel workers for sampling
    n_parallel=1,
    # Only keep the snapshot parameters for the last iteration
    snapshot_mode="last",
    # Specifies the seed for the experiment. If this is not provided, a random seed
    # will be used
    seed=1,
    # plot=True,
)
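
As a rough way to judge whether TRPO is actually making progress, run_experiment_lite writes rllab's per-iteration tabular diagnostics to a progress.csv in its log directory (under data/local/ by default); if learning is happening, the AverageReturn column should trend upward across iterations. A minimal sketch for inspecting it, with a hypothetical path:

import pandas as pd

# hypothetical path: substitute the log directory created by run_experiment_lite
df = pd.read_csv("path/to/progress.csv")
# a flat or constant AverageReturn across iterations suggests no learning signal
print(df["AverageReturn"])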