google / dopamine

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
https://github.com/google/dopamine
Apache License 2.0

Use dopamine as a text classifier #89

Open sengengie opened 5 years ago

sengengie commented 5 years ago

Hello, I am working on an AI project. I need to create an environment with gym and test it on Dopamine with its agents (Rainbow, DQN, ...). I have already created this environment but can't launch it with Dopamine:

My environment is a text classifier. The agent reads a sentence and then chooses one language from a list, so it needs to detect the language of the current sentence. At every step it reads another sentence and gives an answer, etc. I give the agent a string value with no fixed length.
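Roughly, the environment looks like this (a minimal sketch; the sentence data and names are illustrative, not my real code):

```python
import gym
import numpy as np

# Illustrative data: (sentence, language index) pairs.
SENTENCES = [("bonjour le monde", 0), ("hello world", 1), ("hola mundo", 2)]
LANGUAGES = ["fr", "en", "es"]


class LanguageEnv(gym.Env):
    """One step = read a sentence, guess its language, receive a reward."""

    def __init__(self):
        self.action_space = gym.spaces.Discrete(len(LANGUAGES))
        # This is the crux of the problem: observations are raw strings of
        # variable length, and gym has no string observation space.
        self.observation_space = None
        self._index = 0

    def reset(self):
        self._index = np.random.randint(len(SENTENCES))
        return SENTENCES[self._index][0]

    def step(self, action):
        reward = 1.0 if action == SENTENCES[self._index][1] else 0.0
        self._index = np.random.randint(len(SENTENCES))
        return SENTENCES[self._index][0], reward, False, {}
```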

I looked at c51_cartpole.gin and gym_lib.py to try to understand how I can add my environment to the Dopamine project.

Language is the name of my model.

"gym_lib": CARTPOLE_MIN_VALS = np.array([-2.4, -5., -math.pi/12., -math.pi2.]) CARTPOLE_MAX_VALS = np.array([2.4, 5., math.pi/12., math.pi2.]) LANGUAGE_MIN_VALS = "" LANGUAGE_MAX_VALS = "" ACROBOT_MIN_VALS = np.array([-1., -1., -1., -1., -5., -5.]) ACROBOT_MAX_VALS = np.array([1., 1., 1., 1., 5., 5.]) gin.constant('gym_lib.CARTPOLE_OBSERVATION_SHAPE', (4, 1)) gin.constant('gym_lib.CARTPOLE_OBSERVATION_DTYPE', tf.float32) gin.constant('gym_lib.CARTPOLE_STACK_SIZE', 1) gin.constant('gym_lib.ACROBOT_OBSERVATION_SHAPE', (6, 1)) gin.constant('gym_lib.ACROBOT_OBSERVATION_DTYPE', tf.float32) gin.constant('gym_lib.ACROBOT_STACK_SIZE', 1) gin.constant('gym_lib.LANGUAGE_OBSERVATION_SHAPE', (1, 1)) gin.constant('gym_lib.LANGUAGE_OBSERVATION_DTYPE', tf.string) gin.constant('gym_lib.LANGUAGE_STACK_SIZE', 51)

I tried to follow the existing organisation, but I think I have a problem with the input data.

"c51_language.gin"

Hyperparameters for a simple C51-style Language agent. The hyperparameters

chosen achieve reasonable performance.

import dopamine.agents.dqn.dqn_agent import dopamine.agents.rainbow.rainbow_agent import dopamine.discrete_domains.gym_lib import dopamine.discrete_domains.run_experiment import dopamine.replay_memory.prioritized_replay_buffer import gin.tf.external_configurables

RainbowAgent.observation_shape = %gym_lib.LANGUAGE_OBSERVATION_SHAPE RainbowAgent.observation_dtype = %gym_lib.LANGUAGE_OBSERVATION_DTYPE RainbowAgent.stack_size = %gym_lib.LANGUAGE_STACK_SIZE RainbowAgent.network = @gym_lib.language_rainbow_network RainbowAgent.num_atoms = 51 RainbowAgent.vmax = 10. RainbowAgent.gamma = 0.99 RainbowAgent.update_horizon = 1 RainbowAgent.min_replay_history = 500 RainbowAgent.update_period = 4 RainbowAgent.target_update_period = 100 RainbowAgent.epsilon_fn = @dqn_agent.identity_epsilon RainbowAgent.replay_scheme = 'uniform' RainbowAgent.tf_device = '/gpu:0' # use '/cpu:*' for non-GPU version RainbowAgent.optimizer = @tf.train.AdamOptimizer()

tf.train.AdamOptimizer.learning_rate = 0.001 tf.train.AdamOptimizer.epsilon = 0.0003125

create_gym_environment.environment_name = 'language' create_gym_environment.version = 'v0' create_agent.agent_name = 'rainbow' Runner.create_environment_fn = @gym_lib.create_gym_environment Runner.num_iterations = 500 Runner.training_steps = 1000 Runner.evaluation_steps = 1000 Runner.max_steps_per_episode = 200 # Default max episode length.

WrappedPrioritizedReplayBuffer.replay_capacity = 50000 WrappedPrioritizedReplayBuffer.batch_size = 128

Is it possible to add a gym environment with string input to Dopamine?

sturdyplum commented 5 years ago

From what I've seen, it should be possible to add pretty much any environment with a discrete action space. Adding a custom environment is actually really simple: you can use the gym preprocessing at https://github.com/google/dopamine/blob/76cdae1f858233a8501e2b61095cde54c6f8a214/dopamine/discrete_domains/gym_lib.py#L306
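For example, if you register your environment with gym, Dopamine's create_gym_environment can build and wrap it by name. Something like this should work (the module path is just a placeholder):

```python
from gym.envs.registration import register

# Make gym.make('language-v0') work; create_gym_environment builds the id
# as '{environment_name}-{version}' and wraps the result in GymPreprocessing.
register(
    id='language-v0',
    entry_point='my_package.language_env:LanguageEnv',  # placeholder path
)
```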

This lets you wrap any environment you want, as long as it has the functions and variables that the preprocessing expects. How you handle the observations and use them to produce a set of Q-values for the actions is also up to you: you need to create a function that returns a network and pass it to your Rainbow agent when you are creating it.
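For the network, you could follow the pattern of cartpole_rainbow_network in gym_lib.py. A rough sketch (this assumes the environment already encodes each sentence as a fixed-size numeric vector before it reaches the agent; the replay buffer preallocates fixed-shape arrays, so it can't store variable-length strings):

```python
import gin
import tensorflow as tf

slim = tf.contrib.slim


@gin.configurable
def language_rainbow_network(num_actions, num_atoms, support, network_type,
                             state):
  """C51-style network sketch for a language-classification environment."""
  net = tf.cast(state, tf.float32)
  net = slim.flatten(net)
  net = slim.fully_connected(net, 512)
  # One logit per (action, atom) pair, as in the other rainbow networks.
  net = slim.fully_connected(net, num_actions * num_atoms, activation_fn=None)
  logits = tf.reshape(net, [-1, num_actions, num_atoms])
  probabilities = tf.contrib.layers.softmax(logits)
  q_values = tf.reduce_sum(support * probabilities, axis=2)
  return network_type(q_values, logits, probabilities)
```

Then point your gin config at it (RainbowAgent.network = @gym_lib.language_rainbow_network, or wherever the function actually lives), and set observation_shape and observation_dtype to the shape and numeric dtype of the encoded sentence rather than tf.string.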