samyzaf / tdfmaze

Tour De Flags Maze solved by deep reinforcement learning technique (Q-learning)

Calling a trained model on a new environment #1

Closed amw5g closed 5 years ago

amw5g commented 6 years ago

Hello, and thanks so much for this material. I've got a work problem that seems like an extension of the Tour De Flags maze, and I've found your notebooks super helpful as I think through how to tackle it.
I have a slightly embarrassing question, though: how do I play a trained model on a new environment? I tried:


```
maze = np.array([
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  1.],
])

flags = [(0,4), (0,11), (4,4), (5,9), (11,7), (11,9)]
env_new = Tmaze(maze, flags)
show_env(env_new)

qt_new = Qtraining(
    qt.model,
    env_new,
    #n_epoch = 200,
    #max_memory = 500,
    #data_size = 100,
    #name = 'model_1',
    weights_file = 'model_t1.h5'
)

qt_new.play()
```

But no luck :/ ('Qtraining' object has no attribute 'env_state')
If you could post an example or two of playing to a new environment from a saved model, I'd appreciate it.
Cheers!

jeantimex commented 5 years ago

Hi @amw5g, suppose you have saved the model as model.h5. You can then run the game with the following code:

```
from tdfmaze import np, TdfMaze, build_model, Qtraining

# 1. marks a free cell, 0. marks a wall
maze = np.array([
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  0.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.],
    [ 1.,  1.,  1.,  1.,  1.,  1.,  1.],
])

flags = [(3,0), (3,2), (3,4), (3,6)]
env = TdfMaze(maze, flags)

# Build a model for this environment
model = build_model(env)

qt = Qtraining(
    model,
    env,
    weights_file = 'model.h5'  # weights saved during training
)

agent = (0,0)  # starting cell of the agent
result = qt.run_game(agent)
print(result)
```

Also, make sure the maze matrix has the same size as the one you used for training.
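
The size requirement can be checked before loading any weights. The snippet below is only a sketch using plain NumPy: `same_maze_shape` and `trained_shape` are hypothetical names, not part of tdfmaze, and the `(7, 7)` value is just the shape of the example maze above. It assumes the model's input is tied to the shape of the training maze, which is what the size advice implies.

```python
import numpy as np

def same_maze_shape(trained_shape, new_maze):
    """Return True if new_maze has the shape the model was trained on.

    Hypothetical helper, not a tdfmaze function.
    """
    return tuple(np.asarray(new_maze).shape) == tuple(trained_shape)

# Assumption: the model was trained on a 7x7 maze like the example above.
trained_shape = (7, 7)

# A maze of a different size, standing in for a new environment.
new_maze = np.ones((12, 12))

if not same_maze_shape(trained_shape, new_maze):
    print("Maze shape", new_maze.shape, "does not match training shape", trained_shape)
```

If the shapes differ, the saved weights will most likely not fit a model built for the new environment, so retraining on a maze of the new size is the safer route.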