harvard-edge / airlearning-rl

Reinforcement learning algorithms for algorithm and policy exploration in Air Learning

Random policy example without TF1 dependency #8

Open aqibsaeed opened 3 years ago

aqibsaeed commented 3 years ago

Hi @srivatsankrishnan,

Is there any example of a random policy to collect data without a dependency on TensorFlow/Keras etc.? These frameworks are somewhat outdated and there is very little support for them.

I am looking for a way to simply instantiate an environment, collect data with a random policy, and afterwards do RL in TF2. Could you please direct me to such an example? The examples I see under run_time and test_suites all require TF1/Keras.

Thanks in advance.

srivatsankrishnan commented 3 years ago

What do you mean by a random policy to collect data? Also, what kind of data do you want to collect? For instance, the easiest way to get a random policy is to generate trajectories or vary the control API parameters (there are at least 3-4 different APIs to control parameters like position, velocity, pitch, roll, and yaw). This doesn't require TF/Keras: https://github.com/harvard-edge/airlearning-rl/blob/master/test_suites/move.py

The examples you see in the runtime and test_suites (except move.py) are intended to learn a meaningful policy using RL rather than a random policy. Also, if you want to use TF2, you need to change the imports and the agent definitions (and use the correct version of stable-baselines). Most of the framework should ideally abstract that complexity for you.

aqibsaeed commented 3 years ago

Thanks @srivatsankrishnan

> What do you mean by a random policy to collect data? Also, what kind of data do you want to collect?

I want to collect (state, action, reward) tuples via interaction with the environment, e.g., by taking random actions (that's what I meant by a random policy). I don't see state, action, and reward pairs in move.py.
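For what it's worth, collecting (state, action, reward) tuples with a random policy only needs the standard gym step loop; here is a minimal sketch using a hypothetical `ToyEnv` stand-in (the Air Learning env exposes the same `reset`/`step` interface, so only the env construction would differ):

```python
import random

class ToyEnv:
    """Hypothetical stand-in for the Air Learning gym env (same reset/step API)."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)  # dummy reward for illustration
        done = self.t >= self.horizon
        return obs, reward, done, {}

def collect_random_rollout(env, sample_action):
    """Run one episode, choosing actions with sample_action(), and
    record (state, action, reward) transitions."""
    transitions = []
    obs = env.reset()
    done = False
    while not done:
        action = sample_action()
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward))
        obs = next_obs
    return transitions

rollout = collect_random_rollout(ToyEnv(), lambda: random.uniform(-1.0, 1.0))
print(len(rollout))  # → 5
```

The same loop works unchanged with any gym-style env, so the random-data collection itself never touches TF1/Keras.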

aqibsaeed commented 3 years ago

@srivatsankrishnan any ideas on if it is possible to collect rollouts with a random policy?

srivatsankrishnan commented 3 years ago

Definitely possible! You need to set up something like this:

def setup(difficulty_level='default', env_name="AirSimEnv-v42"):
    env = gym.make(env_name)
    env.init_again(eval("settings." + difficulty_level + "_range_dic"))
    return env

def foo_random(env, action):
    obs, rewards, dones, info = env.step(action)
    return obs, rewards, dones, info

You can write your own function to randomly generate actions and pass them to the foo_random method. Note, this is just an illustration of how you can do it; you might need to add the right imports and make sure it runs. The step function should return the observation, reward, done flag (status), and info.
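If the env exposes a standard gym action space, `env.action_space.sample()` already generates random actions directly; for custom per-dimension bounds, a small helper does the same job (a sketch; the bounds in the example are made up, not taken from the Air Learning settings):

```python
import random

def random_action(low, high):
    """Uniformly sample one action component per dimension within [low, high]."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]

# Example: a 2-D continuous action with made-up bounds.
action = random_action([1.5, 2.5], [2.5, 3.5])
print(len(action))  # → 2
```

The returned list can be passed straight to `env.step(action)` or to the `foo_random` helper above.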

aqibsaeed commented 3 years ago

Thanks @srivatsankrishnan. I have the following script to test out the idea:

import os
import numpy as np
import time
import gym
import gym_airsim
os.sys.path.insert(0, os.path.abspath('./settings_folder'))
import settings

def setup(difficulty_level='default', env_name = "AirSimEnv-v42"):
    env = gym.make(env_name)
    env.init_again(eval("settings."+difficulty_level+"_range_dic"))

    return env

def foo_random(env, action):
    obs, rewards, dones, info = env.step(action)
    print(obs, rewards, dones, info)

env = setup()
print(env.action_space.sample())

time.sleep(10)
print("============================== environment ==========================")

for i in range(1000):
    print(i)
    foo_random(env, [1.5+np.random.uniform(), 2.5+np.random.uniform()])

But I notice that after establishing a connection with AirSim/Unreal it keeps printing CONNECTED (and some other text) and never executes foo_random. Am I missing something here?

srivatsankrishnan commented 3 years ago

Hi, instead of calling foo_random, can you directly call env.step(action) inside the for loop? Also, please post the full stdout along with the "CONNECTED" prompt. Not sure if that is the issue here.

aqibsaeed commented 3 years ago

No, it does not really work. Here is the output:

[-3.7095962 -4.342524 ]
============================== environment ==========================
0
ENter Step0
------------------------- step failed ----------------  with 'MultirotorState' object has no attribute 'trip_stats'  error
SUCCESS: The process "UE4Editor.exe" with PID 8104 has been terminated.
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
connection not established yet
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
connection not established yet
Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)

Script opens up a new window but the drone stays stationary.

[screenshot: AirSim window open, drone stationary]
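The `step failed` line in the log above suggests the env wrapper reads a `trip_stats` attribute that this AirSim build's `MultirotorState` does not carry, and the failure handler then kills UE4Editor.exe, which would explain the terminated PID and the WSAECONNREFUSED reconnect loop. One defensive pattern, sketched here with a hypothetical stand-in class (the attribute name comes from the error message, not from reading the wrapper source), is to fall back instead of raising:

```python
class MultirotorState:
    """Hypothetical stand-in for a state object lacking trip_stats."""
    pass

def safe_trip_stats(state, default=None):
    """Return state.trip_stats if present, otherwise a default, instead of
    raising AttributeError and triggering the wrapper's restart path."""
    return getattr(state, "trip_stats", default)

print(safe_trip_stats(MultirotorState()))  # → None
```

Applying a guard like this wherever the wrapper touches `trip_stats` (or pinning an AirSim version that still provides the field) would be the places to look.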

aqibsaeed commented 3 years ago

@srivatsankrishnan I would really appreciate any ideas on how to resolve this issue.

qinglong0276 commented 11 months ago

Have these issues been resolved now, @aqibsaeed?

aqibsaeed commented 11 months ago

nope!

qinglong0276 commented 11 months ago

@aqibsaeed I haven't even reached the step shown in your screenshot; I've been stuck on other issues. I don't know how you got that far. Envious!