Open aqibsaeed opened 3 years ago
What do you mean by a random policy to collect data? Also, what kind of data do you want to collect? For instance, the easiest way to run a random policy is to generate trajectories by changing the control API parameters (there are at least 3-4 different APIs to control parameters like position, velocity, pitch, roll, and yaw). This doesn't require you to use TF/Keras. See https://github.com/harvard-edge/airlearning-rl/blob/master/test_suites/move.py
The examples you see in runtime and test_suites (except move.py) are intended to learn a meaningful policy using RL instead of a random policy. Also, if you want to use TF2, you need to change the imports and the agent definitions (and use the correct version of stable baselines). Most of the framework should ideally abstract that complexity for you.
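For illustration, random control commands along the lines of move.py can be generated with no TF/Keras dependency at all. The field names and bounds below are assumptions for the sketch, not values from the repository; adjust them to whichever control API (position, velocity, yaw, etc.) you use:

```python
import random

# Hypothetical bounds -- the real limits depend on your AirSim/settings
# configuration and the control API you pick.
ACTION_BOUNDS = {
    "vx": (-5.0, 5.0),          # forward velocity (m/s)
    "vy": (-5.0, 5.0),          # lateral velocity (m/s)
    "yaw_rate": (-90.0, 90.0),  # degrees per second
}

def sample_random_command(bounds=ACTION_BOUNDS):
    """Draw one random control command, uniformly within the given bounds."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

# Each sampled command would then be handed to one of the control calls
# (e.g. the velocity-based moves used in move.py).
cmd = sample_random_command()
print(cmd)
```

Each call returns one command dictionary; looping over it gives a trajectory driven purely by random actions.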
Thanks @srivatsankrishnan
What do you mean by a random policy to collect data? Also, what kind of data do you want to collect?
I want to collect state, action, and reward pairs via interaction with the environment, e.g., by taking random actions (that's what I meant by a random policy). I don't see state, action, and reward pairs in move.py.
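A minimal sketch of that kind of collection loop. The `StubEnv` class here is a hypothetical stand-in with the usual gym `reset`/`step` interface (the real environment would come from `gym.make` on the Air Learning env), so only the loop structure is the point:

```python
import random

class StubEnv:
    """Hypothetical stand-in for the AirSim gym environment (illustration only)."""
    def reset(self):
        return [0.0, 0.0]  # dummy observation

    def step(self, action):
        obs = [random.random(), random.random()]
        reward = -sum(abs(a) for a in action)  # dummy reward
        done = random.random() < 0.1
        return obs, reward, done, {}

def collect_rollout(env, num_steps, sample_action):
    """Run a random policy and record (state, action, reward) tuples."""
    data = []
    obs = env.reset()
    for _ in range(num_steps):
        action = sample_action()
        next_obs, reward, done, info = env.step(action)
        data.append((obs, action, reward))
        obs = env.reset() if done else next_obs
    return data

rollout = collect_rollout(
    StubEnv(), 50, lambda: [random.uniform(-1, 1), random.uniform(-1, 1)]
)
print(len(rollout))  # 50 (state, action, reward) tuples
```

The resulting list of tuples can then be saved and replayed for offline RL in TF2 or any other framework.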
@srivatsankrishnan any ideas on if it is possible to collect rollouts with a random policy?
Definitely possible! You need to set it up something like this:
def setup(difficulty_level='default', env_name="AirSimEnv-v42"):
    env = gym.make(env_name)
    env.init_again(eval("settings." + difficulty_level + "_range_dic"))
    return env

def foo_random(env, action):
    obs, rewards, dones, info = env.step(action)
    return obs, rewards, dones, info
You can write your own function to randomly generate actions and pass its output to foo_random. Note, this is just an illustration of how you can do it; you might need to add the right imports and make sure it runs. The step function should return the observation, reward, done (status), and info.
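For example, a random action generator along those lines could look like this. The action dimensionality and the ranges are assumptions (chosen to match the 2-D actions used later in this thread), not values mandated by the framework:

```python
import numpy as np

def random_action(low=(1.5, 2.5), span=1.0):
    """Sample a random 2-D action; the bounds here are illustrative only."""
    return [low[0] + span * np.random.uniform(),
            low[1] + span * np.random.uniform()]

# Usage with the foo_random helper above:
#   obs, rewards, dones, info = foo_random(env, random_action())
print(random_action())
```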
Thanks @srivatsankrishnan. I have the following script to test out the idea:
import os
import numpy as np
import time
import gym
import gym_airsim
os.sys.path.insert(0, os.path.abspath('./settings_folder'))
import settings
def setup(difficulty_level='default', env_name="AirSimEnv-v42"):
    env = gym.make(env_name)
    env.init_again(eval("settings." + difficulty_level + "_range_dic"))
    return env

def foo_random(env, action):
    obs, rewards, dones, info = env.step(action)
    print(obs, rewards, dones, info)
env = setup()
print(env.action_space.sample())
time.sleep(10)
print("============================== environment ==========================")
for i in range(1000):
    print(i)
    foo_random(env, [1.5 + np.random.uniform(), 2.5 + np.random.uniform()])
But I notice that after establishing a connection with AirSim/Unreal, it keeps printing CONNECTED (and some other text) and never executes foo_random. Am I missing something here?
Hi,
Instead of calling foo_random, can you directly call env.step(action) inside the for loop? Also, please post the stdout along with the "CONNECTED" prompt; not sure if that is the issue here.
No, it does not really work:
[-3.7095962 -4.342524 ]
============================== environment ==========================
0
ENter Step0
------------------------- step failed ---------------- with 'MultirotorState' object has no attribute 'trip_stats' error
SUCCESS: The process "UE4Editor.exe" with PID 8104 has been terminated.
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3104: WSAECONNREFUSED
connection not established yet
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
WARNING:tornado.general:Connect error on fd 3088: WSAECONNREFUSED
connection not established yet
Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)
Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)
Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)
Connected!
Client Ver:1 (Min Req: 1), Server Ver:1 (Min Req: 1)
The script opens up a new window, but the drone stays stationary.
@srivatsankrishnan I would really appreciate any ideas on how to resolve this issue.
Have these issues been resolved now, @aqibsaeed?
nope!
@aqibsaeed I haven't even reached the steps in the picture you posted, and I've been forced to stop due to other issues! I don't know how you got to this point. Envious!
Hi @srivatsankrishnan,
Is there any example of a random policy to collect data without any dependency on TensorFlow/Keras, etc.? All these frameworks are somewhat outdated and there is very little support for them.
I am looking for a way to simply instantiate an environment, collect data with a random policy, and afterwards do RL in TF2. Could you please point me to such an example? The examples I see under run_time and test_suites all require TF1/Keras. Thanks in advance.