This is a screenshot from another project I am running: https://github.com/ZJU-Robotics-Lab/model-based-social-navigation. When I run that project inside a conda environment, the error shown occurs, but no error is reported when it runs outside the conda environment. The usual advice online is to simply deactivate conda. However, your project runs fine directly in the conda environment, so may I ask whether you encountered similar problems before and how you avoided them?
Seems to me that you are running into problems with dependencies or Python versions. From the output here it looks like ROS Kinetic is called with python2 while your conda environment runs python3 and calls python3 packages. If you are running Kinetic, you should be using python2. It is hard to say what the difference is between my implementation and theirs, as I do not know how their code works. Perhaps it is the explicit call of python3, but I do not know for sure. In any case, you should check that you have all packages installed in the conda environment and that the Python versions match throughout your pipeline.
Thank you for your reply. I don't think it is a problem with the package in conda; when I import re directly in conda's Python, no error is reported.
In that case I would suggest checking whether both python2 and python3 have this package installed and whether the paths are properly set up in conda.
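As a generic sanity check (not specific to this repo), you can print which interpreter and search paths a given call actually resolves to, and run it once with python2 and once with python3:

from __future__ import print_function  # so the script behaves the same in python2 and python3
import sys

print("Executable:", sys.executable)   # which python binary is actually running
print("Version:", sys.version)         # confirms python2 vs python3
for p in sys.path:                     # directories where packages are resolved from
    print("Search path:", p)

# Check where a specific package is resolved from, e.g. the re module in question
import re
print("re module location:", re.__file__)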
Both python2 and python3 have this package installed, and the conda paths have not changed from before.
Another question: have you tested other reinforcement learning algorithms, such as DQN or DDPG, in this simulation?
There should be no issue with training other types of architectures with this simulation, as the simulation is only there to gather samples and execute actions. The learning is a separate function from the simulation. I have actually tried both DQN and DDPG within this framework in one way or another, but the best architecture depends on what you need. DQN gives discrete outputs. You can either map the outputs to velocities within your training function or deal with it directly in the simulation framework. But since it uses a discrete action space, I prefer TD3. DDPG is fine and uses fewer resources in training (because it has one less critic), but it is quite unstable due to overestimating the Q-value. TD3 is essentially designed to deal with this issue and is more or less an updated, more stable and better version of DDPG.
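As a rough illustration of the mapping option (not code from this repo; the velocity values and number of actions are arbitrary placeholders), a discrete DQN action index could simply select a predefined (linear, angular) velocity pair:

# Hypothetical sketch: mapping a discrete DQN action index to velocity commands.
DISCRETE_ACTIONS = [
    (0.5, 0.0),   # go straight
    (0.3, 0.5),   # turn left while moving
    (0.3, -0.5),  # turn right while moving
    (0.0, 1.0),   # rotate left in place
    (0.0, -1.0),  # rotate right in place
]

def action_to_velocity(action_index):
    """Return a (linear, angular) velocity pair for a discrete action index."""
    linear, angular = DISCRETE_ACTIONS[action_index]
    return linear, angular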
Thanks for your reply, I will also try reinforcement learning algorithms like DDPG later
Hello, may I ask whether you have added moving pedestrians in Gazebo before? Is there anything in this project I can refer to for that?
There are two ways I can think of off the top of my head that could at least simulate dynamic obstacles in the environment.
Thank you for your reply
Hello, I recently found out that you can use an actor model to create moving pedestrians, but this requires Gazebo version 8 or above. After I installed Gazebo 8 and ran your project, the problem in the picture appeared. Do you have any way to solve it?
ROS Kinetic supports Gazebo 7. For newer versions of Gazebo I would suggest using a newer version of ROS (Melodic or Noetic); that should solve the issue with unsupported packages and different naming conventions. It seems to me that Gazebo is looking for something in the TF2 ROS package that is not supported by the TF2 package version you have installed, i.e. you have a version mismatch.
Thank you for your reply. I found some information showing that some people have successfully installed Gazebo 8 on Ubuntu 16.04. If it still doesn't work, I will try to upgrade Ubuntu to 18.04.
Yes, you can probably run Gazebo 8 on Ubuntu 16.04, but the issue here is a mismatch between ROS and Gazebo. There is probably a way to do it by changing packages, editing source code, or compiling packages from source, but it most likely will not work "out-of-the-box". If you do not have a very specific reason to stick with older versions of ROS, the better way is probably to update the ROS pipeline.
Hello, I recently tried to use the DDPG algorithm from the first version of your paper to compare against the TD3 algorithm, but with the DDPG algorithm I modified, after training, the car keeps exploring in the same place and does not move forward. How can I solve this? The DDPG implementation I referred to is here: https://github.com/sfujim/TD3 (ddpg.py and ourddpg.py in that repo). May I ask how you used the DDPG algorithm with this project?
Sorry, I do not understand what the following means:
"after training, the car will appear in the same place to explore and not move forward"

Can you explain your issue in more detail?
Their DDPG looks fine and should train something, but that is not part of this repo and was written by someone else, so I wouldn't know about any issues regarding it. I also do not know how you implemented it, so it is impossible to answer such general questions. Please be much more specific.
At present, it has been trained for 150 evaluation episodes, but the behavior shown in this video still occurs often.
Again, I do not know your implementation so I do not know how to solve this for your situation. But I can give a high level overview of what I think is happening here.
It appears to me that the issue is that you have struck a sort of local optimum. You will see that the goal point is located right behind the robot, so what happens is:
You can see that it has been caught in a loop. Since all the decisions in the neural network are independent and there is no history, the network has no way of knowing it is stuck or "frozen". So it just repeats the best actions it has calculated so far.
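As a side note, a common way to keep a deterministic policy from endlessly repeating the same action during training is to add exploration noise to the actor output. A generic DDPG/TD3-style sketch (not taken from your implementation; expl_noise and max_action are placeholders) would be:

import numpy as np

def noisy_action(actor_output, max_action, expl_noise=0.1):
    # Add Gaussian exploration noise to the deterministic policy output so the
    # agent can break out of repetitive behavior during training.
    noise = np.random.normal(0, expl_noise * max_action, size=actor_output.shape)
    return np.clip(actor_output + noise, -max_action, max_action)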
I tried DDPG in the three ways shown below, but none of them produced good results.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Variant 1: the action is concatenated into the critic at the second hidden layer.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action):
        super(Actor, self).__init__()
        self.layer_1 = nn.Linear(state_dim, 800)
        self.layer_2 = nn.Linear(800, 600)
        self.layer_3 = nn.Linear(600, action_dim)
        self.max_action = max_action
        self.soft = nn.Softsign()
        self.tanh = nn.Tanh()
        self.sig = nn.Sigmoid()

    def forward(self, s):
        s = F.relu(self.layer_1(s))
        s = F.relu(self.layer_2(s))
        a = self.tanh(self.layer_3(s))
        return a


class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super(Critic, self).__init__()
        self.layer_1 = nn.Linear(state_dim, 800)
        self.layer_2 = nn.Linear(800 + action_dim, 600)
        self.layer_3 = nn.Linear(600, 1)

    def forward(self, s, a):
        q = F.relu(self.layer_1(s))
        q = F.relu(self.layer_2(torch.cat([q, a], 1)))
        q = self.layer_3(q)
        return q


class DDPG(object):
    def __init__(self, state_dim, action_dim, max_action):
        # Initialize the Actor network
        self.actor = Actor(state_dim, action_dim, max_action).to(device)
        self.actor_target = Actor(state_dim, action_dim, max_action).to(device)
        self.actor_target.load_state_dict(self.actor.state_dict())
        self.actor_optimizer = torch.optim.Adam(self.actor.parameters())

        # Initialize the Critic networks
        self.critic = Critic(state_dim, action_dim).to(device)
        self.critic_target = Critic(state_dim, action_dim).to(device)
        self.critic_target.load_state_dict(self.critic.state_dict())
        self.critic_optimizer = torch.optim.Adam(self.critic.parameters())

        self.max_action = max_action

    # Function to get the action from the actor
    def get_action(self, state):
        state = torch.Tensor(state.reshape(1, -1)).to(device)
        return self.actor(state).cpu().data.numpy().flatten()

    # Training cycle (policy_noise, noise_clip and policy_freq are unused in this
    # DDPG version and only kept to match the TD3 train() signature)
    def train(self, replay_buffer, iterations, batch_size=100, discount=1,  # discount=0.99
              tau=0.005, policy_noise=0.2, noise_clip=0.5, policy_freq=2):
        for it in range(iterations):
            # Sample a batch from the replay buffer
            (batch_states, batch_actions, batch_rewards, batch_dones,
             batch_next_states) = replay_buffer.sample_batch(batch_size)
            state = torch.Tensor(batch_states).to(device)
            next_state = torch.Tensor(batch_next_states).to(device)
            action = torch.Tensor(batch_actions).to(device)
            reward = torch.Tensor(batch_rewards).to(device)
            done = torch.Tensor(batch_dones).to(device)

            # Obtain the estimated action for the next state from the actor-target
            next_action = self.actor_target(next_state)

            # Calculate the Q values from the critic-target network for the next state-action pair
            target_Q = self.critic_target(next_state, next_action)

            # Calculate the final Q value from the target network parameters by using the Bellman equation
            target_Q = reward + ((1 - done) * discount * target_Q).detach()

            # Get the Q values of the base network with the current parameters
            current_Q = self.critic(state, action)

            # Calculate the loss between the current Q value and the target Q value
            loss = F.mse_loss(current_Q, target_Q)

            # Perform the gradient descent step on the critic
            self.critic_optimizer.zero_grad()
            loss.backward()
            self.critic_optimizer.step()

            # Maximize the critic's estimate of the actor's action (deterministic policy gradient)
            actor_grad = self.critic(state, self.actor(state))
            actor_grad = -actor_grad.mean()
            self.actor_optimizer.zero_grad()
            actor_grad.backward()
            self.actor_optimizer.step()

            # Use soft update to update the actor-target network parameters by
            # infusing a small amount of the current parameters
            for param, target_param in zip(self.actor.parameters(), self.actor_target.parameters()):
                target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)

            # Use soft update to update the critic-target network parameters by
            # infusing a small amount of the current parameters
            for param, target_param in zip(self.critic.parameters(), self.critic_target.parameters()):
                target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)
# Variant 2: the state and action are concatenated at the critic input layer.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action):
        super(Actor, self).__init__()
        self.layer_1 = nn.Linear(state_dim, 800)
        self.layer_2 = nn.Linear(800, 600)
        self.layer_3 = nn.Linear(600, action_dim)
        self.max_action = max_action
        self.soft = nn.Softsign()
        self.tanh = nn.Tanh()
        self.sig = nn.Sigmoid()

    def forward(self, s):
        s = F.relu(self.layer_1(s))
        s = F.relu(self.layer_2(s))
        a = self.tanh(self.layer_3(s))
        return a


class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super(Critic, self).__init__()
        self.layer_1 = nn.Linear(state_dim + action_dim, 800)
        self.layer_2 = nn.Linear(800, 600)
        self.layer_3 = nn.Linear(600, 1)

    def forward(self, s, a):
        q = F.relu(self.layer_1(torch.cat([s, a], 1)))
        q = F.relu(self.layer_2(q))
        q = self.layer_3(q)
        return q


class DDPG(object):
    def __init__(self, state_dim, action_dim, max_action):
        # Initialize the Actor network
        self.actor = Actor(state_dim, action_dim, max_action).to(device)
        self.actor_target = Actor(state_dim, action_dim, max_action).to(device)
        self.actor_target.load_state_dict(self.actor.state_dict())
        self.actor_optimizer = torch.optim.Adam(self.actor.parameters())

        # Initialize the Critic networks
        self.critic = Critic(state_dim, action_dim).to(device)
        self.critic_target = Critic(state_dim, action_dim).to(device)
        self.critic_target.load_state_dict(self.critic.state_dict())
        self.critic_optimizer = torch.optim.Adam(self.critic.parameters())

        self.max_action = max_action

    # Function to get the action from the actor
    def get_action(self, state):
        state = torch.Tensor(state.reshape(1, -1)).to(device)
        return self.actor(state).cpu().data.numpy().flatten()

    # Training cycle (policy_noise, noise_clip and policy_freq are unused in this
    # DDPG version and only kept to match the TD3 train() signature)
    def train(self, replay_buffer, iterations, batch_size=100, discount=1,  # discount=0.99
              tau=0.005, policy_noise=0.2, noise_clip=0.5, policy_freq=2):
        for it in range(iterations):
            # Sample a batch from the replay buffer
            (batch_states, batch_actions, batch_rewards, batch_dones,
             batch_next_states) = replay_buffer.sample_batch(batch_size)
            state = torch.Tensor(batch_states).to(device)
            next_state = torch.Tensor(batch_next_states).to(device)
            action = torch.Tensor(batch_actions).to(device)
            reward = torch.Tensor(batch_rewards).to(device)
            done = torch.Tensor(batch_dones).to(device)

            # Obtain the estimated action for the next state from the actor-target
            next_action = self.actor_target(next_state)

            # Calculate the Q values from the critic-target network for the next state-action pair
            target_Q = self.critic_target(next_state, next_action)

            # Calculate the final Q value from the target network parameters by using the Bellman equation
            target_Q = reward + ((1 - done) * discount * target_Q).detach()

            # Get the Q values of the base network with the current parameters
            current_Q = self.critic(state, action)

            # Calculate the loss between the current Q value and the target Q value
            loss = F.mse_loss(current_Q, target_Q)

            # Perform the gradient descent step on the critic
            self.critic_optimizer.zero_grad()
            loss.backward()
            self.critic_optimizer.step()

            # Maximize the critic's estimate of the actor's action (deterministic policy gradient)
            actor_grad = self.critic(state, self.actor(state))
            actor_grad = -actor_grad.mean()
            self.actor_optimizer.zero_grad()
            actor_grad.backward()
            self.actor_optimizer.step()

            # Use soft update to update the actor-target network parameters by
            # infusing a small amount of the current parameters
            for param, target_param in zip(self.actor.parameters(), self.actor_target.parameters()):
                target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)

            # Use soft update to update the critic-target network parameters by
            # infusing a small amount of the current parameters
            for param, target_param in zip(self.critic.parameters(), self.critic_target.parameters()):
                target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)
# Variant 3: the state and action are passed through separate linear layers and
# combined manually before the output layer.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action):
        super(Actor, self).__init__()
        self.layer_1 = nn.Linear(state_dim, 800)
        self.layer_2 = nn.Linear(800, 600)
        self.layer_3 = nn.Linear(600, action_dim)
        self.max_action = max_action
        self.soft = nn.Softsign()
        self.tanh = nn.Tanh()
        self.sig = nn.Sigmoid()

    def forward(self, s):
        s = F.relu(self.layer_1(s))
        s = F.relu(self.layer_2(s))
        a = self.tanh(self.layer_3(s))
        return a


class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super(Critic, self).__init__()
        self.layer_1 = nn.Linear(state_dim, 800)
        self.layer_2_s = nn.Linear(800, 600)
        self.layer_2_a = nn.Linear(action_dim, 600)
        self.layer_3 = nn.Linear(600, 1)

    def forward(self, s, a):
        s1 = F.relu(self.layer_1(s))
        self.layer_2_s(s1)
        self.layer_2_a(a)
        s11 = torch.mm(s1, self.layer_2_s.weight.data.t())
        s12 = torch.mm(a, self.layer_2_a.weight.data.t())
        s1 = F.relu(s11 + s12 + self.layer_2_a.bias.data)
        q1 = self.layer_3(s1)
        return q1


class DDPG_2(object):
    def __init__(self, state_dim, action_dim, max_action):
        # Initialize the Actor network
        self.actor = Actor(state_dim, action_dim, max_action).to(device)
        self.actor_target = Actor(state_dim, action_dim, max_action).to(device)
        self.actor_target.load_state_dict(self.actor.state_dict())
        self.actor_optimizer = torch.optim.Adam(self.actor.parameters())

        # Initialize the Critic networks
        self.critic = Critic(state_dim, action_dim).to(device)
        self.critic_target = Critic(state_dim, action_dim).to(device)
        self.critic_target.load_state_dict(self.critic.state_dict())
        self.critic_optimizer = torch.optim.Adam(self.critic.parameters())

        self.max_action = max_action

    # Function to get the action from the actor
    def get_action(self, state):
        state = torch.Tensor(state.reshape(1, -1)).to(device)
        return self.actor(state).cpu().data.numpy().flatten()

    # Training cycle (policy_noise, noise_clip and policy_freq are unused in this
    # DDPG version and only kept to match the TD3 train() signature)
    def train(self, replay_buffer, iterations, batch_size=100, discount=1,  # discount=0.99
              tau=0.005, policy_noise=0.2, noise_clip=0.5, policy_freq=2):
        for it in range(iterations):
            # Sample a batch from the replay buffer
            (batch_states, batch_actions, batch_rewards, batch_dones,
             batch_next_states) = replay_buffer.sample_batch(batch_size)
            state = torch.Tensor(batch_states).to(device)
            next_state = torch.Tensor(batch_next_states).to(device)
            action = torch.Tensor(batch_actions).to(device)
            reward = torch.Tensor(batch_rewards).to(device)
            done = torch.Tensor(batch_dones).to(device)

            # Obtain the estimated action from the next state by using the actor-target
            next_action = self.actor_target(next_state)

            # Calculate the Q values from the critic-target network for the next state-action pair
            target_Q = self.critic_target(next_state, next_action)

            # Calculate the final Q value from the target network parameters by using the Bellman equation
            target_Q = reward + ((1 - done) * discount * target_Q).detach()

            # Get the Q values of the base network with the current parameters
            current_Q = self.critic(state, action)

            # Calculate the loss between the current Q value and the target Q value
            loss = F.mse_loss(current_Q, target_Q)

            # Perform the gradient descent step on the critic
            self.critic_optimizer.zero_grad()
            loss.backward()
            self.critic_optimizer.step()

            # Maximize the critic's estimate of the actor's action (deterministic policy gradient)
            actor_grad = self.critic(state, self.actor(state))
            actor_grad = -actor_grad.mean()
            self.actor_optimizer.zero_grad()
            actor_grad.backward()
            self.actor_optimizer.step()

            # Use soft update to update the actor-target network parameters by
            # infusing a small amount of the current parameters
            for param, target_param in zip(self.actor.parameters(), self.actor_target.parameters()):
                target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)

            # Use soft update to update the critic-target network parameters by
            # infusing a small amount of the current parameters
            for param, target_param in zip(self.critic.parameters(), self.critic_target.parameters()):
                target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)
These all seem like decent enough implementations of DDPG. I don't think that changing where the action is introduced into the critic makes a big difference for training here, though, so that alone will probably not bring a lot of benefit.
DDPG is very fickle and unstable (which is why TD3 was developed), so it might be that you just need to run the training multiple times. Have you tried using a bunch of different seed values for your training?
So far I have only changed the network and have not modified the random seed value. Is modifying this value helpful for training?
The initial weights of the network are initialized randomly, based on your seed value. There might be occasions where the weights settle in such a way that your step size cannot escape some sort of local optimum, rendering the training useless. Setting a different seed value will initialize the network weights differently and, if you are lucky, in a way that is conducive to training. This is especially important for unstable network structures.
Set seed values are used for reproducibility. But when developing a network, finding the right seed value can be just as crucial as finding the right architecture.
So in short: try a bunch of different seed values. That might help.
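For example, seeding the usual sources of randomness at the top of the training script could look like this (a minimal sketch; adjust it to whatever libraries your training loop actually uses):

import random
import numpy as np
import torch

def set_seed(seed):
    # Seed all the common sources of randomness so each run is reproducible
    # and different seed values give genuinely different weight initializations.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)  # try several different values, e.g. 0, 7, 42, 1234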
Thanks for your answer, I will try different random seeds later
Hello, may I ask what the reason for this error is? Although it does not affect the continued operation of the car, it is constantly printed on the command line.
Currently I am training with Gazebo 9 and ROS Melodic.
This is a threading issue where a call is made to a None object. You can try changing the reading of the scan and odom topics from the wait_for_message type to a subscriber type, as was done here: https://github.com/reiniscimurs/DRL-robot-navigation/issues/14#issuecomment-1107967435
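The general idea of that change, sketched here from memory (check the linked comment for the exact code; the odom topic name below is just a placeholder), is to keep subscriber callbacks that store the latest messages instead of blocking on wait_for_message:

import rospy
from sensor_msgs.msg import LaserScan
from nav_msgs.msg import Odometry

# Sketch of the subscriber-based approach: callbacks store the latest messages,
# and the training loop reads these cached values instead of calling
# rospy.wait_for_message(). A node must be initialized with rospy.init_node()
# before the subscribers are created.
class SensorListener(object):
    def __init__(self):
        self.last_scan = None
        self.last_odom = None
        rospy.Subscriber('/r1/front_laser/scan', LaserScan, self.scan_callback, queue_size=1)
        rospy.Subscriber('/r1/odom', Odometry, self.odom_callback, queue_size=1)  # placeholder topic name

    def scan_callback(self, msg):
        self.last_scan = msg

    def odom_callback(self, msg):
        self.last_odom = msg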
Thanks, this solution seems to work so far
Hello, I want to use data from a 16-line Velodyne lidar in practice. How should I change the Velodyne-related code in your project so that it only uses the 180 degrees of data in front?
The simulated device in Gazebo already uses only the 180 degrees in the front. Do you mean a physical velodyne device on a real robot? Physical implementations are not part of this repo and I can't really help you there because I do not know what kind of hardware you want to use, but you can always just record the 360 degree data and filter out the 180 degrees in front of the robot.
I want to use the same 16-line Velodyne lidar as in the simulation. I see that you only use the front 180 degrees of the laser in the simulation, but I do not know how to read the laser data for the front 180 degrees in practice. Another question: in your code, data = rospy.wait_for_message('/r1/front_laser/scan', LaserScan, timeout=0.5) acquires laser data, while the function def velodyne_callback(self, v) converts point cloud data. What is the difference in the meaning of the distance values between the two?
The data = rospy.wait_for_message('/r1/front_laser/scan', LaserScan, timeout=0.5) call is not related to the velodyne data. It is used to read the lidar data to detect a collision. After that, the velodyne data is read through the velodyne callback velodyne_callback(self, v). The velodyne data is then squashed into 2D, binned, and used as the state information.
"What is the difference between the meaning of the value of the distance?"
I do not quite understand this question. If you mean what value the sensors return, then for the lidar it is a distance and for the velodyne it is point cloud coordinate data. To obtain distance information from the velodyne data, the Euclidean distance to each coordinate is calculated. Then, the angle at which this distance is obtained is calculated. These calculations are carried out in the velodyne callback.
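Conceptually, the distance-and-angle computation with binning looks something like this (a simplified sketch of the idea, not the exact code from the repo's callback):

import math

# Simplified sketch: turn each point cloud point (x, y, z) into a (distance, angle)
# pair and keep the minimum distance per angular sector. The bins list would
# typically start filled with a large value, e.g. [10.0] * n_sectors, and
# bin_edges holds the n_sectors + 1 sector boundaries in radians.
def update_bins(points, bins, bin_edges):
    for x, y, z in points:
        dist = math.sqrt(x ** 2 + y ** 2 + z ** 2)  # Euclidean distance to the point
        angle = math.atan2(y, x)                    # angle at which that distance occurs
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= angle < bin_edges[i + 1]:
                bins[i] = min(bins[i], dist)        # closest obstacle per sector
                break
    return bins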
If you want to use a real velodyne device, you should just connect it to your robot computer and publish its data on its own topic. I do not remember if you can set the FOV of the velodyne. If yes, then you can just set it. If not, then just read all the point cloud data from the velodyne and filter out the angles that you do not want in the state information. In this case you would only need to slightly modify the velodyne callback, as sketched below.
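For example, keeping only the front 180 degrees of a full point cloud could be done with a simple angle check like this (just a sketch, assuming the sensor frame's x-axis points forward):

import math

# Keep only points in the front half of the sensor frame, i.e. angles in [-90, 90] degrees.
def keep_front_180(points):
    front = []
    for x, y, z in points:
        angle = math.atan2(y, x)
        if -math.pi / 2.0 <= angle <= math.pi / 2.0:
            front.append((x, y, z))
    return front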
Do you mean that with data = rospy.wait_for_message('/r1/front_laser/scan', LaserScan, timeout=0.5) you use another lidar to obtain laser data in the simulation? My impression was that the simulation only uses the velodyne lidar. Another question: when I actually use the velodyne lidar, there is a /scan topic that can be used to detect collisions. Do you know what the difference is between this topic and the 2D values that the callback function velodyne_callback(self, v) compresses from the velodyne data?
If you look at the rqt graph while running the simulation, you will see the attached devices. Besides the velodyne 3D lidar, there is also a 2D rpLidar attached to the robot in the simulation. The /scan topic you mention belongs to the 2D rpLidar and is used only to detect collisions. It is done this way because it is a very simple way to detect them. Other methods include using bumpers, but those are notoriously difficult to implement in ROS, or using the velodyne data directly, but then you need to make sure you do not make mistakes when filtering out the floor data and so on. That is why the rpLidar is used for collisions, as it is a more robust way to detect them.
As for the laser information that is later given to the network, only the velodyne data from the velodyne callback function is used.
TL;DR: the /scan topic is for a separate lidar that only detects collisions; it has nothing to do with the velodyne callback and data.
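For reference, a minimal version of such a collision check on the 2D lidar ranges could look like this (a sketch only; the threshold is an arbitrary example, and real scans may need inf/NaN values filtered out first):

# Collision check on the 2D lidar ranges from the /scan-type topic.
COLLISION_DIST = 0.35  # arbitrary example threshold in meters

def check_collision(laser_ranges):
    # A collision is assumed if any beam reports an obstacle closer than the threshold.
    return min(laser_ranges) < COLLISION_DIST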
Closing due to inactivity