reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Question about velocity adjustment #89

Open lee666-lee opened 1 year ago

lee666-lee commented 1 year ago

Dear Reinis Cimurs, I recently read your paper "Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning". I really appreciate your work on robot path planning using DRL, and I believe it is a valuable resource and guide for many others' research on DRL.

I've reproduced your work from your GitHub code, and everything works just fine until I change the output velocity range to [-0.34, 0.34] (in your work the linear velocity is in [0, +1] and the angular velocity is in [-1, +1]), which leads to a divergence in the loss, as shown below. [screenshots]

To solve the loss problem, I also tried adjusting the reward function as shown below, and thankfully it finally converges. [screenshots]

However, the loss may look okay, but the actual simulation result is not as good as before: the robot collides in 4 or 5 episodes out of 10, and once the goal is behind the robot, it does not seem to know to turn around and navigate to the goal; it just drives straight ahead and hits the obstacle in front of it...

  1. If it is not too much trouble, could you please tell me what causes the decreased success rate in reaching the goal and how to improve it?
  2. Also, I am not sure whether my adjustment to the code (restricting the velocity to [-0.34, +0.34] to meet my robot's needs) is correct.

Besides, I also tried crudely multiplying the tanh() output of the Actor by a coefficient to adjust the output velocity, and it fails like this: [screenshots]

  3. Why does this simple coefficient cause such a big difference? The robot even rolls over... I just cannot figure it out...

Thank you for your time and consideration. I really do need your help; I've been stuck here for a week and it is driving me crazy (sad...).

reiniscimurs commented 1 year ago

Hi,

As far as I can see, there is no issue with simply multiplying a_in by 0.34 to obtain the ranges [0, 0.34] and [-0.34, 0.34]. I would not touch the tanh function though: if you do not also change the range mapping afterwards, it will lead to a strange range. I.e., if you multiply tanh by 0.25 but still have a_in = [(action[0]+1)/2, action[1]], you will get an odd linear range of [(-0.25+1)/2, (0.25+1)/2].
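To make the range mapping concrete, here is a minimal standalone sketch (not the repo code itself; MAX_LIN and MAX_ANG are illustrative names) of scaling a tanh-bounded actor output into a [0, 0.34] linear and [-0.34, 0.34] angular range:

```python
import numpy as np

# Illustrative velocity limits, not constants from the repo.
MAX_LIN = 0.34  # m/s
MAX_ANG = 0.34  # rad/s

def scale_action(action):
    """Map a tanh-bounded actor output in [-1, 1]^2 to robot velocities.

    action[0] -> linear velocity in [0, MAX_LIN]
    action[1] -> angular velocity in [-MAX_ANG, MAX_ANG]
    """
    linear = (action[0] + 1) / 2 * MAX_LIN   # [-1, 1] -> [0, MAX_LIN]
    angular = action[1] * MAX_ANG            # [-1, 1] -> [-MAX_ANG, MAX_ANG]
    return np.array([linear, angular])

# Example: full-forward, full-right-turn actor output
print(scale_action(np.array([1.0, -1.0])))  # -> [ 0.34 -0.34]
```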

Did you try changing seed values for your training?

It might just be that with the reduced top speed the training samples are not varied enough, and the robot does not reach the goal point often or quickly enough. You can try increasing the Bellman discount factor so that it stays meaningful over more samples. Alternatively, you can try increasing the time delta, i.e. how long each step is propagated. This should have a similar effect.

lee666-lee commented 1 year ago

Thanks for your timely reply <3 !

  1. Yeah, I've also changed the range mapping afterwards, as shown here, to ensure that the linear velocity lies in [0, 0.25] and the angular velocity in [-0.25, 0.25]: [screenshots] Unfortunately it still gets a bad result, as you can see in my last post.

  2. Hmm... In fact I haven't tried changing the seed value together with the speed-range change. Thanks for your advice, I will try that immediately, haha.

  3. Actually, the Bellman discount factor in the training from my last post is already 0.9999. I am not sure whether that is still not big enough, or whether I should try setting it to 1?

  4. The TIME_DELTA value I use is 0.1. Sorry, but I don't know the relation between TIME_DELTA and model performance. (Could you please explain it to me if it's convenient? I might know too little about RL theory, so sad...) I will also try a bigger value to see what happens, hhhhhh. [screenshot]

@reiniscimurs

reiniscimurs commented 1 year ago
  1. That is one thing to always try. Convergence often depends on how good the initial weights of the model are. Since initialization is random, it will depend on the seed value.
  2. You should not set it to 1, as that means the final value will be propagated back to every state in the sequence, which is not something you'd want. You could try adding one more 9, but I would suspect this value is already large enough.
  3. Think of it this way. Each step has a propagation time of 0.1 seconds. If your max linear and angular velocities are 0.34, then you can only move 3.4 cm forward and rotate 0.034 radians in any single time step. So even if you take the maximum action, your next state will be almost the same as the previous one. In most cases, since actions are not maximal, the state will be almost entirely identical due to the binning of the laser state. Since an action is evaluated by the next reached state, the model has difficulty learning: no matter what action it takes, the next state looks the same. Increasing the time delta could help differentiate between the states, as you could get something like 10 cm of motion per action, and that could be enough to actually change the state (see the quick calculation below).
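A quick back-of-the-envelope check, assuming straight-line motion at the capped linear speed:

```python
# Rough per-step displacement for different step durations, assuming the
# robot drives straight at its maximum linear velocity.
MAX_LIN = 0.34  # m/s
for time_delta in (0.1, 0.3):
    step_distance = MAX_LIN * time_delta
    print(f"TIME_DELTA={time_delta}s -> at most {step_distance * 100:.1f} cm per step")
# TIME_DELTA=0.1s -> at most 3.4 cm per step
# TIME_DELTA=0.3s -> at most 10.2 cm per step
```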

I did a quick test run with the actions limited to the 0.34 range and the reward increased, and the model did not work for me either. Then I additionally increased the time delta to 0.3, and the model seems to work better. Keep in mind, though, that I only did some short training runs and did not fully train the network. But that is something you could try as well. Training at low speeds will always make the training slow, though.

reiniscimurs commented 1 year ago

I have run a full training cycle. It trains without an issue for me after increasing the time delta to 0.3 seconds. [screenshot]

lee666-lee commented 1 year ago

Genuinely appreciate your reply and help (really happy :D)!

I've tried setting a different seed value as well as TIME_DELTA = 0.3 and started training the moment I finished reading your email. I am currently training two models with different seed values on two PCs; I'm just so excited to see what results they will get tomorrow. I hope I'll see good results, hahaha.

You're so cool! It's my pleasure to have the chance to talk to you, LOL. @reiniscimurs

lee666-lee commented 12 months ago

WOW moment! It works just fine, haha. Many thanks for your kind help.

I've trained the model for nearly 3 days so far, and the metrics currently look like this: [screenshots]

I was wondering why the loss keeps steadily climbing even though I still get many successful plans that reach the goal? (It's just so different from what I know about DL.)

I think the simulation on the PC is about okay now, so I'm planning to deploy it on a real robot to see the physical behavior. Looking forward to seeing how it performs next. @reiniscimurs

reiniscimurs commented 12 months ago

Hi,

Good to hear that it is working for you as well.

Regarding the "loss" function: it is not the same type of loss as you would encounter in BC or IL tasks, where you have a specific dataset that you try to fit your model to and the difference is the loss. Here, calling it "loss" might actually be a bit misleading. See the explanation in section "2. Making the Loss Function", in the "You Should Know" part: https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html
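For intuition, here is a toy single-critic sketch of why this "loss" chases a moving target instead of fitting a fixed dataset (the names are illustrative, not the exact variables of this repo, and real TD3 uses two critics plus a target actor with clipped noise):

```python
import torch
import torch.nn as nn

# Toy critic Q(s, a) for a 4-dim state and 2-dim action.
critic = nn.Linear(6, 1)
critic_target = nn.Linear(6, 1)
critic_target.load_state_dict(critic.state_dict())

gamma = 0.99
state = torch.randn(32, 4)
action = torch.randn(32, 2)
reward = torch.randn(32, 1)
next_state = torch.randn(32, 4)
next_action = torch.randn(32, 2)  # in TD3 this would come from the target actor

# The regression target is bootstrapped from the target critic, so it shifts
# as training progresses -- unlike a fixed label in supervised learning.
with torch.no_grad():
    target_q = reward + gamma * critic_target(torch.cat([next_state, next_action], dim=1))

current_q = critic(torch.cat([state, action], dim=1))
loss = nn.functional.mse_loss(current_q, target_q)
print(loss.item())
```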

lee666-lee commented 12 months ago

Many thanks :P

I've sent an email to your address reinis.cimurs@de.bosch.com; did you receive it? Please do check it. Sincerely looking forward to your reply. @reiniscimurs

reiniscimurs commented 12 months ago

Hi,

I sent a reply yesterday.

lee666-lee commented 11 months ago

Have you ever tried using 360-degree laser data (currently it's 180-degree laser data with env_dim = 20, i.e. a 9-degree gap per bin)?

I changed it to a 360-degree range and set env_dim = 40, keeping a 9-degree gap as before. However, the model keeps going around in circles or only moves a little each step. I thought at first it might be the seed value, so I also tried a number of seed values (from 0 to 10), but it seems to be of no help at all.

1. I was wondering if the doubled env_dim makes it harder for the network to learn (in other words, the network input becomes larger, but the network itself has such shallow, basic layers that it cannot deal with the larger input)?

2. Or should I just try more seed values? (But how many do I actually have to try, and how can I be sure whether the problem is seed-related or not?)

Looking forward to your reply, thank you for your time and consideration :)

@reiniscimurs

reiniscimurs commented 11 months ago

Hi,

Did you also update the gap creation function here: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/velodyne_env.py#L91C18-L91C18

You can see that it is later used in the velodyne callback for creating gaps: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/velodyne_env.py#L135

There might be some mismatch here.

A doubled env_dim should not have too much effect on training in general. I have trained models with 40 laser values, as well as with a lower number of parameters in the MLP layers, and it was successful. There might be some other parameters, like the learning rate, that need an adjustment, but it should still work. It does blow up the size of the saved state representations, though.
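For reference, a simplified standalone sketch of what the gap creation could look like for a 360-degree FOV with 40 bins (this is not the repo code verbatim and omits any small end-offset tweaks; the -pi start angle is an assumption):

```python
import numpy as np

def make_gaps(environment_dim=40, fov=2 * np.pi, start_angle=-np.pi):
    """Split the laser field of view into environment_dim equal angular bins."""
    width = fov / environment_dim
    gaps = [[start_angle, start_angle + width]]
    for m in range(environment_dim - 1):
        gaps.append([gaps[m][1], gaps[m][1] + width])
    return gaps

gaps = make_gaps()
print(len(gaps), gaps[0], gaps[-1])  # 40 bins of 9 degrees covering [-pi, pi]
```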

lee666-lee commented 11 months ago

1. Yep, I have updated the gaps to 40 intervals as well, keeping a 9-degree gap, so that the full 360 degrees from -pi to pi is covered. What's different is that I use a 2D Hokuyo laser instead of the 3D Velodyne for training; here is my implementation. [screenshots]

2. Would you mind telling me where the learning rate parameter is? I couldn't find it... Thanks a lot :)

3. Do you think increasing the laser FOV from 180 deg to 360 deg or 270 deg would help the robot navigate to the goal faster and more easily? Because when I deployed the trained model (using a 180-deg FOV) on a real robot and it met a scene like this, it seemed quite uncertain whether it would reach the goal from point A. I guess it is the orientation at point A that makes the difference. [figure]

Fig. 1: if the robot tilts toward the obstacle, it turns back and navigates to B. Fig. 2: if the robot tilts away from the obstacle, it navigates directly to the goal without turning back.

So I think a broader FOV might prevent the Fig. 1 situation from happening as much as possible... What's your opinion on this?

Looking forward to your reply, thank you for your time and consideration 👍

@reiniscimurs

reiniscimurs commented 11 months ago
  1. Looking at it quickly, it seems like a fine implementation. Generally, there should not be much difference in going from a 3D to a 2D laser.

  2. Yes, sorry, the learning rate is not exposed as a hyperparameter in this repo. You set it when creating the Adam optimizer and pass it in as a parameter: https://pytorch.org/docs/stable/generated/torch.optim.Adam.html

  3. I think a wider FOV can help in some sense, as you get more descriptive information about the environment, and it is reasonable to try it. Adding history through a GRU or LSTM would probably also bring a benefit. However, what I think is happening in this case is that the policy has reached some local optimum and does not have an optimal way of getting out of it. This is the reason the random_near_obstacle bootstrapping method exists: to create more exploration near obstacles so that the policy learns better solutions in the long term. Maybe you need additional bootstrapping methods. Another thing to try is to use cos and sin of the angle instead of the angle itself in the polar coordinates. Cos and sin do not have discontinuities or "flips" and might be more descriptive when dealing with states where the angle is an issue (you can see it described in eqs. 3 and 4 here: https://www.mdpi.com/1999-4893/16/9/412). A small sketch of this encoding follows below.
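A minimal sketch of that encoding, assuming theta is the heading error to the goal in radians:

```python
import numpy as np

def goal_polar_state(distance, theta):
    """Encode the goal in polar coordinates without an angle discontinuity.

    Instead of [distance, theta], return [distance, cos(theta), sin(theta)],
    so headings just below +pi and just above -pi map to nearby values.
    """
    return np.array([distance, np.cos(theta), np.sin(theta)])

print(goal_polar_state(1.0, np.pi - 0.01))   # approx [1.0, -1.0,  0.01]
print(goal_polar_state(1.0, -np.pi + 0.01))  # approx [1.0, -1.0, -0.01]
```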

lee666-lee commented 11 months ago

Copy that hhhhhh

I will try setting different learning rates to see the difference. As for the RNN, I shall do some research on how to add it properly (I guess NN design is not that easy; DNNs always seem like a black box to me: adding an RNN here or there, the effect is hard to foresee and lacks good interpretability).

Bootstrapping methods and angle representations are somewhat new to me, haha. You live and learn, LOL.

Many thanks for sharing those good ideas with me (ideas are precious things in this world; it's just so hard to come up with one, so heartfelt thanks :p). I can't wait to discover new possibilities and see how they work.

@reiniscimurs

reiniscimurs commented 11 months ago

There is an implementation of a GRU based on this work that uses history for navigation, which might be a good starting point in that direction: https://github.com/Barry2333/DRL_Navigation

lee666-lee commented 11 months ago

COOL haha! Thx a lot, it shall be a good inspiration :)

@reiniscimurs

lee666-lee commented 10 months ago

Dear Reinis Cimurs,

During recent model training trials I encountered some rather strange phenomena (see the videos below: the 1st video shows the model swinging left and right with little forward motion each step until it exceeds the max steps per episode, so the episode ends without reaching the goal; the 2nd video shows the model always tending to take a detour to the goal even though the goal is clearly in front of it without any obstacles).

https://github.com/reiniscimurs/DRL-robot-navigation/assets/57956361/facad992-3cd1-46d1-b5e5-711ad42d79e3

https://github.com/reiniscimurs/DRL-robot-navigation/assets/57956361/17d974b6-5b9b-4a72-9eb3-c3fe9e3def37

I just cannot tell what causes this, because the model can generally reach the goal most of the time (around 7 episodes out of 10), and the goal points in the videos are apparently not that hard to reach either. Also, the paths it plans are not as intelligent as those of the source-code model.

What I've changed: a. 2D Hokuyo laser with 360-degree FOV; b. velocity [screenshot]; c. reward function [screenshot]; d. TIME_DELTA = 0.3; e. Adam lr = 1e-4 (when I set 1e-3 the model always goes in circles; I think it just gets trapped in a local optimum without any meaningful learning, so I set a smaller lr and it works).

What is your opinion on this? I sincerely need your guidance. Thank you for your time and consideration 👍 @reiniscimurs

reiniscimurs commented 10 months ago

How many epochs have elapsed in this training? In any case, the behavior makes sense and is something that I have observed in earlier stages of training. For the left-and-right swinging, the issue is that there is no history in the states. You end up in a local optimum where at one step the best calculated action is to move right, and at the next it is to move left, so you end up in a cycle. Since the policy does not know it is in a cycle (lack of history), it keeps repeating itself. You can either wait for the training to run longer and hope it learns its way out, write a new bootstrapping method that detects a cycle and forces the robot to move, or increase the random noise value that is added to the action during the training stage. The random noise could then "dislodge" the robot from the local optimum, and it might learn better behavior.

For the second case, it is again a local optimum. A better explanation would be that the evaluation of the state is such that it "fears" a collision more than it values reaching the goal. At this stage of training the robot estimates that there is more danger of colliding with the obstacle, most likely because in previous runs it collided with an obstacle on the right side. So closeness to an obstacle on the right has a negative value. But it is not the same on the left side, as that state is different and its evaluation will be different as well. As you can see, the effect appears around the 1 m mark from the obstacle, which would also be consistent with the r3 calculation (unless you have changed it). Since the robot is somewhat slower in your case, you can consider lowering the r3 penalty. In any case, finding the right reward function and the right coefficients may solve these issues.
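For reference, a sketch of a reward with that shape (terminal rewards for reaching the goal or colliding, plus an immediate term that rewards forward speed, penalizes turning, and penalizes laser readings under 1 m). The constants and the r3_coef knob are illustrative, not the exact values in the repo:

```python
def get_reward(target, collision, action, min_laser, r3_coef=0.5):
    """Sketch of a goal/collision reward with an adjustable proximity penalty.

    r3 penalizes the closest laser reading when it drops below 1 m;
    lowering r3_coef weakens the "fear" of obstacles for a slow robot.
    """
    if target:
        return 100.0
    if collision:
        return -100.0
    r3 = (1.0 - min_laser) if min_laser < 1.0 else 0.0
    return action[0] / 2 - abs(action[1]) / 2 - r3_coef * r3
```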

Also, theoretically, the right value of a state should be arrived at after seeing each state many times, so all you might need is to simply train the model longer. You also should not judge the performance too much from the training episodes, but rather from the evaluation episodes, as no noise or bootstrapping is applied there.

lee666-lee commented 10 months ago

Thx for your kind help :P

In this training the model has been trained for ~2500 episodes, so I think there would be little improvement from just training it longer without other adjustments.

I heartily appreciate you explaining the reasons behind these behaviours so patiently. I shall set about trying those possible solutions to see how they work, LOL.

BTW, does the random noise here refer to expl_noise? (There are expl_noise and policy_noise; I think expl_noise is used when the robot executes actual actions, while policy_noise is only used for the loss calculation.) @reiniscimurs

reiniscimurs commented 10 months ago

Hi,

Yes, I mean the expl_noise, as that is the value added to the calculated action that is executed in the simulation. In this repo, this value decreases over time, so you might also want to experiment with reducing it more slowly to have more of an effect. (https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/train_velodyne_td3.py#L227C6-L227C6)
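A minimal sketch of such a linear decay in the spirit of the linked line (the concrete numbers are only illustrative; a larger expl_decay_steps means slower decay and longer-lasting exploration):

```python
expl_noise = 1.0           # initial std of the Gaussian noise added to actions
expl_min = 0.1             # noise floor
expl_decay_steps = 500000  # larger -> slower decay (illustrative value)

def decay_noise(noise):
    """Linearly shrink the exploration noise towards expl_min."""
    if noise > expl_min:
        noise -= (1 - expl_min) / expl_decay_steps
        noise = max(noise, expl_min)
    return noise

# Called once per training step: expl_noise = decay_noise(expl_noise)
```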

lee666-lee commented 10 months ago

Dear Reinis Cimurs,

I heartily appreciate your help; your suggestions have inspired me to try some possible approaches. Over the past days I have tried several different methods; below are my experiments:

  1. I've tried increasing expl_noise from 0.34 to 1 to 1.5, setting expl_min to 0.1. However, the "swing left and right" and "detour" phenomena don't seem to improve, which confuses me... Since my model's velocity falls in [-0.34, +0.34], I would think this noise is big enough to make the model jump out of the local optimum?

  2. As for the bootstrapping method, I tried adding "swing detection" code (as shown below) during training, and it does have some effect: every time it detects the "swing left and right" phenomenon, it generates a random velocity to force the model to make a move, which can change the state, so that the model may jump out of the local optimum. [screenshots]

  3. I also tried changing the reward function (as shown below). Considering that the model "fears" collision more than it values reaching the goal, I tried rewarding it once it gets closer to the goal and, vice versa, punishing it once it moves away from the goal. For the first 1000 episodes it seemed okay, while during later training the model kept colliding like a headless fly, as if learning worse and worse... I don't know why this happens... [screenshot]

I don't know whether I am missing something or these methods are just not appropriate. What's your opinion on them? I sincerely need your help 👍 @reiniscimurs

reiniscimurs commented 10 months ago

Hi

  1. Setting the noise larger than 0.34 probably will not have much effect, since the max velocity is capped. If you cap the velocity at 0.34, it does not matter whether you add 0.34 or 1 to a value of 0; it will still execute with a velocity of 0.34. Another thing is the inherent property of the noise that its average value over infinitely many samples is 0. So even if you execute a single step to the left, it is highly likely that the next step will force the robot to move to the right. Since a velocity of 0.34 is probably too small to escape the local optimum, it just does not learn; it would need a sequence of consistent actions to escape. One option for that would be to introduce a bias in the noise (set the mean to anything but 0), but that might bias the general robot motion as well.
  2. I think this is the way to go, but is the action executed just once? As mentioned, you most likely need a sequence of consistent actions to escape the local minimum. See the implementation here: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/train_velodyne_td3.py#L330-L342 There we select a random action and consistently perform it over N sequential steps. I would suggest implementing similar logic here as well (see the sketch after this list).
  3. While I have seen people try to implement distance-decrease-to-goal reward functions, I feel it is a counterintuitive measure, as it creates pockets of local optima and works only in simple environments. In your case it seems that the distance-closing term gives more positive reward than a collision gives negative reward. Consider motion at speed 0.34 directly towards the goal: that is 1.34 positive reward at each step. But if you do not close the gap, then whatever you do, you get at least -2 + 0.34 = -1.66 reward at each step. So your value function here is ill-formed. There is no other choice for the robot than to aim directly at the goal and drive straight there; avoidance has less benefit than the short-term gain of collecting that 1.34 reward. What you are experiencing is the major issue with DRL: how to properly form a reward function. Personally, I would never use a distance-closure reward (you can read more about that here: https://github.com/reiniscimurs/DRL-robot-navigation/issues/2#issuecomment-962382003), but if you do, you should consider that the distance closure depends on how much distance you actually covered. Moving 1 cm closer to the goal or 1 meter closer to the goal currently returns the same value for you, and that does not seem correct.
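As mentioned in point 2, here is a rough sketch of holding one random action over a block of consecutive steps once a swing is detected. It mirrors the random_near_obstacle logic linked above but is not code from the repo; swing_detected stands in for whatever detection condition you use:

```python
import numpy as np

count_rand_actions = 0
random_action = np.zeros(2)

def choose_action(policy_action, swing_detected):
    """Return the policy action, unless a swing was detected, in which case a
    random action is held for a block of 8-14 consecutive steps."""
    global count_rand_actions, random_action
    if swing_detected and count_rand_actions < 1:
        count_rand_actions = np.random.randint(8, 15)   # length of the forced block
        random_action = np.random.uniform(-1, 1, 2)     # in the normalized [-1, 1] range
    if count_rand_actions > 0:
        count_rand_actions -= 1
        return random_action
    return policy_action
```
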
lee666-lee commented 10 months ago

Dear Reinis Cimurs,

So glad to have your kind help.

1. Yep, for the "swing detection" code, the random action is executed just once each time the "swing left and right" phenomenon is detected.

"Since a velocity of 0.34 is probably too small to escape the local optimum, it just does not learn; it would need a sequence of consistent actions to escape." Just to make sure I understand: do you mean that because the 0.34 velocity is too small for the robot to see a different state, the robot cannot output a meaningful action from the state?

Thanks for your suggestions; I have changed this part of the code like this: [screenshot]

So far the training has reached ~100 episodes, but I cannot tell yet whether the swing phenomenon has been eliminated... I will keep training it and hope this will work.

2. I didn't expect that this would happen: "There is no other choice for the robot than to aim directly at the goal and drive straight there; avoidance has less benefit than the short-term gain of collecting that 1.34 reward." That is really some deep reasoning :D

The reason I tried changing the reward function like that is that my robot seems so blind when navigating to the goal (not just detours; sometimes it completely ignores the goal), as the picture below shows. [screenshot]

Now I am trying a reward function like this: [screenshot]

So far I am gradually finding that DRL is quite unique, because one has to speculate, based on the different phenomena, what on earth leads the robot to act like this or that, and how to adjust the code to make it act the way one expects it to learn. So magical but also challenging, hhh.

Looking forward to your reply. I sincerely appreciate your guidance <3 @reiniscimurs

reiniscimurs commented 10 months ago
  1. Think of it this way. If you have an obstacle right in front of you, a wall for example, you need a full 90-degree turn (let's assume that this is the requirement) before moving forward so as not to collide with it. If your max angular velocity is 0.34 rad/s and each step is 0.3 seconds, how many subsequent steps do you need to make a 90-degree turn? 1.57 / (0.3 * 0.34) ≈ 15 steps at maximal turning velocity just to find a way to escape the obstacle (see the small calculation after this list). So taking just one step at maximal turning velocity will probably still lead to a collision, which is why you need multiple steps of consistent velocity in a row to provide an example of escaping the collision. This is what I mean by saying 0.34 is just not enough rotation to escape it: you need multiple steps of 0.34 in a row. From a brief look, the changes you made implement this, and it should give some benefit in the long run.
  2. Well, this is the main problem of DRL in general: how to properly design the reward function. This is a huge issue and very difficult to figure out in complex situations, because it is difficult for us as humans to imagine what pitfalls there might be when posing an ill-formed reward. The issue is that the policy will not do what we want, but rather optimize for the easiest way to game the system. So it is important to think long and hard about how to design such a function (which is why IRL is sometimes more useful). However, the example you provided is a bit odd, and I would expect that after some time the robot should actually go directly towards the goal. In the original code this does not happen (at least for me), so it is a bit strange. Re-thinking the reward function might be beneficial, but also check that the state actually returns proper values. By that I mean, check whether the distance and angle to the goal seem reasonable at every step.
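The small calculation referenced in point 1:

```python
import numpy as np

# Consecutive steps needed for a 90-degree turn at the capped angular speed.
max_ang = 0.34      # rad/s
time_delta = 0.3    # s per step
turn = np.pi / 2    # ~1.57 rad
steps = turn / (time_delta * max_ang)
print(f"{steps:.1f} consecutive max-turn steps")  # ~15.4 steps
```
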
lee666-lee commented 10 months ago

Dear Reinis Cimurs,

Thanks for your timely reply.

1. The distance and angle calculations are from the source code, and I haven't changed them. I also print the distance and angle at each step, and the values do reflect the corresponding actions (e.g., once the robot moves away from the goal, the distance does become larger). [screenshots]

2."However, the example you provided is a bit weird and I would expect after some time the robot should actually go directly towards the goal. In the original code this does not happen (at least for me) so it is a bit strange." This also makes me so confused...since my current code only has a few differences compared with the source code: a.180deg velodyne->360deg hokuyo b.velocity [-1,1] -> [-0.34,0.34] c.time delta 0.1 -> 0.3 d.lr 1e-3 -> 1e-4 e.reward function f.swing detection code added

Changes c, d, e, and f were made entirely because of a and b.

a+b => the model always goes in circles without any learning. a+b+c+d => it finally starts to learn, but the swing and detour phenomena occasionally occur, and the path is not as smooth as with the source code. a+b+c+d+e+f => I'm still working on it; I hope the model can act at least as well as the source-code model...

It's hard to believe that such tiny differences matter so much. Frankly speaking, I'm kind of sad now, but I don't want to give up. I appreciate you giving me so much support. @reiniscimurs

lee666-lee commented 10 months ago

Dear Reinis Cimurs,

Currently the model has been trained for ~2000 episodes. Although the "swing detection code" from https://github.com/reiniscimurs/DRL-robot-navigation/issues/89#issuecomment-1916971711 greatly decreases the swing phenomenon in the training stage, the model still swings left and right a lot in the test stage (not in every episode; the phenomenon occurs occasionally).

It feels like the methods are exhausted but the results are not getting better...

https://github.com/reiniscimurs/DRL-robot-navigation/assets/57956361/08e3b4fd-5147-4830-8ee0-b4ce5cf58447

https://github.com/reiniscimurs/DRL-robot-navigation/assets/57956361/cebdb9c7-306a-462e-8c49-a205e3aef59c

https://github.com/reiniscimurs/DRL-robot-navigation/assets/57956361/6be16fc0-7f64-4d7e-92d3-52b76a69ad54

@reiniscimurs

reiniscimurs commented 9 months ago

Hi,

Do you by any chance have a repo or something similar where I can look at the full code? Sorry, I am also out of ideas, and I don't think I can help much more without it. I will try to find time on the weekend to train a model with a smaller speed and see if I can get a working model.

lee666-lee commented 9 months ago

Dear Reinis Cimurs,

During recent training I suddenly found a mistake in my reward function, and the swing phenomenon just disappeared after I corrected it (how careless of me; I'm also so sorry for misleading you that much).

However, the model still cannot "see" the goal after the above correction; I haven't found the reason behind it yet...

I've sent an email to your mailbox with the full code attached. I'm very grateful for your kind help. Looking forward to your reply <3 @reiniscimurs

reiniscimurs commented 9 months ago

Hi, I have received the files. I will try to take a look at the program when I have the time.

lee666-lee commented 9 months ago

Dear Reinis Cimurs,

I tried training the model for much longer, and its behavior got a little better, but it is still far inferior to the source-code model:

https://github.com/reiniscimurs/DRL-robot-navigation/issues/89#issuecomment-1931936414 (this model has been trained for ~400 episodes) (update: the swing phenomenon hasn't gone away, but it only occurs a few times during later training, so maybe it has something to do with the reward function, but not entirely. Moreover, with longer training the detour phenomenon disappears, which means the model finally can "see" the goal.)

BTW, do you encounter the same situation with the smaller speed?

Looking forward to your reply. I sincerely appreciate your guidance :) @reiniscimurs

reiniscimurs commented 9 months ago

Hi,

I have trained a couple of times with the code you provided. It is a bit difficult to validate the issues here, since training at such slow speeds takes quite some time. Is there a specific reason why you want to use such slow robot speeds?

On the code side, I was able to train to the point where it also encounters the swinging issue. It also did not initially want to go to the goal directly, but with longer training it did improve. I simplified the velocity capping and reward function to more closely resemble the implementation in the original repo. I noticed that you mostly cap velocities to 0.34, but in some instances (in the case of vel_flag == 1) the velocity shoots up quite high, producing a huge, non-gradual jump in velocity and, consequently, a huge change in the reward function. I will train a couple more times to see what possible issues there might be and let you know if I find something.

lee666-lee commented 9 months ago

Dear Reinis Cimurs,

"Is there a specific reason why you want to use such slow robot speeds?" I'm working on a floor-cleaning robot project; that's why I have to cap the velocity at a low speed to meet the demands. The project restricts max_speed to < 0.35 m/s.

"I noticed that you mostly cap velocities to 0.34, but in some instances (in the case of vel_flag == 1) the velocity shoots up quite high, producing a huge, non-gradual jump in velocity and, consequently, a huge change in the reward function." Do you mean the problem may be that the velocity adjustment code is improper?

I highly appreciate your kind help; your support gives me much more confidence to crack these barriers. Looking forward to your reply :p

@reiniscimurs

reiniscimurs commented 9 months ago

Hi,

Before going on vacation I also ran some trainings with a slower speed and a larger lidar FOV, but I could not get consistently good performance either. It would also fall into local minima and "lock up", though it did seem to go to the goal. Unfortunately, I do not think I can invest that much time in training the models and finding a good solution here, as I simply do not have enough time for that, and unfortunately I also do not quite know how to go further here. I would suggest taking iterative steps towards where you want to get, applying only one change at a time. For instance, first train a model with an FOV of 360 degrees instead of 180 (which I suspect might be a bit of an issue). Once that works and is confirmed, only then reduce the speed to 0.34. This way it is easier to find out what the core problem is.

"Do you mean the problem may be that the velocity adjustment code is improper?" Not that it's improper, rather that it is inconsistent. In most cases the robot's max speed is 0.34, but in some seemingly random instances it is suddenly 1.5. This seems strange to me. I would suggest using a simple capping of the max velocity at 0.34.
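Something along these lines (a minimal sketch, not code from the repo; MAX_LIN and MAX_ANG are illustrative):

```python
import numpy as np

MAX_LIN = 0.34  # m/s
MAX_ANG = 0.34  # rad/s

def cap_velocity(linear, angular):
    """Clip commanded velocities to the robot's limits, with no extra shaping."""
    return float(np.clip(linear, 0.0, MAX_LIN)), float(np.clip(angular, -MAX_ANG, MAX_ANG))

print(cap_velocity(1.5, -0.9))  # -> (0.34, -0.34)
```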

lee666-lee commented 9 months ago

Dear Reinis Cimurs,

Thank you cordially for your help.

Recently I have also trained some models, and my findings are as follows.

  1. I visualized the velocity adjustment result, and it looks like this: [screenshots] As you mentioned before, it is very inconsistent, so I was wondering whether this inconsistency might be a crucial reason for the swing phenomenon.

Thus, I tried to smooth the velocity adjustment using a tanh-like activation function to guarantee consistency (the green curve): [screenshots]

  2. I tried adding an LSTM and a GRU to the network.

    There is an implementation of a GRU based on this work that uses history for navigation, which might be a good starting point in that direction: https://github.com/Barry2333/DRL_Navigation

  3. Let me use some shorthand here to better illustrate (all models use 360 FOV, 2D laser, lr = 1e-4, time_delta = 0.3): a. LSTM and GRU added to the network; b. velocity adjustment before smoothing; c. velocity adjustment after smoothing.

a => all models showed smoother motion.

a+b => I tried training several models with different seed values, and I finally got one model that hardly swings (specifically, maybe 2 or 3 swings per 1000 episodes in the test stage). So I created a simple test map to further examine its performance; sadly, it then swings a lot... Is this overfitting? Why does the performance degrade so badly when I just change to a simple map? https://github.com/reiniscimurs/DRL-robot-navigation/assets/57956361/37e829ec-658e-4309-bb00-16479f9ac7be

a+c => I tried training several models with different seed values, and I did find that some of them jump out of the "swing local minimum" (no swinging at all). However, while they don't swing, they also don't reach the goals (blind again; I even trained for ~8000 episodes).

  4. "I would suggest taking iterative steps towards where you want to get, applying only one change at a time." Thanks for your suggestion; this is the most solid way to figure out where the problem is. "More haste, less speed."

I once trained a model with: 180 FOV + 2D laser + lr 1e-3 + [a_in[0]*0.34, a_in[1]*0.34] + time_delta 0.3. It works okay, except that it moves too slowly...

  5. "For instance, first train a model with an FOV of 360 degrees instead of 180 (which I suspect might be a bit of an issue)." May I ask why you think this might be the reason?

  6. "This seems strange to me. I would suggest using a simple capping of the max velocity at 0.34." I will try this method.

I hope there is a surprise ahead 👍 Many thanks, looking forward to your reply :D @reiniscimurs

reiniscimurs commented 8 months ago

Hi,

  1. Probably not the cause of the swing, but it might make learning more difficult for the neural network. Also consider the angular velocity: in the vel_flag=1 case your max angular velocity can be 1.5 radians per second. Tanh is probably a better approach, but we already have a tanh activation output from the neural network, so in some sense you end up applying it twice. My gut feeling is that manipulations of the output velocity are probably not necessary beyond capping the min-max range.

  2. From the video I'd guess the goal is too close to the wall. I don't think this algorithm would end up going to such a place anyway, as it is too "risky"; that is, its Q value is too low.

  3. Maybe the model just does not have enough parameters for a wider FOV, or the information is somehow ambiguous to the model. I have no real backing for these claims; that is just my gut instinct. Unfortunately I will not be able to test it out. Does a 360 FOV model without any other changes work for you? That would be an interesting thing to test out first.

Best of luck! Sorry I can't provide more insights