Hi,
As far as I can see, there is no issue in simply multiplying a_in by 0.34 to obtain ranges of [0, 0.34] and [-0.34, 0.34]. I would not touch the tanh function, though: if you do not also change the range mapping afterwards, it will lead to some weird range. E.g., if you multiply tanh by 0.25 but still have a_in = [(action[0]+1)/2, action[1]],
you will get a weird range of [(-0.25+1)/2, (0.25+1)/2] for the linear velocity.
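To make that concrete, here is a minimal sketch of the scaling I mean (illustrative values, not the exact code from the repo):

```python
# Illustrative sketch, not the exact repo code: scale the mapped ranges, not the tanh output.
action = [0.2, -0.7]    # example actor outputs, each already squashed by tanh into [-1, 1]
max_speed = 0.34

lin_vel = (action[0] + 1) / 2 * max_speed   # maps [-1, 1] -> [0, 0.34]
ang_vel = action[1] * max_speed             # maps [-1, 1] -> [-0.34, 0.34]
a_in = [lin_vel, ang_vel]
print(a_in)
```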
Did you try changing seed values for your training?
It might just be that, with the reduced top speed, the training samples are not varied enough and the robot does not reach the goal often or quickly enough, so your Bellman equation might need a larger discount factor. You can try increasing the Bellman discount factor so that it stays meaningful over more samples. Alternatively, you can try increasing the time delta, i.e. how long each step is propagated. This should have a similar effect.
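As a rough illustration of how the discount factor and step length interact (the numbers are only for intuition, not taken from the repo):

```python
# Rough illustration, not repo code: how far into the future rewards still matter.
# The "effective horizon" of a discount factor gamma is roughly 1 / (1 - gamma) steps.
for gamma in (0.99, 0.999, 0.9999):
    horizon_steps = 1.0 / (1.0 - gamma)
    for time_delta in (0.1, 0.3):
        print(f"gamma={gamma}: ~{horizon_steps:.0f} steps "
              f"~= {horizon_steps * time_delta:.0f} s of simulated time at dt={time_delta}")
```

A longer time delta means each step covers more distance, so the same number of discounted steps reaches further toward the goal.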
Thanks for your timely reply <3 !
Yeah, I've simultaneously changed the range function afterwards like this, to ensure the linear velocity ranges over [0, 0.25] and the angular velocity over [-0.25, 0.25]: Unfortunately it still gets a bad result, as you can see in my last post.
Emmm... in fact I haven't tried changing the seed values for training with the speed range change; thanks for your advice, I will try that immediately haha.
Actually, the Bellman discount factor in my last post's training is already 0.9999; I am not sure if that is still not big enough? Or maybe I should try setting it to 1?
The TIME_DELTA variable I use is 0.1. Sorry, but I don't know what the relation between TIME_DELTA and model performance is (could you please explain it to me if it's convenient for you? I might know too little about RL theory, sadly...). I will try setting a bigger value to see what happens hhhhhh
@reiniscimurs
I did a quick test run with limiting the actions to the 0.34 range and increasing the reward, and the model did not work for me either. Then I additionally increased the time delta to 0.3 and the model seems to work better. Though, keep in mind that I just did some short training runs and did not fully train the network. But that is something you could try as well. Training at low speeds will always make the training slow though.
I have run a full training cycle. It trains without issue for me after increasing the time delta to 0.3 seconds.
Genuinely appreciate your reply and help (really happy :D)!
I tried setting a different seed value as well as TIME_DELTA = 0.3 to train the model the moment I finished reading your email. I am currently training two models with different seed values on two PCs; I'm just so excited to see what results they get tomorrow, wishing for good ones hahahaaaa.
You're so cool! It's my pleasure to have the chance to talk with you LOL. @reiniscimurs
WOW moment! It works just fine haha, many thanks for your kind help.
I've trained the model for nearly 3 days so far, and the metrics currently look like this:
I was wondering why the loss keeps steadily climbing even though I still get many successful path plans that reach the goal? (It's just so different from what I know about DL.)
I think the simulation on the PC is about okay now, so I'm planning to deploy it on a real robot to see the physical result; looking forward to seeing how it performs next. @reiniscimurs
Hi,
Good to hear that it is working for you as well.
Regarding the "loss" function, it is not the same type of loss as you would encounter in BC or IL tasks, where you have a specific dataset that you try to fit your model to and the difference is the loss. Here, calling it "loss" might actually be a bit misleading. See the explanation in section 2. Making the Loss Function, in the "You should know" part: https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html
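For intuition, here is a stripped-down sketch of how such a critic "loss" arises (a single critic with random stand-in data, not this repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Simplified single-critic sketch of a TD3/DDPG-style update, not the repo's exact code.
# The regression target is built from the current target networks, so it moves as
# training progresses; this "loss" need not decrease monotonically the way a
# supervised loss over a fixed dataset would.
state_dim, action_dim, gamma = 24, 2, 0.9999

actor_target = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                             nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
critic_target = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1))

# random stand-in batch of transitions
state = torch.randn(32, state_dim)
action = torch.rand(32, action_dim) * 2 - 1
reward = torch.randn(32, 1)
next_state = torch.randn(32, state_dim)
done = torch.zeros(32, 1)

with torch.no_grad():
    next_action = actor_target(next_state)
    target_q = reward + (1 - done) * gamma * critic_target(
        torch.cat([next_state, next_action], dim=1))

critic_loss = F.mse_loss(critic(torch.cat([state, action], dim=1)), target_q)
print(critic_loss.item())
```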
Many thanks :P
I've sent an email to your address reinis.cimurs@de.bosch.com, did you receive it? Please do check it; sincerely looking forward to your reply. @reiniscimurs
Hi,
I sent a reply yesterday.
Have you ever tried using 360-degree laser data (currently it's 180-degree laser data with env_dim = 20, which means a 9-degree gap)?
I changed it to a 360-degree range and set env_dim = 40, keeping a 9-degree gap as before. However, the model keeps going around in circles or just moving a little each step. I thought at first that it might be because of the seed value, so I also tried a number of seed values (from 0 to 10), but it seems to be of no help at all.
1. I was wondering if the doubled env_dim makes the network harder to train (in other words, the network input becomes larger, but the network itself has such shallow and basic layers that it cannot deal with the larger input)?
2. Or should I just try more seed values (but how many do I exactly have to try? And how can I be sure whether this problem is seed-value related or not?)
Looking forward to your reply, thank you for your time and consideration :)
@reiniscimurs
Hi,
Did you also update the gap creation function here: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/velodyne_env.py#L91C18-L91C18
You can see that it is later used in the velodyne callback for creating gaps: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/velodyne_env.py#L135
There might be some mismatch here.
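For reference, a rough sketch of what a full-circle gap list with env_dim = 40 could look like (illustrative only; check it against your modified velodyne_env.py):

```python
import numpy as np

# Rough sketch, not the repo's exact code: 40 gaps of 9 degrees each,
# covering the full 360 degrees from -pi to pi.
environment_dim = 40
gap_width = 2 * np.pi / environment_dim          # 9 degrees
gaps = [[-np.pi, -np.pi + gap_width]]
for _ in range(environment_dim - 1):
    gaps.append([gaps[-1][1], gaps[-1][1] + gap_width])

# In the laser callback, each measured point's angle is binned into one of these
# gaps and the minimum range per gap becomes one of the 40 state inputs.
assert np.isclose(gaps[-1][1], np.pi)
```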
A doubled env_dim should not have too much effect on training in general. I have trained models with 40 laser values as well as a lower number of parameters in the MLP layers, and it was successful. There might be some other parameters, like the learning rate, that might need an adjustment, but it should still work. It does blow up the size of the saved state representations though.
1. Yep, I have updated the gaps to 40 intervals as well, keeping a 9-degree gap, which covers the full 360 degrees from -pi to pi. What's different is that I use a 2D Hokuyo laser instead of the 3D Velodyne to train; here is my implementation.
2. Would you mind telling me where the learning rate parameter is? Why couldn't I find it... thx a lot :)
3. Do you think increasing the laser FOV from 180 deg to 360 deg or 270 deg would help the robot navigate to the goal faster and more easily? Because when I deployed the trained model (using a 180 deg FOV) on a real robot and it met a scene like this, it seemed quite uncertain about reaching the goal from point A; I guess it is the orientation at point A that makes the difference.
Fig. 1: if the robot tilts toward the obstacle, it ends up turning back and navigating to B. Fig. 2: if the robot tilts away from the obstacle, it just navigates directly to the goal without turning back.
So I just think maybe the broadened FOV could keep the Fig. 1 situation from happening as much as possible... what's your opinion on this?
Looking forward to your reply, thank you for your time and consideration 👍
@reiniscimurs
Looking at it quickly, it seems like a fine implementation. Generally, there should not be much difference when going from a 3D to a 2D laser.
Yes, sorry, learning rate is not exposed as a hyperparameter in this repo. You would set it when creating the Adam optimizer and pass it in as a parameter: https://pytorch.org/docs/stable/generated/torch.optim.Adam.html
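A minimal sketch of where it would go (module names here are placeholders, not the exact variable names in train_velodyne_td3.py):

```python
import torch
import torch.nn as nn

# Placeholder networks just to show where the learning rate is set;
# in the repo you would pass lr= when the actor/critic optimizers are created.
actor = nn.Linear(24, 2)    # stand-in for the actor network
critic = nn.Linear(26, 1)   # stand-in for the critic network

actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=1e-4)
```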
I think a wider FOV can help in some sense, as you would get more descriptive information about the environment, and it is reasonable to try it. Probably adding history through some GRU or LSTM would also bring a benefit. However, what I think is happening in this case is that the policy has reached some local optimum and it does not have an optimal way of getting out of it. This is the reason why there is a random_near_obstacle
bootstrapping method: to create more exploration near obstacles for the policy to learn better solutions in the long term. Maybe you need additional bootstrapping methods. Another thing to try is to use the cos and sin of the angle instead of the angle itself in the polar coordinates. Cos and sin do not have discontinuities or "flips", and they might be more descriptive when dealing with states where the angle might be an issue (you can see it described in eq. 3 and 4 here: https://www.mdpi.com/1999-4893/16/9/412).
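A small sketch of the angle change (illustrative; note it adds one value to the state vector):

```python
import numpy as np

# Illustrative sketch: replace the raw heading angle theta in the polar goal
# representation with (cos(theta), sin(theta)) to avoid the jump at +/-pi.
def goal_state(distance, theta):
    # original style: [distance, theta]      -> flips from +pi to -pi
    # alternative:    [distance, cos, sin]   -> continuous everywhere
    return [distance, np.cos(theta), np.sin(theta)]

print(goal_state(2.0, np.pi - 0.01))
print(goal_state(2.0, -np.pi + 0.01))  # nearly the same heading gives nearly the same features
```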
Copy that hhhhhh
I will try setting different LRs to see the difference. As for the RNN, I shall do some research on how to add one properly (I guess NN design is not that easy... a DNN always seems like a black box to me: adding an RNN here or there, the effect is hard to foresee and lacks good interpretability).
Bootstrapping methods and the angle representation are somewhat new to me haha, live and learn LOL.
Many thanks for sharing those good ideas with me (ideas are the most precious things in this world, and they are just so hard to come up with, so heartily thx :p); I cannot wait to discover new possibilities and see how they work.
@reiniscimurs
There is a GRU implementation based on this work that uses history in navigation, which might be a good starting point for looking into that direction: https://github.com/Barry2333/DRL_Navigation
COOL haha! Thx a lot, it shall be a good inspiration :)
@reiniscimurs
Dear Reinis Cimurs,
During recent model training trials, I encountered some strange phenomena (as shown in the videos below: the 1st video shows the model swinging left and right with little forward movement each step until it exceeds maxsteps_per_episode, so the episode ends without reaching the goal; the 2nd video shows the model always tending to take a detour to the goal even though the goal is clearly in front of it without any obstacle).
I just cannot tell what causes them, because the model can generally reach the goal most of the time (like 7 episodes out of 10), and the goal points in the videos are apparently not that hard to reach either. Also, the paths it plans are not as intelligent as those of the source-code model.
What I've changed is:
a. 2D Hokuyo laser with 360-degree FOV
b. velocity
c. reward function
d. TIME_DELTA = 0.3
e. Adam lr = 1e-4 (when I set 1e-3, the model always goes in circles; I think it just gets trapped in a local optimum without any meaningful learning, so I set a smaller lr and it works)
What's your opinion on this? I sincerely need your guidance. Thank you for your time and consideration 👍 @reiniscimurs
How many epochs have elapsed in this training? In any case, the behavior makes sense and is something that I have observed in earlier stages of training. With the moving left and right, the issue is that there is no history in the states. You end up in a local optimum where at one step the best calculated action is to move to the right, and at the next one it is to move to the left, so you end up in a cycle. Since the policy does not know that it is in a cycle (lack of history), it keeps repeating itself. You can either wait for the training to run longer and hope it learns its way out of this, write a new bootstrapping method that detects a cycle and forces the robot to move, or increase the random noise value that is added to the action during the training stage. The random noise could then "dislodge" the robot from the local optimum and it might learn better performance.
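A very rough sketch of what such a cycle-detecting bootstrapping method could look like (the window size and thresholds are made up for illustration, not taken from the repo):

```python
import random
from collections import deque

# Illustrative sketch: if the angular velocity keeps flipping sign while the
# linear velocity stays tiny, override the action with a random forward-biased one.
recent_linear = deque(maxlen=10)
recent_angular = deque(maxlen=10)

def maybe_override(action, max_speed=0.34):
    recent_linear.append(action[0])
    recent_angular.append(action[1])
    if len(recent_angular) < recent_angular.maxlen:
        return action
    flips = sum(a * b < 0 for a, b in zip(recent_angular, list(recent_angular)[1:]))
    crawling = max(abs(v) for v in recent_linear) < 0.05
    if flips >= 6 and crawling:
        # force a decisive move to break the left-right cycle
        return [random.uniform(0.5, 1.0) * max_speed,
                random.uniform(-0.5, 0.5) * max_speed]
    return action
```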
For the second case, it is again the local optimum. Or a better explanation would be that the evaluation of the state is such that it "fears" collision more than it values reaching the goal. At this stage in training, the robot evaluates that there is more danger in colliding with the obstacle, as that is most likely something that has happened in previous runs where the robot collided with an obstacle on the right side. So closeness to an obstacle on the right has a negative value. But it is not the same on the left side, as the state is different and its evaluation will be different as well. As you can see, the effect appears around the 1 m mark from the obstacle, which would also be consistent with the r3 calculation (unless you have changed it). Since the robot is somewhat slower in your case, you can consider lowering the reward penalty for r3. In any case, finding the right reward function and the right coefficients may solve these issues.
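Roughly, the reward structure in question looks like this (paraphrased from memory, so double-check it against your copy of velodyne_env.py; lowering the weight on the proximity term is one way to rebalance it for a slower robot):

```python
# Paraphrased sketch of the reward being discussed, not guaranteed to match the repo exactly.
# The proximity penalty only activates below the 1 m mark of the closest laser reading,
# which matches the behaviour described above.
def get_reward(target, collision, action, min_laser, r3_weight=0.5):
    if target:
        return 100.0
    if collision:
        return -100.0
    r3 = (1 - min_laser) if min_laser < 1 else 0.0
    # With a slower robot the forward-velocity term is smaller, so a smaller
    # r3_weight keeps the proximity penalty from dominating.
    return action[0] / 2 - abs(action[1]) / 2 - r3 * r3_weight
```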
Also, theoretically, the right value of a state is only obtained after seeing each state many times, so all you might need is to train the model longer. You also should not judge the performance too much from the training episodes, but rather from the evaluation episodes, as no noise or bootstrapping is applied there.
Thx for your kind help :P
In this run the model has been trained for ~2500 episodes, so I think there might be little improvement from just training it longer without other adjustments.
I heartily appreciate your explaining the reasons behind these behaviours so patiently; I shall set about trying those possible solutions to see how they work LOL.
BTW, does the random noise here refer to expl_noise (because there are expl_noise and policy_noise; I think expl_noise is used when the robot takes the actual actions, while policy_noise is only used in the loss calculation)? @reiniscimurs
Hi,
Yes, I mean the expl_noise, as that is the value that is added to the calculated action and executed in the simulation. In this repo, this value decreases over time, so you might also want to experiment with reducing it more slowly to have more of an effect. (https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/train_velodyne_td3.py#L227C6-L227C6)
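A small sketch of the kind of decay being referenced (parameter names mirror the train script, but verify against your copy; a larger expl_decay_steps means slower decay and longer-lasting exploration):

```python
# Illustrative sketch of a linear exploration-noise decay, not the repo's exact code.
expl_noise = 1.0           # initial std of the Gaussian noise added to actions
expl_min = 0.1             # floor the noise never drops below
expl_decay_steps = 500000  # larger value -> slower decay

def decay_noise(expl_noise):
    if expl_noise > expl_min:
        expl_noise -= (1.0 - expl_min) / expl_decay_steps
    return max(expl_noise, expl_min)
```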
Dear Reinis Cimurs,
I heartily appreciate your help; your suggestions have inspired me to try some possible approaches. These days I have tried some different methods; below are my experiments:
I've tried increasing expl_noise from 0.34 to 1 and then 1.5, setting expl_min to 0.1; however, the "swing left and right" and "detour" phenomena show no improvement, which makes me so confused... since my model's velocity falls in [-0.34, +0.34], I would think this noise is big enough to make the model jump out of the local optimum?
As for the bootstrapping method, I tried adding "swing detection code" (as shown below) during training, and it does have some effect: every time it detects the "swing left and right" phenomenon, it generates a random velocity to force the model to make a move, which will possibly change the state, so the model may jump out of the local optimum.
I also tried changing the reward function (as shown below): considering that it "fears" collision more than it values reaching the goal, I reward the model whenever it gets closer to the goal and, vice versa, punish it whenever it moves away from the goal. For the first 1000 episodes it seemed okay, but during later training the model kept colliding like a headless fly, as if learning worse and worse... I don't know why this happens...
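Conceptually, the added term is something like this (a simplified sketch of the idea, not my actual code):

```python
# Simplified sketch of the idea, not the actual code I used: a per-step progress
# term that is positive when the robot moves toward the goal and negative otherwise.
def progress_reward(prev_distance_to_goal, distance_to_goal, scale=1.0):
    return scale * (prev_distance_to_goal - distance_to_goal)
```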
I don't know whether I'm missing something or whether these methods are just not appropriate enough? What's your opinion on them? I sincerely need your help 👍 @reiniscimurs
Hi
Dear Reinis Cimurs,
So glad to have your kind help.
1. Yep, for the "swing detection code", the random action is only executed once each time it detects the "swing left and right" phenomenon.
Since 0.34 velocity is probably too small to escape the local optimum, it just does not learn. It would need a sequence of constant actions to escape this.
Just to make sure: do you mean that because the 0.34 velocity is too small for the robot to see a different state, the robot cannot output a meaningful action from that state?
Thanks for your suggestions; I have changed this part of the code like this:
So far the training has reached ~100 episodes, but I cannot yet tell whether the swing phenomenon has been eliminated... I will keep training it and hope this works.
2. I didn't expect this would happen: "There is no other choice for the robot than to aim directly at the goal and drive straight there, and avoidance has lesser benefit than the short-term gain of getting that 1.34 reward." That is really some deep logical thinking :D
The reason I tried changing the reward function like that is that my robot seems blind when navigating to the goal (not just taking detours, but completely ignoring the goal), just as the pic below shows.
Now I'm trying to change the reward function like this:
So far I am gradually finding that DRL is quite unique, because one has to speculate about what on earth leads the robot to act one way or another based on different phenomena, and how to adjust the code to make it act the way one expects it to learn. So magical but also challenging hhh
Looking forward to your reply; I sincerely appreciate your guidance <3 @reiniscimurs
Dear Reinis Cimurs,
Thanks for your timely reply.
1. The dist and angle calculations are from the source code, and I haven't changed them. Also, I print dist and angle at each step, and the values do match the corresponding actions (e.g., once the robot moves away from the goal, dist does become larger).
2. "However, the example you provided is a bit weird and I would expect after some time the robot should actually go directly towards the goal. In the original code this does not happen (at least for me) so it is a bit strange."
This also makes me so confused... since my current code has only a few differences compared with the source code:
a. 180-deg Velodyne -> 360-deg Hokuyo
b. velocity [-1, 1] -> [-0.34, 0.34]
c. time delta 0.1 -> 0.3
d. lr 1e-3 -> 1e-4
e. reward function
f. swing detection code added
Changes c, d, e, and f were made entirely because of a and b.
a+b => the model always goes in circles without any learning
a+b+c+d => it finally starts to learn, but the swing and detour phenomena occasionally occur, and the path is not as smooth as with the source code
a+b+c+d+e+f => I'm still working on it, hoping the model can act at least as well as the source-code model...
Hard to believe that just these tiny differences make such a big difference. Frankly speaking, I'm kind of sad now, but I don't want to give up. I appreciate your giving me so much support. @reiniscimurs
Dear Reinis Cimurs,
Currently the model has been trained for ~2000 episodes. Although this version of the "swing detection code" (https://github.com/reiniscimurs/DRL-robot-navigation/issues/89#issuecomment-1916971711) does help greatly decrease the swing phenomenon in the training stage, the model still swings left and right a lot in the test stage (not in every episode; the phenomenon occurs occasionally).
It feels like the methods are exhausted but the result is not getting better...
@reiniscimurs
Hi,
Do you by any chance have a repo or similar where I could look at the full code? Sorry, I am also out of ideas and I don't think I can help that much more without it. I will try to get around to training a model with a smaller speed on the weekend, if I have time, to see if I can get a working model.
Dear Reinis Cimurs,
During recent training I suddenly found there was a mistake in my reward function, and the swing phenomenon was just gone after I corrected it (how careless of me; I'm also so sorry for misleading you so much).
However, the model still cannot "see" the goal after the above correction; I haven't found the reason behind it yet...
I've sent an email to your mailbox with the full code attached. Very grateful for your kind help; looking forward to your reply <3 @reiniscimurs
Hi, I have received the files. I will try to take a look at the program when I have the time.
Dear Reinis Cimurs,
I tried training the model much longer, and its behavior got a little better, but it is still far inferior to the source-code model:
https://github.com/reiniscimurs/DRL-robot-navigation/issues/89#issuecomment-1931936414 (this model has been trained for ~400 episodes) (update: the swing phenomenon isn't gone, but it only occurs a few times during later training, so maybe it has something to do with the reward function, though not entirely. Moreover, with longer training the detour phenomenon disappears, which means it can finally "see" the goal.)
BTW, do you encounter the same situation with the smaller speed?
Looking forward to your reply; I sincerely appreciate your guidance :) @reiniscimurs
Hi,
I have trained a couple of times with the code you provided. It is a bit difficult to validate the issues here, since the training on such slow speeds takes quite some time. Is there a specific reason why you would want to use such slow robot speeds?
On the code side, I was able to train to the point where it also encounters the swinging issue. It also did not initially want to go to the goal directly, but with longer training it did improve. I simplified the velocity capping and reward function to more closely resemble the original implementation in the original repo. I noticed that you mostly cap velocities to 0.34, but in some instances (in the case of vel_flag==1) the velocity shoots up quite high, producing a huge non-gradual jump in velocity and, as such, a huge change in the reward function. I will train a couple more times to see what possible issues there might be and let you know if I find something.
Dear Reinis Cimurs,
Is there a specific reason why you would want to use such slow robot speeds?
I'm working on a floor-cleaning robot project; that's why I have to cap the velocity at a low speed to meet the requirements. The project restricts max_speed to < 0.35 m/s.
I noticed that mostly you cap velocities to 0.34 but at some instances (in case of vel_flag==1) the velocity shoots up quite high and has a huge non-gradual jump in velocity and as such, huge change in reward function.
By this do you mean that the problem might be that the velocity adjustment code is improper?
I highly appreciate your kind help; your support gives me much more confidence to crack these barriers. Looking forward to your reply :p
@reiniscimurs
Hi,
Before going on vacation I also ran some trainings with slower speed and a larger lidar FOV, but I also could not get consistently good performance. It would also fall into local minima and "lock up"; however, it did seem to go to the goal. Unfortunately, I do not think I can invest that much time in training the models and finding a good solution here, as I simply do not have enough time for that, and I also do not quite know how to go further here. I would suggest taking iterative steps to get to where you want, applying only one change at a time. For instance, first train a model with a FOV of 360 degrees instead of 180 (which I suspect might be a bit of an issue). Once that works and is confirmed, only then reduce the speed to 0.34. This way we can more easily find out what the core problem is.
By this do you mean that the problem might be that the velocity adjustment code is improper?
Not that it's improper; rather, it is inconsistent. In most cases the robot's max speed will be 0.34, but in some seemingly random instances it is suddenly 1.5. This seems strange to me. I would suggest using a simple cap on the max velocity of 0.34.
Dear Reinis Cimurs,
Cordial thanks for your help.
Recently I have also tried training some models, and my findings are as follows.
Thus, I tried to smooth the velocity adjustment, using a tanh-like activation function like this to guarantee consistency (the green curve):
I tried to add LSTM and GRU to the network.
"There is a GRU implementation based on this work that uses history in navigation, which might be a good starting point for looking into that direction: https://github.com/Barry2333/DRL_Navigation"
Let me use some shorthand here to better illustrate (all of them use 360 FOV, 2D laser, lr = 1e-4, time_delta = 0.3):
a. add LSTM and GRU to the network
b. velocity adjustment before smoothing
c. velocity adjustment after smoothing
a => all models showed smoother movements
a+b => I tried training several models with different seed values, and finally got one model that hardly swings (specifically, maybe 2 or 3 swings per 1000 episodes in the test stage). So I created a simple test map to further examine its performance; sadly, it then swings a lot... Is this overfitting? Why does the performance degrade so badly when I just change to a simple map? https://github.com/reiniscimurs/DRL-robot-navigation/assets/57956361/37e829ec-658e-4309-bb00-16479f9ac7be
a+c => I tried training several models with different seed values, and I do find that some of them jump out of the "swing local minimum" (no swing at all). However, while they don't swing, they also don't reach the goals (blind again, even though I trained up to ~8000 episodes).
5. "I would suggest taking iterative steps to get to where you want, applying only one change at a time."
Thanks for your suggestions; this shall be the most solid way to figure out where the problem is. "More haste, less speed."
I once trained a model: 180 FOV + 2D laser + lr 1e-3 + [a_in[0]*0.34, a_in[1]*0.34] + time_delta 0.3. It works okay, except that it moves too slowly...
6. "For instance, first train a model with a FOV of 360 degrees instead of 180 (which I suspect might be a bit of an issue)."
May I ask why you think this might be the reason?
"This seems strange to me. I would suggest using a simple cap on the max velocity of 0.34."
I will try this method. I hope there's a surprise ahead 👍 Many thanks; looking forward to your reply :D @reiniscimurs
Hi,
Probably not the cause of the swing, but it might make learning more difficult for the neural network. Also consider the angular velocity: in the vel_flag=1 case your max angular velocity becomes 1.5 radians per second. Tanh is probably a better approach, but we already have a tanh activation output from the neural network, so in some sense you end up applying it twice. My gut feeling is that manipulations of the output velocity are probably not necessary beyond capping the min-max range.
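A quick numeric illustration of the double-tanh point (just numbers, not repo code):

```python
import numpy as np

# The actor output is already tanh-squashed into [-1, 1]; squashing it again
# means the commanded velocity can never reach the intended +/-0.34 cap.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # actor outputs
print(x * 0.34)                              # simple scaling: reaches +/-0.34
print(np.tanh(x) * 0.34)                     # tanh applied twice: max magnitude is only ~0.26
```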
From the video I'd guess the goal is too close to the wall. I don't think this algorithm would end up going to such a place anyway, as it is too "risky". As in, its Q value is too low.
Maybe the model just does not have enough parameters for a wider FOV, or the information is somehow ambiguous for the model. I have no real backing for these claims; it is just my gut instinct. Unfortunately I will not be able to test it out. Does a 360 FOV model without any other changes work for you? That would be an interesting thing to test out first.
Best of luck! Sorry I can't provide more insights
Dear Reinis Cimurs, I recently read your paper titled "Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning". I really appreciate your work on robot path planning using DRL, and I believe it is a valuable resource and guide for many others' research on DRL.
I've reproduced your work from your GitHub code; everything works just fine until I change the output velocity range to [-0.34, 0.34] (in your work the linear velocity ranges over [0, +1] and the angular velocity over [-1, +1]), which leads to a divergence in the loss, as shown below.
To solve the loss problem, I also tried adjusting the reward function as below; thankfully it finally converges like this.
However, although the loss seems okay, the actual simulation result is not as good as before: the robot collides in 4 or 5 episodes out of 10, and when the goal is behind the robot, it seems the robot does not know to turn around and navigate to the goal; it just goes straight ahead and hits the obstacle in front of it...
Besides, I also tried crudely multiplying a coefficient with tanh() to adjust the output velocity of the Actor, and it fails like this:
Thank you for your time and consideration. I really do need your help; I've been stuck here for a week and it's driving me crazy (sad...).