Closed oycool closed 2 months ago
Hi,
Please provide details. I won't know anything without any information.
Thank you very much for your reply. I have solved this problem, but I have one more question. I trained for 1000 episodes, which generated files in 'pytorch_models'. If I restart training, will those files be overwritten, or will the new data be appended to them?
Yes, they will be overwritten.
Thank you for your answer.
Hi Reinis Cimurs, I've run into a problem again.
I changed the robot to an Ackermann-steered car model and used a single-line lidar. After I modified the reward function, the car would initially drive for a long time in an episode before colliding, and training was slow. However, after 663 episodes of training, the car quickly rushed toward the obstacles, and Max_Q and Loss also fluctuated greatly. Is this a normal phenomenon?
I added the number of steps the car takes in an episode and its proximity to the target to the reward function: when the step count is >350, a penalty is given; when the distance to the target is <1, a reward is given for approaching it.
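The shaping described above could be sketched roughly as follows (a minimal illustration; the function and parameter names are hypothetical, not from the repository, and the penalty/bonus magnitudes are placeholders):

```python
def shaped_reward(base_reward, episode_timesteps, distance_to_goal,
                  step_penalty=-0.5, proximity_bonus=5.0):
    """Sketch of the described shaping: penalize long episodes,
    reward closing in on the goal. Magnitudes are illustrative."""
    reward = base_reward
    if episode_timesteps > 350:
        reward += step_penalty    # discourage episodes longer than 350 steps
    if distance_to_goal < 1.0:
        reward += proximity_bonus  # encourage approaching within 1 m of the goal
    return reward
```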
Are you giving the timestep information in the state for the model?
If not, then think about how the model would be able to learn the value of a state-action pair. Consider two exactly identical state-action pairs, where one happens on step 100 and the other on step 400. The model needs to estimate the value of this state-action pair, but in the two cases the reward you return is different. It simply has no information about what changed, so it cannot apply a meaningful gradient; the reward is just noisy at that point. So consider whether you need this kind of reward formulation.
If steps are taken into account, do I need to add 'episode_timesteps' to the 'state' array and then pass it to the model to get the 'action'?
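One way to do this is to append a normalized timestep to the observation before feeding it to the actor, so the step-dependent reward becomes observable. A minimal sketch (the names `augment_state` and `MAX_EPISODE_STEPS` are assumptions, and the network input dimension must grow by one accordingly):

```python
import numpy as np

MAX_EPISODE_STEPS = 500  # assumed episode cap; use your environment's value

def augment_state(laser_state, episode_timesteps):
    """Append the episode timestep, scaled to [0, 1], to the state vector
    so the critic can distinguish otherwise identical states at different steps."""
    t = episode_timesteps / MAX_EPISODE_STEPS
    return np.append(laser_state, t)
```

Remember that the actor and critic input layers must then accept one extra dimension.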
I would expect so. I guess you are trying to force the policy to reach the goal as fast as possible, but I would think a bit about whether this is a good approach.
Yes, because I previously trained for 40 epochs and 3000 episodes but still couldn't reach the goal.
Generally, it should work right out of the box. If not, you could try changing the seed value to get different initialization weights.
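Changing the seed can be done by re-seeding all the RNGs before building the networks, so the initialization weights and exploration noise differ between runs. A small sketch (the helper name `set_seed` is an assumption, not the repository's API):

```python
import random
import numpy as np
import torch

def set_seed(seed):
    """Seed Python, NumPy, and PyTorch RNGs so a different seed
    yields different network initialization weights."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(42)  # try a few different values if training stalls with the default
```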
OK, thanks for your reply.
Hi Reinis Cimurs, first of all, thank you very much for sharing. I made some modifications to the model. After training for 14 episodes (that is, 14 collisions or goal arrivals), the action becomes NaN. It's difficult for a newbie to fix this; can you give some suggestions based on your experience? Looking forward to your reply.