Closed: koi0823 closed this issue 7 months ago.
Oh dang, have I introduced a bug in 0.6.3?
The fact that it only captured two points is what causes the issue. Why did you close the issue; is this somehow fixed?
If not, are you using Windows or Linux, please? And are you using version 0.6.3 of tmrl?
The problem above is solved. I noticed the command runs better from CMD than from VS Code, so I ran it from CMD first.
I'm using Windows, and I believe there is a parameter issue, since if I keep running, the resolution changes to 256 x 128; I'm not sure why.
I did fix the height and width.
So this is my version: yes, it is 0.6.3.
I tested 0.6.3 for both --record-reward and --check-environment on Windows 11, and both worked as intended on my machine.
Your output for --record-reward is extremely strange; it looks like the car instantly teleported to the finish line. The "initial number of captured positions" should never be 2.
(For the yellow warnings returned by --check-environment, you can safely ignore them. We should fix this at some point, but this is minor and somehow only seems to happen in --check-environment, for reasons that I don't understand.)
I'm using the Full environment to run it, so is it because the road isn't even that I can't record the reward and check the environment?
I ran it with the map Spring 2024 - 01.
Maybe it is a dent in the road; is that why the lidar cannot scan?
Let me try another map. Is there any map you recommend?
I'm doing my FYP for an AI degree, so I'm trying to get a good result that I can present.
No you should be fine, this has nothing to do with the lidar. I recorded a reward on the 1st Spring 2024 track myself and it worked properly.
Something seems wrong with your OpenPlanet installation, as if you were receiving only 2 positions while recording the reward for some reason.
Maybe some external program is sending trash on port 9000?
To record the reward, you are supposed to set the car at the beginning of the track, press e, and then drive to the finish line. When you cross the finish line, you should see a message telling you that it has captured a few thousand initial positions (your problem is that it only captures 2 positions for some weird reason).
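In short, the whole procedure is just this (the command is the one you already used; e is the key press mentioned above):

```
python -m tmrl --record-reward
# in-game: place the car at the start, press e once, then drive to the finish line
```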
Yeah, but I have to switch to CMD.
Could you please tell me where the file or data goes after the reward recording, so I can verify and see?
Could I also ask what record-reward and check-environment are for? Because when I train, it's getting worse.
Where is the file located so that I can check?
Could you also tell me about loss_actor? It has been getting worse and worse, and I'm not sure whether the training is improving.
INFO:root:=== epoch 42/10000 = round 76/100 ====================================
INFO:root:memory_len 293223 round_time 2.991769 idle_time 0.0 loss_actor -0.67442 loss_critic 0.524971 return_test 0.0 return_train 47.25 episode_length_test 0.0 episode_length_train 127.0 sampling_duration 0.006929 training_step_duration 0.008019
I'm training it in the Full environment. Would it be preferable to use the Full environment or only lidar?
I'm new to this field, so I'd like to know how to train over multiple sessions. I'm not sure how long to train for, and I'm afraid I can't make it in time for my FYP, which is due around September 2024.
LIDAR is faster and easier to train, but it only works on plain grey roads with black borders, like the tmrl-train track. Full is more general and works on any track. It is however harder to find good hyperparameters with Full, and training takes a long time / requires a high-end GPU.
Everything is located in the TmrlData folder, including the reward. However, this is a pickle file, so exploring this file will not help you debug, unless you unpickle and explore its content in a python script. But from your logs, what you would see if you were doing that would probably be a list of 3D points forming a straight line between your 2 initial captured positions.
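If you do want to peek inside, something like this works; note that the exact file name and location under TmrlData are assumptions here, so check your folder:

```python
import pickle
from pathlib import Path

# hypothetical default location of the recorded reward
reward_path = Path.home() / "TmrlData" / "reward" / "reward.pkl"

with open(reward_path, "rb") as f:
    positions = pickle.load(f)

print(len(positions))  # a healthy recording should contain a few thousand positions
print(positions[:5])   # each entry should be a 3D point along the track
```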
What you can do for debugging is delete the TmrlData folder entirely, execute python -m tmrl --install (this will recreate the TmrlData folder in its default state), and try the pre-trained AI on the tmrl-test track using python -m tmrl --test. If the AI works properly, OpenPlanet is sane and something is wrong in the way you recorded a custom reward.
(Note that, for the default AI to complete the tmrl-test track, the camera needs to be in the exact configuration shown in the getting-started page, which may not be your default camera configuration.)
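Step by step, that reset procedure is just (same commands as above, nothing new):

```
# delete the TmrlData folder entirely (back it up first if you want to keep anything),
# then recreate it in its default state:
python -m tmrl --install
# and check the pre-trained AI on the tmrl-test track:
python -m tmrl --test
```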
Regarding the third point you mentioned, I believe it should be alright, as I ran it well and deleted it once beforehand. I used VS Code to execute python -m tmrl --test and it works well, but the reward recording appears to be stuck, possibly because I kept pressing E, so it isn't recording.
On the other hand, when I use CMD to test python -m tmrl --record-reward, it is all fine.
I'm running with a 3080 GPU, and my CPU is an i5-13600K.
If you can add my Discord, I could show you sometime if you're free, since I'm new to this and need some guidance from you.
Discord: koihaha#5605
I don't know whether it's okay or not, but it is training.
Oh yes, you don't want to keep pushing e. Just press it once at the start of the track, then drive normally to the end, and when you cross the finish line the script should automatically compute the reward function from the positions it captured in between.
Are you using the Full environment? If you are, it should automatically rescale the trackmania window to something smaller (unless you manually changed the corresponding parameters in config.json).
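(For reference, I believe the relevant entries in config.json look like the following; the key names here are from memory and should be double-checked against your file, but the 256 x 128 resolution you mentioned earlier matches these defaults:)

```json
"WINDOW_WIDTH": 256,
"WINDOW_HEIGHT": 128
```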
If instead you are using the Lidar environment, you need to set the camera in 1st person view so that you don't see the car. You can use python -m tmrl --check-environment and drive around to see whether the observations and rewards make sense before starting to train.
Can I ask why my memory_len is 1000000 and won't increase anymore?
INFO:root:Memory updated with steps:200, batch size:256, memory size:100000000. This is the error I get when I add two more zeros to the memory size.
Also, I want to change my tmrl data to another file path. How do I do that?
We could easily add an option for changing the TmrlData path, but this is not done atm. If you want to do this, you will have to clone the repo and manually change the value of TMRL_FOLDER here.
You can install your local version of the repo by cd-ing to where the setup.py file is and doing pip install -e .
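Something like this, assuming the standard GitHub location of the repo:

```
git clone https://github.com/trackmania-rl/tmrl.git
cd tmrl
# edit the TMRL_FOLDER constant mentioned above, then install in editable mode:
pip install -e .
```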
Hey, I have a question about time_step_timeout_factor: I set it to 80, but it stops at 48 seconds.
Wow, don't do this. The timestep timeout factor should not exceed 2.0, otherwise it becomes entirely meaningless.
Closing as this is not a tmrl issue. Please open a thread in the discussions section for help and questions.
You mean the timestep timeout factor should not exceed 2.0? It keeps respawning over and over, so I asked ChatGPT and got this answer.
@yannbouteiller Then how should I set my time in seconds? It's weird: it keeps respawning and then doesn't stop.
Respawns are not related to the timeout factor, they happen because the agent is failing to collect reward.
It probably fails to collect reward because your environment has an issue. You need to use python -m tmrl --check-environment to find out what.
@yannbouteiller Thanks bro, I will take a look.
A very quick question: can I change my memory_size to something like 2000000?
I checked my environment and nothing is wrong, but the time stops at 50 seconds.
https://youtu.be/2NkFNORkdD0?si=S3krB2NBnP2FN1P2 At 2:10 in the video is where I am, and it is 50 seconds.
@yannbouteiller Check this out. I think my environment is fine, but I don't know why it keeps getting stuck at 50 seconds.
I didn't change anything in the config.
The 50-second limit is expected; it is the default time-limit in the example TrackMania pipeline. Your environment looks sane from your screenshots.
For the default time-limit in the example TrackMania pipeline, how can I change it, or is it fixed?
You can change it in config.json by changing the values of both the "ep_max_len" and the "RW_MAX_SAMPLES_PER_EPISODE" entries.
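For example, something like this; this is a sketch rather than a verbatim excerpt (the exact nesting of these keys in config.json may differ), and assuming the default 0.05 s time step, 1000 samples is the 50-second limit you saw, so 2000 would give you about 100 seconds:

```json
"ep_max_len": 2000,
"RW_MAX_SAMPLES_PER_EPISODE": 2000
```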
It's good that the time has changed, but how can I make the parameters better? Because the car continues to tremble.
1. Is it my record_reward problem, or not enough training?
2. Is there enough road space for the scan?
The jitter is inherent to how Soft Actor-Critic trains policies. If you want to get rid of it, one solution is to penalize large changes in the steering in the reward function. This would require you to learn Python programming, clone the repository and adapt the environment's code.
Otherwise you can play with the SAC hyperparameters in config.json, in particular the "alpha" term, which is responsible for injecting entropy into the policy.
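For illustration, here is a minimal sketch of the steering-penalty idea as a gymnasium wrapper. This is not tmrl code; in particular, which action component is the steering command is an assumption you would need to check against the actual environment:

```python
import gymnasium as gym


class SmoothSteeringWrapper(gym.Wrapper):
    """Subtract a penalty proportional to the change in steering between steps."""

    def __init__(self, env, penalty_coef=0.1, steer_index=-1):
        super().__init__(env)
        self.penalty_coef = penalty_coef
        self.steer_index = steer_index  # which action component is steering (assumed)
        self.prev_steer = 0.0

    def reset(self, **kwargs):
        self.prev_steer = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        steer = float(action[self.steer_index])
        # penalize large steering changes to discourage jitter
        reward -= self.penalty_coef * abs(steer - self.prev_steer)
        self.prev_steer = steer
        return obs, reward, terminated, truncated, info
```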
You mean I need to increase the value? To something like 0.05, or less?
If you want to see less jitter, you should decrease it, I believe, but this will also harm exploration.
Can I ask what loss_actor is?
You need to read the Soft Actor-Critic paper to understand this.
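For reference (and up to implementation details of this codebase), the loss_actor logged during training corresponds to the SAC policy objective from that paper:

$$J_\pi(\phi) = \mathbb{E}_{s_t \sim \mathcal{D},\; a_t \sim \pi_\phi}\big[\alpha \log \pi_\phi(a_t \mid s_t) - Q_\theta(s_t, a_t)\big]$$

so a more negative loss_actor roughly means the critic assigns higher value to the actions the policy selects, relative to the entropy bonus.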
I have been trying to record a reward on a track.
Later, when I attempted to run check-environment, all of the errors disappeared. I'm not sure whether this is a bug or what, but it kept running at the respawn location.