islambarakat99 / Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning

A leader-follower formation control environment using deep reinforcement learning, in which every agent learns to follow the leader agent by keeping a certain distance to that leader, avoiding obstacles, and avoiding collisions with the other agents.
MIT License

Action space issues #7

Closed syf980302 closed 1 year ago

syf980302 commented 1 year ago

Hello, I'm sorry to bother you again, but I still have some questions I need to ask you, and I need your help.

Question 1: You said you set the maximum speed of the agent to 0.2, but I can't find the code where this is set.

Question 2: I can't find where the action space is defined or configured.

Finally, I'm sorry to bother you again, but I really need your help. I wish you all the best.

syf980302 commented 1 year ago

Hello, I really need your help. If you are free, I hope you can provide some clarification. Thank you.

islambarakat99 commented 1 year ago

Hello,

Q1: You can find the maximum speed of the agent here: https://github.com/islambarakat99/Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning/blob/4fde6a0a951394298a15a8e26bcdb57f819cbea8/multiagent/core.py#L42
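For reference, this is roughly how that limit is enforced in the particle-environment core (a sketch based on the original OpenAI multiagent-particle-envs code that this repo builds on; the exact lines in this fork may differ):

import numpy as np

# Each agent entity has a max_speed attribute (None means unlimited);
# setting it to 0.2 caps how fast the agent can move.
max_speed = 0.2

# Inside World.integrate_state() the velocity is rescaled whenever its
# magnitude exceeds max_speed, so the agent can never move faster than that.
def clip_velocity(p_vel, max_speed):
    speed = np.sqrt(np.square(p_vel[0]) + np.square(p_vel[1]))
    if max_speed is not None and speed > max_speed:
        p_vel = p_vel / speed * max_speed
    return p_vel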

Q2: The full definition of the action space is hard to describe here, so please have a look at the following (there is also a rough sketch after this list): 1- The original MADDPG paper:

@article{lowe2017multi,
  title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},
  author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},
  journal={Neural Information Processing Systems (NIPS)},
  year={2017}
}

2- The implementation, which you can find in this file: https://github.com/islambarakat99/Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning/blob/4fde6a0a951394298a15a8e26bcdb57f819cbea8/maddpg/trainer/maddpg.py#L1

3- Also you can read this article for further explanation
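To give you a rough idea, here is a sketch of how the per-agent action space is constructed in multiagent/environment.py of the original OpenAI particle environments that this repo is based on (the exact names in this fork may differ):

from gym import spaces

dim_p = 2                     # physical dimensions of the world (x, y)
u_range = 1.0                 # force range used in the continuous case
discrete_action_space = True  # MADDPG here is usually run on the discrete variant

if discrete_action_space:
    # 5 discrete choices: do nothing, push +x, push -x, push +y, push -y
    u_action_space = spaces.Discrete(dim_p * 2 + 1)
else:
    # otherwise a continuous 2D force vector within [-u_range, +u_range]
    u_action_space = spaces.Box(low=-u_range, high=+u_range, shape=(dim_p,))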

If you have further questions, I am happy to help.

syf980302 commented 1 year ago

Okay, thank you very much for your answer.

My coding skills are not very strong, so I feel a bit inadequate in many respects. I changed the reward you set for training and found that the results are very poor: even the basic navigation task cannot be completed. Continuously adjusting the reward values for training is the only thing I am currently able to do with confidence.

I have another question I would like to ask you: how do I display the training and testing results in a graph? I wrote a piece of code myself, but it always reports errors.

Finally, thank you very much for replying to my email. Wishing you a happy life and all the best.

islambarakat99 commented 1 year ago

For the first part, I think that you can also tune these parameters, along with the reward settings https://github.com/islambarakat99/Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning/blob/4fde6a0a951394298a15a8e26bcdb57f819cbea8/train.py#L21-L24

It is also recommended to have a basic understanding of how these parameters affect performance; for that, you can read the article mentioned before. I found it very helpful and quite easy to understand.
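If it helps, these are ordinary command-line flags of train.py, so you can try different values without editing the file (assuming this fork keeps the flag names of the original MADDPG train.py; check the parse_args() section if they differ):

python train.py --lr 0.01 --gamma 0.95 --batch-size 1024 --num-units 64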

Second, you can set these arguments to display the results after training, and also to run benchmarking: once you have a trained agent with a certain policy and want to evaluate its performance, change lines 33 and 34 to enable the benchmark and set the number of iterations over which you want to evaluate your agent.

https://github.com/islambarakat99/Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning/blob/4fde6a0a951394298a15a8e26bcdb57f819cbea8/train.py#L31-L36

At lines 35 and 36 you can set the directories on your machine where the plots are saved as graphs after training finishes.
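For example, assuming this fork keeps the flags of the original MADDPG train.py (the exact names may differ here), that would look something like:

# watch a previously trained policy
python train.py --restore --display

# evaluate a trained policy and save the benchmark data and plots
python train.py --restore --benchmark --benchmark-iters 100000 --benchmark-dir ./benchmark_files/ --plots-dir ./curves/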

I hope this helps; I am happy to reply to your comments 😀

syf980302 commented 1 year ago

Ok, thank you. I have already started to adjust the parameters you mentioned. Regarding the content in the figure below, I will study and try again. If there is any progress, I will inform you in a timely manner.

I am very grateful for your reply. I may be coming to you with questions almost every day at the moment, which I know causes you trouble.

Wishing you happiness and a smooth life every day.

[screenshot]

syf980302 commented 1 year ago

Hello, I remembered something. After training finished, a graph of the rewards was stored, but it was saved as a pkl file, and I still don't know how to draw it using Matplotlib.

As shown in the following figure:

[screenshot]

islambarakat99 commented 1 year ago

First of all, you need to install Matplotlib:

pip install matplotlib

Then you can reuse this code, which I found for your problem; I have also tested it for you:

import matplotlib.pyplot as plt
import pickle as pl

# Load the saved figures from disk and display them
# (use one handle per file so the first figure is not overwritten)
with open('formation_agrewards.pickle', 'rb') as f:
    fig_agrewards = pl.load(f)
with open('formation_rewards.pickle', 'rb') as f:
    fig_rewards = pl.load(f)
plt.show()

You can train the model as it is and run this piece of code from a separate file. Take care that you either place this file next to the pickle files, or place it anywhere and give it the right path to the pickle files. Have fun!

You don't need to thank me; it is just a small bit of help. You can ask me whenever you are stuck on something.

syf980302 commented 1 year ago

Hello, I tried the code you provided, but there was an issue: the code did not report an error, but it also did not display the image, as shown in Figure 1. I searched online for related issues and tried all the methods I could find, but the result is still the same.

I tried the following solution and added a new line of code, as shown in Figure 2, but the result is still not good (also shown in Figure 2).

At present I am a bit stuck on this issue, and I apologize for bothering you again amidst your busy schedule.

[screenshots: Figure 1 and Figure 2]

islambarakat99 commented 1 year ago

Well, the line you added unfortunately will not work!

When you use the first piece of code you should see Matplotlib graph windows pop up; however, this may fail for one of several reasons.

First, can you tell me the versions of Python and Matplotlib you are running? In the command prompt, write:

python --version
pip show matplotlib

Second, you may try installing a GUI toolkit for Matplotlib, such as pyqt5:

pip install pyqt5
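If the window still does not appear after installing pyqt5, you can also try forcing the Qt backend explicitly before pyplot is imported (a minimal sketch, assuming pyqt5 installed correctly):

import matplotlib
matplotlib.use('Qt5Agg')        # must run before "import matplotlib.pyplot"
import matplotlib.pyplot as plt
# ... then load the pickle and call plt.show() as before
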
syf980302 commented 1 year ago

Hello, I have already carried out the steps you described, and 'pyqt5' has been installed.

But when I run the code, the figure still does not display.

[screenshot]

The environment I created using anaconda is called maddppg.

syf980302 commented 1 year ago

Hello, I have a guess. I think the pkl files saved during training cannot be drawn using Matplotlib at all.

I don't know if this idea is right or not. What do you think?

syf980302 commented 1 year ago

Hello, I just solved this problem.

Although the figures can now be displayed, some of them are not quite correct. Because it is already 11 p.m. here, I will study them carefully tomorrow. If I make any progress, I will contact you as soon as possible.

Finally, thank you very much for your help. I hope you don't mind my questions, and I wish you a happy life.

[screenshot] The average rewards correspond to formation_agrewards; the Average Agent Rewards correspond to formation_rewards.

islambarakat99 commented 1 year ago

I am really happy that it worked 😀. Please keep me updated on your progress and ask for help when needed.

Wishing you best of luck

syf980302 commented 1 year ago

Hello, I found a problem with the code I wrote.

Question 1: Normally the whole training process should be shown, but the curve I drew only shows a portion of it.

Question 2: Each agent should have its own rewards. I believe 'final_ep_rewards' saves the rewards each agent receives in each episode, but no matter how I set things up, only one curve comes out.

I'm sorry to bother you again, but I really don't know how to handle it. I tried all the methods I could think of, but it still didn't work. Below are the curve I drew and my code.

[screenshots]

I have copied the code I wrote below for your review.

Finally, thank you very much for your help. I wish you a happy life and all the best.

import matplotlib.pyplot as plt
import pickle

# Load the pkl files
with open('C:/Users/309/Desktop/Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning-main111/curves/formation1_agrewards.pkl', 'rb') as f:
    rewards = pickle.load(f)

with open('C:/Users/309/Desktop/Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning-main111/curves/formation1_rewards.pkl', 'rb') as f:
    agrewards = pickle.load(f)

# Convert non-iterable elements to single-item lists
rewards = [[rew] if not isinstance(rew, (list, tuple)) else rew for rew in rewards]
agrewards = [[rew] if not isinstance(rew, (list, tuple)) else rew for rew in agrewards]

# Calculate average episode rewards and agent rewards
avg_rewards = [sum(rews) / len(rews) for rews in rewards]
avg_agrewards = [sum(rews) / len(rews) for rews in agrewards]

# Plot the average rewards over episodes
plt.plot(range(len(avg_rewards)), avg_rewards, label='Average Rewards')
plt.plot(range(len(avg_agrewards)), avg_agrewards, label='Average Agent Rewards')
plt.xlabel('Episodes')
plt.ylabel('Rewards')
plt.title('Reward Curves')
plt.legend()
plt.show()

islambarakat99 commented 1 year ago

It may be due to one of several reasons:

1 - The saved pickle file may contain the rewards/agrewards for only one agent.

2 - You may not have divided each of rewards/agrewards into per-agent sub-lists based on the number of agents you have (see the sketch below); this would also explain why the reward curve lengths are not consistent with each other on the graph.

3 - You may need to call plt.plot for each agent separately.
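
Here is a minimal sketch of what I mean in points 2 and 3, assuming 3 agents and that the agrewards list stores one value per agent per save point, ordered agent by agent (as in the original MADDPG train.py this repo is based on); adjust the file path and the number of agents to your setup:

import pickle
import matplotlib.pyplot as plt

NUM_AGENTS = 3  # assumption: change to however many agents you trained

with open('formation1_agrewards.pkl', 'rb') as f:
    agrewards = pickle.load(f)  # flat list: one entry per agent per save point

# Split the flat list into one curve per agent
per_agent = [agrewards[i::NUM_AGENTS] for i in range(NUM_AGENTS)]

# Plot each agent's curve separately
for i, curve in enumerate(per_agent):
    plt.plot(curve, label='Agent %d' % i)
plt.xlabel('Save point (every save_rate episodes)')
plt.ylabel('Average reward')
plt.legend()
plt.show()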

syf980302 commented 1 year ago

Ok, thank you. I followed the code you wrote exactly at the time; I only modified the parameters mentioned earlier, such as the learning rate, and did not make any other changes.

I don't know how to write this part of the code now.

islambarakat99 commented 1 year ago

Can you share your two pickle files with me at this website? Please upload them there and share the link with me. Once I have time, I will try to figure out why they are not working for you. In the meantime, you can also search the internet for solutions to this problem, and if you find any, tell me! I will update you too.

syf980302 commented 1 year ago

Of course, thank you very much for helping me. I will now send you the '.pkl' files from my run. I will compress all the programs and send them to you; the '.pkl' files are located in the 'curves' folder. Because I do not fully understand some of the code, I added notes in Chinese.

Finally, I would like to express my gratitude to you again. Thank you for taking the time out of your busy schedule to reply to me. I wish you a happy life and all the best.

I will also continuously search for information and debug the code. Once there is any progress, I will immediately share it with you.

[attachment: Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning-main111.zip]

syf980302 commented 1 year ago

The compressed package I uploaded is in "zip" format; I'm not sure whether you can open it fully on your end. If you have not received it or cannot open the file, I will immediately change the format and resend it to you.

Thank you, my good friend.

islambarakat99 commented 1 year ago

Can you upload only the pickle files at this link, generate a share link, and then post that link here? I am not sure that the whole environment will run on my end right now.

syf980302 commented 1 year ago

Hello, due to personal reasons, I am only replying to your email. I'm not sure if this is the correct way to handle it. I followed the link you sent and generated an address there; here is the pkl file I obtained: https://wormhole.app/Qx7OB#MSh8Zadr0pw7E_YPqno-Hg If there are any issues, I will handle them promptly. Thank you, my good friend.

syf980302 commented 1 year ago

Hello, one thing to note: this link is only valid for 24 hours. If it expires, please let me know and I will regenerate the link and send it to you again.

syf980302 commented 1 year ago

Sorry, I made a mistake in one sentence. What I meant to say is that, due to personal reasons, I had not responded to your email until now. I hope this does not cause any misunderstanding. Thank you very much for replying to my email despite your busy schedule, and I am sorry to cause you trouble. Finally, I wish you a happy life and hope everything goes well.

islambarakat99 commented 1 year ago

Sorry, bro, for not replying immediately; I am on vacation for 2-3 days, so I don't have my laptop with me right now. Once I am back I will ask you for a new link so I can look at your pickle files and get back to you.

Don't worry, and I hope those personal matters turned out fine.

Don't thank me again; I am only giving a little help here.

I hope that your project will be a very successful one in the end ☺️

syf980302 commented 1 year ago

Ok, thank you. I am also actively searching for information and solving this problem.

Have a pleasant time.


syf980302 commented 1 year ago

Hello, I don't know how your holiday went, but I am sure it was happy and full of fun. I also have something happy to share with you: I think I have solved the problems I mentioned.

Idea 1: The reason the reward curve shows only part of the training is 'arglist.save_rate'. For example, if 'arglist.save_rate' is 10, the first element in the final_ep_rewards list is the average cumulative reward over episodes 1 to 10.

Idea 2: '_agrewards.pkl' does hold an average reward value for each agent. I printed out the values it contains and then split the elements to get three curves.

I don't know if I am right about this, but I think it is a little bit of progress, ha ha ha.

[screenshots of the reward curves]

However, there is still a problem: the curves obtained from training do not feel right, in that the reward values of agent 1 and agent 3 show almost no increase.

islambarakat99 commented 1 year ago

Hello there, I am really sorry I haven't been following up recently as I had a vacation, so sorry for not being available!

I am really happy that you did it; I think it is a lot of progress to be able to solve such problems and print the right graphs.

I think you can play with the learning parameters I mentioned before; they are the main controllers of the learning process and hence of the learning curves: https://github.com/islambarakat99/Multi-Robot-Formation-Control-using-Deep-Reinforcement-Learning/blob/4fde6a0a951394298a15a8e26bcdb57f819cbea8/train.py#L21-L24

As I mentioned before, you need to tune these parameters by trial and error. Then update me with your progress

syf980302 commented 1 year ago

Ok, thank you. My current main job is to adjust the parameters in order to obtain a model with better performance.

Thank you very much for your help. I wish you a healthy and happy life, and all the best.