krumiaa / MindMaker

MindMaker UE4 Machine Learning Toolkit

When are the chosen actions supposed to be executed? #9

Closed FLOROID closed 2 years ago

FLOROID commented 2 years ago

Hello there ^^

I am currently trying to set up my AI agent for autonomous driving. I've set up my Make Observations function and the Check Reward function, and I now want to bind the actions chosen by the algorithm to their respective actions in Unreal Engine, for example having the action with index 1 increase the throttle of the car. Now I'm trying to figure out when to do that. Do I execute those actions inside the Receive Action function, before observations are made, using the "Action Selected" integer value? Also, does the range of values for Action Selected start at 0?

And as a final question: does the MindMaker executable automatically call the Receive Action function, or do I have to call it somewhere myself? I couldn't find any place in the example project where the function is called.

Best regards,

Flo

krumiaa commented 2 years ago

Actions will automatically begin executing once the Launch MindMaker function commences, e.g. at the beginning of gameplay. The executable itself will start choosing random actions according to whatever action space you specified in the Launch MindMaker function, so you will want to define this exactly as you want your actions to be received. Actions are then received in the Receive Action function, where you define how those actions will actually show up in gameplay; before that, each action is just a numeric representation in an array, as defined in the action space variable of the Launch MindMaker function.

The MindMaker executable doesn't itself call the Receive Action function, but in all the blueprint examples I've made there are some initial nodes, immediately prior to the Launch MindMaker blueprint function, which define the conditions for calling the Receive Action function. Once the MindMaker executable starts generating actions, these automatically trigger the Receive Action function, so in effect it gets called automatically once the executable begins producing actions.
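For intuition, here is a rough sketch of the loop the learning engine runs, assuming a discrete action space. This is illustrative Python, not MindMaker's actual source; `send_to_unreal` and `wait_for_unreal` are stand-ins for the Socket.IO messaging the plugin handles for you:

```python
from gym import spaces

action_space = spaces.Discrete(6)  # hypothetical: 6 driving actions

def send_to_unreal(action):
    print(f"action -> UE: {action}")   # stands in for a socket.io emit

def wait_for_unreal():
    return [0.0] * 6, 0.0              # stands in for the reply from UE

for _ in range(10):
    action = action_space.sample()     # random exploration at first
    send_to_unreal(action)             # surfaces in Receive Action in UE
    obs, reward = wait_for_unreal()    # UE sends observations + reward back
    # the RL algorithm updates its policy from (obs, reward) here
```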

Once you get farther along with your self driving game, would you be open to sharing it as a mindmaker example? I think it would be really useful for others trying to get their hands dirty with this stuff.

FLOROID commented 2 years ago

I apologize for the delayed reply.

I'll have a look into my blueprint functions in a bit to see if I can find my error, because at the moment the MindMaker executable only manages to establish a connection to UE; nothing else happens after that point.

A few quick questions: the Define Action Space function merely sets the Max Actions integer value, right? And for the Observation Space Shape, your article here [https://towardsdatascience.com/create-a-custom-deep-reinforcement-learning-environment-in-ue4-cf7055aebb3e] reads "say these will range from -100 to 120 so enter 'low=np.array([-100]), high=np.array([120]),dtype=np.float32'"; however, in the example project this format is used: low=np.array([0,0]), high=np.array([100,100]), dtype=np.float32. I'm not very familiar with np, so I'm not sure whether this is a two-dimensional array or whether it encodes how many perceptrons the network has.

Also, regarding your question: I would be more than happy to do so. I'm documenting the process of making my own blueprint actor using the example project. I am also using Unreal Engine's Vehicle Template content, so that it's easy to replicate. Once I'm done with my paper I can write English documentation that you can reference together with the project, which I will happily supply to you :)

krumiaa commented 2 years ago

Thank you for being willing to share your project and help others learn these tools!

If there's a problem connecting to MindMaker, I would look at the network settings: make sure you're using the appropriate port in the blueprint's TCP/IP settings and that the Launch MindMaker function is initialized correctly.

The Define Action Space function sets the entire range of values that actions can take, including decimal places. If you're using a discrete action space, then it defines a minimum and maximum integer value.

You're correct that the example defines a two-dimensional observation space: each array holds one entry per observation value, with 0 as the lower bound and 100 as the upper bound in this case. In essence we are defining a custom OpenAI Gym environment, and the np format is NumPy syntax in Python. You can read about the details of custom environments in OpenAI Gym; it should help in terms of defining the action and observation space shapes, etc.: https://blog.paperspace.com/creating-custom-environments-openai-gym/
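Put together, a minimal custom environment skeleton along those lines might look like this (a sketch under the old Gym API; the class name and values are illustrative, not anything MindMaker ships):

```python
import gym
import numpy as np
from gym import spaces

class CarEnv(gym.Env):
    """Illustrative custom environment skeleton."""

    def __init__(self):
        # two observation values, each ranging 0..100, as in the example project
        self.observation_space = spaces.Box(
            low=np.array([0, 0]), high=np.array([100, 100]), dtype=np.float32)
        # six discrete actions, numbered 0..5
        self.action_space = spaces.Discrete(6)

    def reset(self):
        return np.zeros(2, dtype=np.float32)

    def step(self, action):
        obs = np.zeros(2, dtype=np.float32)  # would come from the game state
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info
```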

If you need more hands-on assistance, I also do consulting work on machine learning and game AI. You can reach out to me at aaron.krumins@gmail.com

FLOROID commented 2 years ago

I believe the connection to MindMaker is being established correctly. [screenshot] This is the output I can see in the executable. After that, nothing else happens.

Is the Action Space array ever passed to MindMaker and therefore relevant to it? And what exactly is it used for inside the Receive Action function? It's a little hard for me to decipher.

As for the Observation Space Shape, I assume the syntax works like this: action 0 has its low bound in the low array at index 0 and its high bound in the high array at index 0. Meaning that for, say, 3 values with low bounds of 0.0f and high bounds of 100.0f, the string would look like this: low=np.array([0,0,0]), high=np.array([100,100,100]), dtype=np.float32. Is that correct?
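For reference, here is a quick way to check what that string builds in NumPy/Gym terms (assuming the gym package; the values are just the ones above):

```python
import numpy as np
from gym import spaces

box = spaces.Box(low=np.array([0.0, 0.0, 0.0]),
                 high=np.array([100.0, 100.0, 100.0]),
                 dtype=np.float32)
print(box.shape)     # (3,) -- one low/high pair per value
print(box.sample())  # e.g. [37.2 91.5  4.8]
```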

Another small question: does the number of pins on the Set Members in Custom Struct node influence the message that is generated by the JSON object? In the example project (inside the Send Observations function) it looks like this: [screenshot]

However, in my own project the Set Members in MindMakerCustStrucStart node automatically splits into all the available pins. Does that influence the result in any way? And if so, does simply removing the pins change anything?

[screenshot]

FLOROID commented 2 years ago

I still need help with this :)

krumiaa commented 2 years ago

The action space, observation space, etc. are all passed to the MindMaker learning engine and are essential for configuring the environment. Refer to the OpenAI Gym documentation and the Stable Baselines documentation for more on each of those: https://stable-baselines.readthedocs.io/en/v2.10.0/guide/custom_env.html

The number of pins doesn't influence the results; showing the available pins is just a display setting within UE. Since only the reward and observations are passed back to the learning engine on a recurring basis, those are the only pins displayed in the example.
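As a concrete reference for how Stable Baselines consumes such an environment, here is a minimal sketch against the v2.x API (using a built-in Gym environment for self-containedness; a custom env subclassing gym.Env plugs in the same way):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.env_checker import check_env

env = gym.make('CartPole-v0')
check_env(env)  # raises if the spaces/returns don't follow the Gym API

model = PPO2('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)
```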

FLOROID commented 2 years ago

Hello Aaron,

I did have a look at both the Stable Baselines documentation and the OpenAI Gym documentation, which was very insightful and helped me a lot in understanding the fundamental workflow of OpenAI Gym. I'm still dissecting a lot of it because I'm new to the terminology used in the docs for the algorithms and policies that OpenAI Gym supplies. Sadly, the major thing holding me back is that after MindMaker has been launched and successfully connected to Unreal Engine, none of the functions such as "MakeObservations", "CheckReward", etc. are being executed. I tried replicating the cartpole setup as closely as possible and tweaking it to work for my self-driving car agent. The only fundamental difference is that I've integrated the entire blueprint setup into the agent's own BP class rather than an AI Controller BP; to my understanding this should not make a difference. Do you have any idea what I may be doing wrong? Maybe some common mistakes?

Feel free to ask for further information, screenshots, etc. I'm more than willing to supply you with everything you need in order to help me figure this out.

krumiaa commented 2 years ago

Hi Florian,

I know the learning curve can be pretty steep when you're just getting started with this ML stuff, and the OpenAI and Stable Baselines documentation is pretty dense. You could also try a book like this to get started.

https://www.amazon.com/dp/1839214937?tag=uuid10-20

As for your particular issue: since you've moved out of the AI Controller BP, I would make sure all of the Socket.IO plugin components have been added to your blueprint class and that the JSON-based communications are working properly. Once MindMaker has been successfully launched, it should immediately begin sending actions to Unreal Engine. As such, I would make sure all the bindings immediately following the Event BeginPlay in the AI Controller BP example are present and functioning. This might be a laborious process of going through each function (Receive Action, etc.) one by one and making sure the basic communications functionality works, then moving on from there.
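One cheap way to smoke-test the Socket.IO side from outside UE is a small Python client that connects to the learning engine and logs whatever it broadcasts (assuming the python-socketio package; the port matches the example's http://localhost:3000, and any event names are specific to your setup):

```python
import socketio

sio = socketio.Client()

@sio.event
def connect():
    print('connected to the learning engine')

@sio.on('*')
def catch_all(event, data):
    # logs every event so you can see what is (or isn't) being sent
    print(f'received {event}: {data}')

sio.connect('http://localhost:3000')
sio.wait()
```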

Not sure what kind of priority you have on this, but if you get stuck and are considering bringing in outside help to go through your code, I occasionally consult on such projects. You can email me at @.*** if you're interested.

-Aaron


FLOROID commented 2 years ago

Hi Aaron,

I've gotten my hands on the book you recommended and will be digging into that :)

As for the current issues: I had an extra look through my setup in the Event Graph.

Begin Play executes the Mind Maker Windows function just like it does in the cartpole example, and that works just fine: the executable launches properly. On Connected (SocketIOClient) executes the "Bind Event to Function" nodes, which are configured the same way as in the cartpole example [screenshot], and the related BP functions are named accordingly [screenshot]. The Connect function connects to http://localhost:3000, which is confirmed in the executable and therefore seems to work just fine. I added a SocketIOClient component to the blueprint, which is also configured the same way as in the cartpole example.

As for the Launch Mind Maker function, I currently use this setup: [screenshot] My MakeObservations function should output 6 float values from 0 to 1, and for the action vector I expect an integer value corresponding to the action that should be chosen. I assume the action space should simply be the integer value equal to the number of possible actions?
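In Gym terms, I believe the observation side of that setup corresponds to something like this (illustrative only, assuming the gym package):

```python
import numpy as np
from gym import spaces

# six observation values, each in [0, 1]
observation_space = spaces.Box(low=np.zeros(6, dtype=np.float32),
                               high=np.ones(6, dtype=np.float32),
                               dtype=np.float32)
print(observation_space.sample())  # six floats between 0 and 1
```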

Regarding your suggestion to double-check the "basic communications functionality": what exactly do you mean by this, and how would you go about checking it?

Last but not least: I really appreciate your offer of consultation and would be very happy to take you up on it. First, however, I would like to exhaust all other options :)

krumiaa commented 2 years ago

The first thing that should happen when the MindMaker executable is launched is that it begins sending actions to blueprints; if this is successful, they will show up in the Receive Action function in UE. So by testing the communications, I mean put in some kind of print node that outputs whatever action is being received, so you can look at it and make sure it makes sense in the context of your defined action space. Then go from there, checking that UE is sending observations and rewards back to the MindMaker executable, which, if successful, will trigger another action to be sent back.

I think you may be hitting a snag in how you're defining your action space. You have selected a discrete action space, which means the action space should consist of a single integer comprising the total number of actions the agent can take. So you could just put in 7 if you wanted 8 actions (0 is its own action). I have yet to find a good tutorial on creating custom environments in OpenAI Gym, but this is where that information would be covered. You could try searching for the same.
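One detail worth noting when cross-reading the Gym docs: Gym's own Discrete space is constructed from the total count of actions and samples integers from 0 up to n-1, which is why 0 counts as an action (a quick illustration, assuming the gym package):

```python
from gym import spaces

space = spaces.Discrete(8)                   # eight actions, numbered 0..7
print([space.sample() for _ in range(5)])    # e.g. [3, 0, 7, 2, 5]
print(space.contains(7), space.contains(8))  # True False
```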

FLOROID commented 2 years ago

Thank you very much for the tip regarding the action space shape; I now have an input of 5 on that node for my 6 potential actions.

I have made some progress and managed to get it to fire the first action! The Emit node for the launchmindmaker event (shown below) wasn't firing. I will include this in my documentation as something to check if someone runs into the same issue :)

[screenshot]

I'll try to sort out what comes after now and I'll keep you posted :)

UPDATE: Inside the Receive Action function I found this "Action Received" boolean; however, this node is the only reference to that variable, and it doesn't appear to be used anywhere else or set back to false after the Receive Action function runs. [screenshot] Is this in any way relevant to MindMaker working, or is it just there in case the user wants to display it somehow?

The second question I have is regarding the other function calls inside Receive Action. To my understanding, once Receive Action is called, the selected action contained in the Message node (in my case simply an integer value) is passed to the blueprint. Inside the Receive Action function I'm supposed to execute that action (for example, set the throttle input to 1 if action number 0 was received, etc.). Then the Make Observations function needs to be called, passing the observations into the Observations string variable in the format [float,float,float]. The Explore Counter is incremented by 1. Then the reward is checked and saved as a float value into the Reward variable, and Send Observations is called by MindMaker automatically to retrieve the state (observations) and the reward using the custom struct.
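In other words, my understanding of the per-step sequence, written out as Python with placeholder stubs (illustrative only, mirroring the blueprint functions by name):

```python
explore_counter = 0

def apply_action(action):
    print(f"executing action {action}")    # e.g. 0 -> set throttle input to 1

def make_observations():
    return "[0.0,0.0,0.0]"                 # placeholder observation string

def check_reward():
    return 0.0                             # placeholder reward

def receive_action(message):
    global explore_counter
    apply_action(int(message))             # 1. execute the chosen action
    observations = make_observations()     # 2. gather the new state
    explore_counter += 1                   # 3. bump the explore counter
    reward = check_reward()                # 4. compute the reward
    return observations, reward            # 5. sent back via Send Observations

print(receive_action("0"))
```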

Please confirm or correct me if I have anything mixed up.