maxspahn / gym_envs_urdf

URDF environments for gym
https://maxspahn.github.io/gym_envs_urdf/
GNU General Public License v3.0

Defining reward function for an environment #179

Closed behradkhadem closed 1 year ago

behradkhadem commented 1 year ago

Hi everyone, hope you're all doing well.

As we know, the reward value inside the GenericRobot class is statically set to 1: https://github.com/maxspahn/gym_envs_urdf/blob/be7532ae35675c5a2fd8c0d1782e8dbfd684e446/urdfenvs/urdf_common/urdf_env.py#L278 How can I define an environment with a custom reward function (using data from sensors and so on)? While briefly investigating the code, I came across a class called GoalComposition from a different package, but I didn't see it used in any example. Is it related to this package?

And if not, is there any page or documentation describing the roadmap and future tasks of this package, so I can know what the prerequisites are for implementing a reward function here and help develop it? Are there any plans to implement reward functions at all?

maxspahn commented 1 year ago

Hi @behradkhadem again,

Good to hear from you again. I am not an RL user nor an expert in reward function design, so you will have to come up with something yourself.

However, it is easy to design a simple sparse reward, or even a dense reward based on the distance to the goal.
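To make the difference concrete, here is a minimal sketch; the function names, the tolerance value, and how the positions are obtained are all illustrative, not part of urdfenvs:

```python
import numpy as np

def sparse_reward(position, goal_position, goal_tolerance=0.1):
    # 1 only once the robot is inside the goal region, 0 everywhere else
    return float(np.linalg.norm(goal_position - position) < goal_tolerance)

def dense_reward(position, goal_position):
    # informative at every step: larger (less negative) as the robot gets closer
    return -float(np.linalg.norm(goal_position - position))
```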

Indeed, the GoalComposition can be used here, as it potentially gives you access to the goal position and orientation.

Off the top of my head, I would implement it in the following way.

goal_position = np.array(self._goals[0].position())
cur_position = ...  # current position of the end-effector/robot, e.g. via pybullet's getLinkState or getBasePositionAndOrientation
reward = 1.0 / (np.linalg.norm(goal_position - cur_position) + 1e-6)  # epsilon avoids division by zero at the goal
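One way to plug such a reward in without touching the library code could be a small gymnasium wrapper. This is only a sketch: the gymnasium 5-tuple step API and the observation keys (obs["robot_0"]["joint_state"]["position"]) are assumptions about your setup and should be adapted to what your environment actually returns.

```python
import gymnasium as gym
import numpy as np

class DistanceRewardWrapper(gym.Wrapper):
    """Replaces the environment's constant reward with an inverse-distance reward."""

    def __init__(self, env, goal_position):
        super().__init__(env)
        self._goal_position = np.asarray(goal_position, dtype=float)

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Hypothetical observation layout; adapt the keys to your sensor setup.
        position = np.asarray(obs["robot_0"]["joint_state"]["position"])[: self._goal_position.size]
        distance = np.linalg.norm(self._goal_position - position)
        reward = 1.0 / (distance + 1e-6)  # epsilon avoids division by zero at the goal
        return obs, reward, terminated, truncated, info
```

Usage would then be something like env = DistanceRewardWrapper(env, goal_position=goal.position()), with goal taken from your GoalComposition.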

It would be really nice if you could propose something here. However, reward shaping is an entire field of research, so a generic reward function for urdfenvs seems very ambitious.

Looking forward to hearing from you.

Best, @maxspahn

behradkhadem commented 1 year ago

I think the best way to implement this feature is to create an abstract class (let's name it Reward) that has an abstract method (let's call it getReward()) that returns the reward value. Every time we want to create a new reward function, we create a class for it and override the getReward() method. For each environment, alongside the list of robots, we can pass the Reward implementation, and we can use data from sensors and cameras inside the reward object.
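A rough sketch of what this could look like. The names (spelled get_reward here to follow Python conventions) and the observation keys are illustrative, not part of the current API; the idea is that the environment would call reward.get_reward(obs) inside its step method:

```python
from abc import ABC, abstractmethod
import numpy as np

class Reward(ABC):
    """Base class for user-defined reward functions."""

    @abstractmethod
    def get_reward(self, observation) -> float:
        """Map the latest observation (robot state, sensor data, ...) to a scalar reward."""

class InverseDistanceReward(Reward):
    """Example implementation: reward grows as the robot approaches a goal position."""

    def __init__(self, goal_position):
        self._goal_position = np.asarray(goal_position, dtype=float)

    def get_reward(self, observation) -> float:
        # Hypothetical observation layout; adapt the keys to the actual sensor output.
        position = np.asarray(observation["robot_0"]["joint_state"]["position"])[: self._goal_position.size]
        return 1.0 / (np.linalg.norm(self._goal_position - position) + 1e-6)
```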

In this scenario, the user has the freedom to create and experiment with different reward functions and algorithms. Reward shaping is a crucial step in RL that requires lots of trial and error. What do you think?

IMHO, the only con I see is that we have to change the implementation of GenericRobot, and this could cause errors in production if we don't have unit tests.

I know object-oriented programming and design patterns from my job (as a Flutter developer), but I haven't really tried to implement abstractions and interfaces in Python (although I know it's possible).

PS: When I was thinking about the solution above, I assumed that everything we need for calculating the reward is available via the sensors and robots lists. If we need some data from inside the environment's step method, I don't think this solution can handle that.

behradkhadem commented 1 year ago

hello? @maxspahn

maxspahn commented 1 year ago

Hi @behradkhadem ,

Sorry for not being responsive on that. I am on vacation; I will look into the PR next week.

Have you checked why the tests fail?

Best, @maxspahn

behradkhadem commented 1 year ago

Ok then, enjoy your vacation. And the tests failed because I changed the implementation. 😅 But I haven't checked it, TBH, and I was planning on doing that after solving the issue with the sensors.