Closed behradkhadem closed 1 year ago
Hi @behradkhadem,
Good to hear from you again. I am not a user of RL nor an expert in reward function design, so you will have to come up with something yourself.
However, it is easy to design a simple sparse reward, or even a dense reward based on the distance to the goal.
Indeed, the `GoalComposition` can be used here, as it gives you access to the goal position and, potentially, the orientation. Off the top of my head, I would implement it in the following way:
```python
import numpy as np

goal_position = self._goals[0].position()
# <pybullet syntax to get current position of the end-effector/robot>
cur_position = ...
reward = 1 / np.linalg.norm(goal_position - cur_position)
```
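A slightly fleshed-out version of that idea, as a sketch: pure NumPy, where the end-effector position would still have to come from the simulator, and the `eps` and `threshold` values are arbitrary choices, not anything prescribed by urdfenvs.

```python
import numpy as np

def dense_reward(goal_position, cur_position, eps=1e-6):
    # Inverse-distance reward; eps avoids division by zero when the
    # end-effector sits exactly on the goal.
    return 1.0 / (np.linalg.norm(goal_position - cur_position) + eps)

def sparse_reward(goal_position, cur_position, threshold=0.05):
    # 1.0 only when the end-effector is within `threshold` of the goal,
    # 0.0 otherwise.
    return float(np.linalg.norm(goal_position - cur_position) < threshold)
```

The dense variant gives the learner a gradient toward the goal; the sparse one is easier to specify correctly but harder to learn from.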
It would be really nice if you could propose something here. However, reward shaping is an entire field of research so a generic reward function for urdfenvs seems very ambitious.
Looking forward to hearing from you.
Best, @maxspahn
I think the best way to implement this feature is to create an abstract class (let's name it `Reward`) with an abstract method (let's call it `getReward()`) that returns the reward value. Every time we want to create a new reward function, we create a class for it and override the `getReward()` method. For each environment, alongside the list of `robots`, we can pass the `Reward` implementation. We can use data from sensors and cameras inside the reward object.
In this scenario, the user has the freedom to create and experiment with different reward functions and algorithms. Reward shaping is a crucial step in RL that requires lots of trial and error. What do you think?
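The proposal above could be sketched roughly like this. The `Reward`/`getReward()` names come from the discussion, but the observation layout (`observation["joint_state"]["position"]`) is an assumption for illustration, not the actual urdfenvs schema.

```python
from abc import ABC, abstractmethod
import numpy as np

class Reward(ABC):
    """Abstract reward interface, as proposed above."""

    @abstractmethod
    def getReward(self, observation: dict) -> float:
        """Compute the reward from the latest observation."""

class InverseDistanceReward(Reward):
    """Example subclass: dense reward from distance to a fixed goal."""

    def __init__(self, goal_position):
        self._goal = np.asarray(goal_position, dtype=float)

    def getReward(self, observation: dict) -> float:
        # NOTE: this observation layout is a hypothetical example,
        # not the real urdfenvs observation structure.
        position = np.asarray(observation["joint_state"]["position"])
        return 1.0 / (np.linalg.norm(self._goal - position) + 1e-6)
```

Because `Reward` is abstract, the environment only needs to hold a `Reward` instance and call `getReward(observation)` in `step`, while users swap in their own subclasses.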
IMHO, the only con I see is that we would have to change the implementation of `GenericRobot`, which could cause errors in production if we don't have unit tests.
I know object-oriented programming and design patterns from my job (as a Flutter developer), but I haven't really tried implementing abstractions and interfaces in Python (though I know it's possible).
PS: When I was thinking about the solution above, I assumed that everything we'll need for calculating the reward is available via the `sensors` and `robots` lists. If we need some data inside the `step` method of the environment, I don't think this solution can handle that.
hello? @maxspahn
Hi @behradkhadem ,
Sorry for not being responsive on this. I am on vacation; I will look into the PR next week.
Have you checked why the tests fail?
Best, @maxspahn
Ok then, enjoy your vacation. The tests failed because I changed the implementation. 😅 But I haven't checked them yet, TBH; I was planning to do that after solving the issue with the sensors.
Hi everyone, hope you're all doing well.
As we know, the reward value inside the `GenericRobot` class is statically set to 1: https://github.com/maxspahn/gym_envs_urdf/blob/be7532ae35675c5a2fd8c0d1782e8dbfd684e446/urdfenvs/urdf_common/urdf_env.py#L278
How can I define an environment with a custom reward function (using data from sensors and so on)? Upon a cursory investigation of the code I came across a class called `GoalComposition` from a different package, but I didn't see this class being used in any example. Is it related to this package? And if not, is there a docs page or roadmap for this package, so I can learn the prerequisites for implementing a reward function here and help develop it? Are there any plans to implement reward functions at all?
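For what it's worth, a hard-coded reward can also be bypassed without touching the environment's internals, by wrapping `step` and recomputing the reward from the observation. This is only a sketch against a dummy stand-in environment; the 4-tuple return shape and everything about `_DummyEnv` are assumptions, not the actual urdfenvs API.

```python
class _DummyEnv:
    """Stand-in for an environment (illustration only)."""

    def step(self, action):
        # Returns (observation, reward, done, info); the reward is
        # hard-coded to 1.0, mimicking the situation described above.
        return {"position": list(action)}, 1.0, False, {}

class RewardWrapper:
    """Hypothetical wrapper: replaces the inner env's hard-coded reward
    with a user-supplied callable computed from the observation."""

    def __init__(self, env, reward_fn):
        self._env = env
        self._reward_fn = reward_fn

    def step(self, action):
        ob, _, done, info = self._env.step(action)
        return ob, self._reward_fn(ob), done, info

# Example: reward is the negative L1 distance of the position from zero.
env = RewardWrapper(_DummyEnv(), lambda ob: -sum(abs(x) for x in ob["position"]))
ob, reward, done, info = env.step([0.5, -0.25])
```

A wrapper like this keeps the core package untouched, which sidesteps the `GenericRobot` concern raised earlier in the thread, at the cost of only seeing what the observation exposes.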