ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Creating custom metric on older version of ray [rllib] #8333

ksettaluri6 closed this issue 3 years ago

ksettaluri6 commented 4 years ago

I am trying to create a custom score metric that counts each time the agent reaches a positive sparse reward of 10.

I modified rllib/evaluation/episode.py by adding the following method:

def curr_reward_for(self, agent_id=_DUMMY_AGENT_ID):
    """Returns the current reward for the specified agent."""
    history = self._agent_reward_history[agent_id]
    if len(history) >= 1:
        return history[-1]
    else:
        return 0.0
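
For reference, here is a minimal sketch of attaching the same helper at runtime instead of editing the installed episode.py. The class name and import paths below follow newer RLlib layouts and are assumptions for 0.6.3, and the patch would also need to run inside each rollout worker process for the callbacks defined further down to see it:

# Assumed import paths; they may differ in Ray 0.6.3.
from ray.rllib.evaluation.episode import MultiAgentEpisode
from ray.rllib.env.base_env import _DUMMY_AGENT_ID

def curr_reward_for(self, agent_id=_DUMMY_AGENT_ID):
    """Returns the most recent reward recorded for the given agent."""
    history = self._agent_reward_history[agent_id]
    return history[-1] if history else 0.0

# Attach the helper to the episode class (monkey-patch) so no source edit is needed.
MultiAgentEpisode.curr_reward_for = curr_reward_for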

I then added these callback functions. The reached_count and unreached_count variables are counters that I want to reset every time a training iteration finishes.

import numpy as np

# Counters for how many episodes did / did not reach the sparse reward
# (intended to be reset after each training iteration).
reached_count = 0
unreached_count = 0

def on_episode_start(info):
    episode = info["episode"]
    episode.user_data["rewards"] = []

def on_episode_step(info):
    episode = info["episode"]
    curr_rew = episode.curr_reward_for()
    episode.user_data["rewards"].append(curr_rew)

def on_episode_end(info):
    global reached_count
    global unreached_count
    episode = info["episode"]
    max_rew = np.max(np.array(episode.user_data["rewards"]))
    if max_rew == 10.0:
        reached_count += 1
    else:
        unreached_count += 1

    episode.user_data["reached"] = reached_count
    episode.user_data["unreached"] = unreached_count

I now want to tabulate this metric across all agents/episodes, and initially had:

def on_train_result(info):
    episode.custom_metrics["reached_spec"] = episode.user_data["reached"] / (
        episode.user_data["reached"] + episode.user_data["unreached"])

However, the on_train_result function does not have access to the episode object. Also, because I have an older version of Ray (0.6.3), I don't have the on_postprocess_traj function. I am unable to move to a newer version of Ray/RLlib due to issues with version control on the remote machine I am on.

My question is: where is the on_train_result function called, and is there a way I can modify it so that I get access to the episode? Also, because I am running this on 8 CPUs, I find that there are 8 different agents: how can I access all of their episode data during training?
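
For what it's worth, one common pattern in later RLlib versions (whether 0.6.3 aggregates custom metrics the same way is an assumption) sidesteps both the module-level globals and on_train_result: record a 0/1 indicator per episode in on_episode_end and let RLlib average it over all episodes collected from all rollout workers:

import numpy as np

def on_episode_end(info):
    episode = info["episode"]
    max_rew = np.max(np.array(episode.user_data["rewards"]))
    # 1.0 if this episode hit the sparse reward, else 0.0; the mean of this
    # metric across episodes (and workers) is the fraction that reached it.
    episode.custom_metrics["reached_spec"] = 1.0 if max_rew == 10.0 else 0.0

The training result would then contain something like result["custom_metrics"]["reached_spec_mean"], assuming the usual mean/min/max aggregation.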

richardliaw commented 4 years ago

Hm, 0.6.3 is essentially deprecated. This is a poor suggestion, but does it make sense just to nuke the repo and cp a new set of files over for 0.8.5?

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.

stale[bot] commented 3 years ago

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public Slack channel.

Thanks again for opening the issue!