andrew-thought opened this issue 2 years ago
Yes, these are related.
Are there any updates on this? I am using PPO and I am losing my mind over this.
self.action_space = spaces.Dict({
    'type': spaces.Discrete(self.discrete_features_count),  # specific action type
    'value1': spaces.Box(low=0.01, high=1.0, shape=(1,), dtype=np.float32),
    'value2': spaces.Discrete(300)
})
def step(self, action):
    reward = 0
    # Clip each action component to its valid range
    action['type'] = np.clip(action['type'], 0, self.action_space['type'].n - 1).astype(int)
    action['value1'] = np.clip(action['value1'], 0.01, 1.0)
    # Extract the parts of the action
    normal_action = action['type']
    self.value1 = action['value1'][0]
    self.value2 = action['value2']
I always get this error at the end:
    raise e.with_traceback(filtered_tb) from None
  File "/home/will/miniconda3/envs/ray-tensor/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 7262, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__SparseSoftmaxCrossEntropyWithLogits_device_/job:localhost/replica:0/task:0/device:CPU:0}} Received a label value of 300 which is outside the valid range of [0, 300). Label values: 300 [Op:SparseSoftmaxCrossEntropyWithLogits]
It seems to be mixing up the index with the discrete value? Please help!
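For anyone hitting this: `Discrete(300)` accepts the integer labels 0 through 299, so a label of 300 is exactly one past the end of the half-open range `[0, 300)` that the error message names. A quick plain-NumPy sanity check (the label values here are hypothetical, just to illustrate the range):

```python
import numpy as np

n = 300  # size of spaces.Discrete(300)
labels = np.array([0, 150, 299, 300])  # hypothetical label values
valid = (labels >= 0) & (labels < n)   # Discrete(n) accepts the half-open range [0, n)
print(valid.tolist())  # [True, True, True, False] -- 300 is out of range
```

Note that clipping inside step() cannot prevent this particular error, because the cross-entropy loss is computed on the action the policy emitted before the environment ever sees it.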
Hello, are there any updates on this issue? I get the same error with PPO and cannot figure it out. In my case the action space is gym.spaces.Discrete(8), and training always stops with this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 8 which is outside the valid range of [0, 8). Label values: 8 [[{{node default_policy_wk1/SparseSoftmaxCrossEntropyWithLogits_18/SparseSoftmaxCrossEntropyWithLogits}}]]
What happened + What you expected to happen
I am working with FinRL-Meta (link in the reproduction project). I wanted to try RLlib's ARS implementation with the same codebase, but the ARS model produces actions outside my environment's defined action_space in both training and testing. (Testing uses compute_single_action(); I am not sure which method ARS uses to produce continuous actions during training, whether compute_actions() or maybe a sampler function.)
In this case, the environment defines action_space:
self.action_space = spaces.Box(low=-3, high=3, shape=(len(self.assets),)) # len(self.assets) always equals 1 currently
We sometimes see actions as far out of bounds as ±60.
Additionally, ARS starts with actions around 0.0, but as training progresses they grow toward the bounds and then exceed them, until 99% of all actions are out of bounds.
We expect ARS actions to respect the Box action space range.
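As a stopgap (not a fix for ARS itself), incoming actions can be clipped defensively inside the environment's step(). A minimal sketch, assuming the Box bounds defined above; `clip_to_box` is a hypothetical helper, not part of the codebase:

```python
import numpy as np

LOW, HIGH = -3.0, 3.0  # bounds from the env's Box action space

def clip_to_box(action):
    """Defensively clip an action into the Box range before using it."""
    return np.clip(np.asarray(action, dtype=np.float32), LOW, HIGH)

print(clip_to_box([60.0, -60.0, 1.5]).tolist())  # [3.0, -3.0, 1.5]
```

This masks the symptom in the environment but does not stop ARS from emitting out-of-bounds actions in the first place.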
What I have tried:
- normalize_actions = true and false, no change
- different exploration functions, disabling exploration, no change
- passing clip_actions = true and false to compute_actions(), no change
- upgrading Ray to 2.0.0, no change
- tried PPO; PPO respects the Box bounds
- we tried adding unsquash code to the ARS compute_single_action(), but since ARS is probably not normalizing the action space, this didn't appear to function properly
We see that ARS overrides compute_single_action() and that override probably doesn't contain the normalize, unsquash, or clip code that processes those configuration flags. Since ARS doesn't override compute_actions(), which does contain the normalize, unsquash, and clip code, ARS is probably not using that method for training either. We tried to verify this during training but were not able to debug into the workers.
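For context, "unsquashing" here means mapping a normalized policy output in [-1, 1] back into the Box bounds. A simplified sketch of that transform (this is an illustration of the idea, not RLlib's actual implementation):

```python
import numpy as np

def unsquash(a_norm, low, high):
    # Map a normalized action in [-1, 1] into [low, high], clipping first.
    a_norm = np.clip(np.asarray(a_norm, dtype=np.float64), -1.0, 1.0)
    return low + (a_norm + 1.0) * 0.5 * (high - low)

print(unsquash([-1.0, 0.0, 1.0], -3.0, 3.0).tolist())  # [-3.0, 0.0, 3.0]
```

If ARS never normalizes its outputs to [-1, 1] in the first place, applying this transform to its raw actions would distort them rather than bound them, which matches what we observed.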
Sample of training output:

(Worker pid=176481) NEW STEP>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(Worker pid=176481) x: 9.205153
(Worker pid=176481) action: 2
(Worker pid=176481) OOB
(Worker pid=176481) NEW STEP>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(Worker pid=176481) x: 6.871564
(Worker pid=176481) action: 2
(Worker pid=176481) OOB
(Worker pid=176481) NEW STEP>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(Worker pid=176481) x: 7.683341
(Worker pid=176481) action: 2
(Worker pid=176481) OOB
(Worker pid=176481) NEW STEP>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(Worker pid=176481) x: 8.150882
(Worker pid=176481) action: 2
(Worker pid=176481) OOB
(Worker pid=176481) NEW STEP>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(Worker pid=176481) x: 7.7060976
(Worker pid=176481) action: 2
(Worker pid=176481) OOB
(Worker pid=176481) NEW STEP>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
(Worker pid=176481) x: 4.4323587
(Worker pid=176481) action: 2
(Worker pid=176481) OOB
Versions / Dependencies
Versions:
- ray = 1.12.0
- gym = 0.21.0
- python = 3.7
Reproduction script
I have a reproduction project (https://github.com/imnotpete/ARS-OOB-Reproduction), in a notebook. It's based on my latest pull of FinRL-Meta, without any further changes.
Issue Severity
High: It blocks me from completing my task.