ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Imitation loss schedule - access number of steps taken in custom_loss [rllib] #8362

Closed: Carlz182 closed this issue 3 years ago

Carlz182 commented 4 years ago

I was following the example script for a custom loss function. I am interested in bootstrapping my policy with a dataset from an algorithmic supervisor. However, the demonstrations are not perfect, so at some point I would like the influence of the imitation loss to decrease to zero and let the policy loss take over. In the example, the weight of the imitation loss is hard-coded as 10, but I would like it to change over time.

    def custom_loss(self, policy_loss, loss_inputs):
        # create a new input reader per worker
        reader = JsonReader(self.options["custom_options"]["input_files"])
        input_ops = reader.tf_input_ops()

        # define a secondary loss by building a graph copy with weight sharing
        obs = tf.cast(input_ops["obs"], tf.float32)
        logits, _ = self._build_layers_v2({
            "obs": restore_original_dimensions(obs, self.obs_space)
        }, self.num_outputs, self.options)

        # You can also add self-supervised losses easily by referencing tensors
        # created during _build_layers_v2(). For example, an autoencoder-style
        # loss can be added as follows:
        # ae_loss = squared_diff(
        #     loss_inputs["obs"], Decoder(self.fcnet.last_layer))
        print("FYI: You can also use these tensors: {}, ".format(loss_inputs))

        # compute the IL loss
        action_dist = Categorical(logits, self.options)
        self.policy_loss = policy_loss
        self.imitation_loss = tf.reduce_mean(
            -action_dist.logp(input_ops["actions"]))
        return policy_loss + 10 * self.imitation_loss
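One idea would be to replace the hard-coded factor with a non-trainable TF variable that can be assigned from outside the graph. A rough sketch of how the end of custom_loss could look (plain TF1 graph code; the attribute names imitation_coeff, imitation_coeff_ph and set_imitation_coeff are just illustrative, not existing RLlib API):

        # Sketch: use a non-trainable coefficient instead of the constant 10,
        # plus a placeholder and assign op so the value can be changed later
        # via session.run(). Names here are illustrative only.
        self.imitation_coeff = tf.Variable(
            10.0, trainable=False, dtype=tf.float32, name="imitation_coeff")
        self.imitation_coeff_ph = tf.placeholder(
            tf.float32, shape=(), name="imitation_coeff_ph")
        self.set_imitation_coeff = self.imitation_coeff.assign(
            self.imitation_coeff_ph)
        return policy_loss + self.imitation_coeff * self.imitation_loss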

Unfortunately, I did not find a way to change a value like this inside the model from outside during training. What would be the suggested way to do something like that? I was thinking of doing it in the train_result callback, but I did not succeed.
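For the callback route, this is roughly what I had in mind, using the 0.8.x-style callbacks dict. The attribute paths policy.model.set_imitation_coeff / imitation_coeff_ph and the linear decay schedule are assumptions that depend on how the model above is wired up, so they would need to be adapted to the actual policy/model classes:

    def on_train_result(info):
        trainer = info["trainer"]
        timesteps = info["result"]["timesteps_total"]
        # Illustrative schedule: linearly decay the weight from 10 to 0
        # over the first 1M environment timesteps.
        new_coeff = max(0.0, 10.0 * (1.0 - timesteps / 1e6))

        def update(policy, policy_id):
            # Assumes a TF policy whose model holds the variable/ops from the
            # sketch above; adjust these attribute paths to the real model.
            policy.get_session().run(
                policy.model.set_imitation_coeff,
                feed_dict={policy.model.imitation_coeff_ph: new_coeff})

        # Each worker builds its own copy of the loss graph, so update all of them.
        trainer.workers.foreach_worker(
            lambda worker: worker.foreach_policy(update))

    # and in the trainer config:
    #     "callbacks": {"on_train_result": on_train_result}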

An alternative would be a behavioral cloning function similar to the pre_train function in baselines, but I did not find any reference as to whether something like that is already implemented somewhere.

I am using PPO for training.

RLlib version: 0.8.2, Python 3.7.6, Ubuntu 18.04

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 3 years ago

Hi again! This issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!