`losses = losses * self.args.rpo_alpha + policy_nll_loss` raises TypeError: only integer tensors of a single element can be converted to an index #1924

trl 0.9.6

