Thanks a lot for sharing your code!
It is not clear to me why you scale the reward in the following way:
    if reward['y_pred_auc'][i][j][k] > 12:
        reward['y_pred_auc'][i][j][k] = 12 / 10000.0
    else:
        reward['y_pred_auc'][i][j][k] = reward['y_pred_auc'][i][j][k] / 10000.0
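If I read the snippet above correctly, it caps the value at 12 and then divides by 10000.0. Here is a minimal sketch of that reading; `scale_reward`, `cap` and `divisor` are names I made up for illustration and are not from your repository:

```python
def scale_reward(value, cap=12, divisor=10000.0):
    """Hypothetical helper: clip `value` at `cap`, then divide by `divisor`."""
    # min() reproduces the if/else: values above the cap are replaced by the cap,
    # everything else passes through unchanged before the division.
    return min(value, cap) / divisor


# Example inputs (made up): one above the cap, one below it.
above_cap = scale_reward(15)   # same result as 12 / 10000.0
below_cap = scale_reward(3)    # same result as 3 / 10000.0
```

So my question is really about the two constants: why is the cap 12, and why is the divisor 10000.0?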
Could you please help?