hubbs5 / or-gym

Environments for OR and RL Research
MIT License

Clarity on optimum policy for simple Knapsack Model #13

Closed RajarshiBhadra closed 3 years ago

RajarshiBhadra commented 3 years ago

I was going through the documentation provided here: https://www.datahubbs.com/action-masking-with-rllib/

I understand how action masking can be used to constrain the optimization process. However, I don't fully understand how we get the optimal policy from the trained model.

I understand that actions = trainer.compute_action(state) computes an action, but I am not clear on how to recover the optimal policy. For example, with the numbers provided in the post, the optimal solution is

Total value: 15
Total weight: 8
Packed items: [0, 2, 3, 4]
Packed weights: [1, 2, 1, 4]

How can we get this using the trained model?
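For what it's worth, the way I'd expect to read the packed items off a trained model is to roll the policy out in the environment step by step and record which offers it accepts. Below is a minimal, self-contained sketch of that loop. ToyKnapsackEnv and the greedy compute_action are hypothetical stand-ins for or-gym's knapsack environment and trainer.compute_action, and the item weights/values are made up so that the rollout happens to reproduce the totals quoted above; they are not the actual numbers from the datahubbs post.

```python
class ToyKnapsackEnv:
    """Hypothetical stand-in for or-gym's knapsack environment.

    Each step offers one item in order; action 1 packs it (if it fits),
    action 0 skips it. The episode ends after the last item is offered.
    """
    def __init__(self, weights, values, max_weight):
        self.weights, self.values, self.max_weight = weights, values, max_weight

    def reset(self):
        self.current_weight = 0
        self.item = 0  # index of the item currently on offer
        return (self.item, self.current_weight)

    def step(self, action):
        reward = 0
        if action == 1 and self.current_weight + self.weights[self.item] <= self.max_weight:
            self.current_weight += self.weights[self.item]
            reward = self.values[self.item]  # reward = value of the packed item
        self.item += 1
        done = self.item >= len(self.weights)
        return (self.item, self.current_weight), reward, done, {}


def compute_action(obs):
    """Stand-in for trainer.compute_action(state): a fixed greedy rule,
    not a trained network -- always try to pack the offered item."""
    return 1


# Made-up instance chosen so the rollout matches the totals quoted above.
weights = [1, 8, 2, 1, 4]
values = [2, 10, 3, 4, 6]
env = ToyKnapsackEnv(weights, values, max_weight=8)

obs, done = env.reset(), False
packed, total_value = [], 0
while not done:
    item = obs[0]                    # item currently on offer
    action = compute_action(obs)     # would be trainer.compute_action(obs)
    obs, reward, done, _ = env.step(action)
    if reward > 0:                   # nonzero reward means the item was packed
        packed.append(item)
        total_value += reward

print("Packed items:", packed)
print("Total value:", total_value)
print("Total weight:", env.current_weight)
```

With a trained RLlib agent, the only change would be replacing the greedy stand-in with the agent's own action computation inside the same loop.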

hubbs5 commented 3 years ago

The optimal policy for the RL model only comes via training, which wasn't shown in the masking post. Examples of how to train a model with Ray are given at the following links:

https://www.datahubbs.com/how-to-use-deep-reinforcement-learning-to-improve-your-supply-chain/
https://www.datahubbs.com/ray-and-rllib-fast-reinforcement-learning/
https://www.datahubbs.com/hyperparameter-tuning-with-tune/

Hope that helps.

RajarshiBhadra commented 3 years ago

Thank you. This is very helpful. It answers my query.