Closed by pm3310 3 years ago
I used policy = tf.saved_model.load('my_awesome_policy') instead and it worked.
Assigned to bandits team. They can close unless they have a comment.
Hey @bartokg, I have 2 questions:
policy = tf.saved_model.load('my_awesome_policy') worked, but policy = policy_loader.load('my_awesome_policy') didn't work?
Hi Pavlos, happy to see you found a working loader function! I have never used policy_loader, and it's hard to see why it fails. The documentation of policy_saver.PolicySaver (https://github.com/tensorflow/agents/blob/master/tf_agents/policies/policy_saver.py) recommends using
saved_policy = tf.compat.v2.saved_model.load('policy_0')
policy_state = saved_policy.get_initial_state(batch_size=3)
as you also suggest. I assume the compat.v2 can be omitted now.
Thank you, @bartokg. Any suggestions for putting a bandit agent into production for continuous learning?
In general, what you want is a trainer that consumes data (with the train() function of the agent) and saves the model periodically. Another binary can then periodically load the latest model (policy) and call action(). This is all doable by hand.
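The trainer/server split described above can be sketched in plain Python. This is only an illustration of the pattern, not TF-Agents code: EpsilonGreedyBandit, save_policy, load_latest_policy, the JSON snapshot format and the reward probabilities are all made up for the example, and a real setup would save with PolicySaver and load with tf.saved_model.load instead.

```python
import json
import os
import random
import tempfile


class EpsilonGreedyBandit:
    """Toy stand-in for a bandit agent (hypothetical, not a TF-Agents class)."""

    def __init__(self, num_arms, epsilon=0.1, counts=None, values=None):
        self.num_arms = num_arms
        self.epsilon = epsilon
        self.counts = list(counts) if counts is not None else [0] * num_arms
        self.values = list(values) if values is not None else [0.0] * num_arms

    def train(self, arm, reward):
        # Incremental update of the running mean reward of the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def action(self, rng=random):
        # Explore with probability epsilon, otherwise pick the best-looking arm.
        if rng.random() < self.epsilon:
            return rng.randrange(self.num_arms)
        return max(range(self.num_arms), key=lambda a: self.values[a])


def save_policy(bandit, directory, step):
    """Write a snapshot, then rename it, so readers never see a partial file."""
    final_path = os.path.join(directory, f"policy_{step:08d}.json")
    tmp_path = final_path + ".tmp"
    state = {
        "num_arms": bandit.num_arms,
        "epsilon": bandit.epsilon,
        "counts": bandit.counts,
        "values": bandit.values,
    }
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_policy(directory):
    """Return the newest snapshot as a bandit, or None if nothing was saved."""
    snapshots = sorted(
        name for name in os.listdir(directory)
        if name.startswith("policy_") and name.endswith(".json")
    )
    if not snapshots:
        return None
    with open(os.path.join(directory, snapshots[-1])) as f:
        return EpsilonGreedyBandit(**json.load(f))


def run_demo(seed=0):
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]  # made-up reward probabilities per arm
    with tempfile.TemporaryDirectory() as model_dir:
        # "Trainer binary": consume data, call train(), save periodically.
        trainer = EpsilonGreedyBandit(num_arms=3, epsilon=0.2)
        for step in range(1, 1001):
            arm = trainer.action(rng)
            reward = 1.0 if rng.random() < true_means[arm] else 0.0
            trainer.train(arm, reward)
            if step % 250 == 0:
                save_policy(trainer, model_dir, step)

        # "Serving binary": load the latest snapshot and call action().
        served = load_latest_policy(model_dir)
        served.epsilon = 0.0  # serve greedily in this toy example
        return served.action(rng), sorted(os.listdir(model_dir)), sum(served.counts)
```

In a long-running service the serving side would call something like load_latest_policy on a timer rather than once, so that it picks up new snapshots as the trainer writes them.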
If you want a fully productionized solution, you can try TensorFlow Extended, which integrates well with TF-Agents bandits.
Hi team,
Here's the code that trains and saves a Bandit policy
And here is the code that loads the previously trained Policy
However, the line
policy.action(time_step=time_step_obj, policy_state=policy.get_initial_state(2))
generates the following error:
My goal is to have a Bandit in a RESTful endpoint to sample from and train in an online fashion. Do you have any best practices on how to deploy Bandits as a RESTful service?