Closed cpwan closed 2 years ago
DI-engine doesn't set strict rules for models, but there are some conventions between the model and the policy. For example, if you use the PPO policy, you must define `actor` and `critic` in your model and implement functions such as `compute_actor` and `compute_critic`. I think the simplest way to customize your own model is to imitate and modify the default model of a policy, such as `vac` for PPO.
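To make the convention above concrete, here is a minimal sketch of a model that exposes `actor`, `critic`, `compute_actor`, and `compute_critic`. The class name `TinyActorCritic`, the `mode` dispatch in `forward`, and the `'logit'` / `'value'` dictionary keys are assumptions for illustration, not the exact interface of the default `vac` model; check the `vac` source before relying on them.

```python
import torch
import torch.nn as nn


class TinyActorCritic(nn.Module):
    """Hypothetical actor-critic skeleton; names and output keys are assumptions."""

    # Modes a policy is assumed to request by name.
    mode = ['compute_actor', 'compute_critic']

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Shared feature extractor used by both heads.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.actor = nn.Linear(hidden_dim, action_dim)  # action logits
        self.critic = nn.Linear(hidden_dim, 1)          # state value

    def forward(self, obs: torch.Tensor, mode: str) -> dict:
        # Dispatch to the method named by `mode`.
        assert mode in self.mode, f"unsupported mode: {mode}"
        return getattr(self, mode)(obs)

    def compute_actor(self, obs: torch.Tensor) -> dict:
        return {'logit': self.actor(self.encoder(obs))}

    def compute_critic(self, obs: torch.Tensor) -> dict:
        # Drop the trailing singleton dimension so the value has shape (batch,).
        return {'value': self.critic(self.encoder(obs)).squeeze(-1)}
```

A pointer network would replace the linear `actor` head with an attention over the encoded inputs, while the method names the policy calls stay the same.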
As for your case, why do you want to use a pointer network? In DI-star, we only use it to output `selected_units` and maintain the hidden state inside the model. And which RL policy do you want to use? Please provide more information.
Thanks for your reply. Let me check that out.
As for the motivation, I want to use the pointer network to solve the traveling salesman problem (TSP), which has a dynamic number of inputs. I have seen works that train the pointer network with the classical REINFORCE algorithm, and I would like to run experiments with other, more advanced RL policies.
OK, I get your point. I think it is important to model a proper MDP if you want to use any RL algorithm. I noticed that the original pointer network paper treats TSP as a kind of supervised learning problem, and different ways of modeling the MDP will make an essential difference to your final implementation and performance.
For example, if you input a state and want to output the entire path in a single step, you can keep the hidden state implementation inside your network model. But if you want to output only the next location at each step, you need to modify the policy to maintain the hidden state, similar to the difference between R2D2 and DQN.
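As a rough illustration of the second option (one location per step), TSP can be framed as a step-wise MDP along these lines. This is only a sketch under my own assumptions: the class name `TSPStepEnv`, the observation keys, and the reward definition (negative travelled distance, with the return leg added on the last step) are hypothetical and not part of DI-engine.

```python
import math
from typing import Dict, List, Tuple

City = Tuple[float, float]


class TSPStepEnv:
    """Hypothetical step-wise TSP: each action picks the next city to visit."""

    def __init__(self, cities: List[City]):
        self.cities = cities
        self.reset()

    def reset(self) -> Dict:
        # The tour always starts at city 0.
        self.visited = [False] * len(self.cities)
        self.current = 0
        self.visited[0] = True
        return self._obs()

    def _obs(self) -> Dict:
        # action_mask marks cities that may still be chosen.
        return {
            'cities': self.cities,
            'current': self.current,
            'action_mask': [not v for v in self.visited],
        }

    def step(self, action: int) -> Tuple[Dict, float, bool]:
        if self.visited[action]:
            raise ValueError(f"city {action} was already visited")
        # Reward is the negative length of the leg just travelled.
        reward = -math.dist(self.cities[self.current], self.cities[action])
        self.current = action
        self.visited[action] = True
        done = all(self.visited)
        if done:
            # Close the tour by returning to the start city.
            reward -= math.dist(self.cities[action], self.cities[0])
        return self._obs(), reward, done
```

With this framing the model only sees the current partial tour at each step, so any recurrent decoder state has to be carried between steps by the policy rather than inside a single forward pass.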
Hi there, I am new to DI-engine. I am trying to implement a pointer network for my own environment. The most relevant resource I can find is the docs about the RNN here. It seems that I can treat the pointer network as a kind of RNN and wrap each decoding output as `hidden_state`. But the encoder (also an LSTM) output is also used in every decoding step. Can I wrap it as another `hidden_state`? I noticed on Slack that a similar architecture has been implemented in DI-star. Can you give me directions on how to make it work? Also, I am not sure which part of the code I should modify. It would be good if you could point me to the docs or a tutorial on customizing models.