ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Question] How to obtain the action embedding for avail_action ? #5540

Closed soloist96 closed 3 years ago

soloist96 commented 4 years ago

This is a general question about parametric action spaces, but I think people here may know the solution. I am trying to embed the actions, but in the toy model provided by RLlib, the available-action embeddings are generated randomly in the env. Shouldn't the embeddings be learned? For example, word2vec embeddings are learned from a corpus. For a more practical RL problem, how could we obtain those embeddings? Thank you.

ericl commented 4 years ago

You can certainly try to learn it; that's up to you. To do that, you can use a randomly initialized, learnable embedding matrix instead of fixed values.

soloist96 commented 4 years ago

Thank you, Eric, for the reply. I am wondering if you could give me some pointers on a common way to learn the weights. In my understanding, is this equivalent to adding a fully connected layer (whose weights are embedding size × number of actions) at the end of my model (say my model outputs a vector of the embedding size) so that the output becomes the desired logits? In that case, it is equivalent to modifying the network architecture, and I don't see the point of introducing parametric actions.

I am trying to mimic how OpenAI embedded the Dota action space but could not find any detailed explanation of how to learn the action embedding matrix.

Thank you!

ericl commented 4 years ago

You want to go from an action (a 0..N index) to a fixed-size embedding (size M), right? The typical way to do this is to multiply by a matrix of size (N×M), which can be learnable. This gives you your action embeddings.

Edit: note that the "multiply" is just a lookup in the embedding table (it's equivalent to a multiply if the action index is one-hot encoded). You can take a look at torch.nn.Embedding for an example of how it works.
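To make the lookup/multiply equivalence concrete, here is a minimal numpy sketch (all names are illustrative, not RLlib or torch API; `torch.nn.Embedding` does the same lookup with a learnable table):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 5, 3                          # N actions, embedding size M
table = rng.standard_normal((N, M))  # the (N x M) embedding matrix (learnable in practice)

action = 2
one_hot = np.eye(N)[action]          # one-hot encoding of the action index

# Multiplying the one-hot vector by the table...
via_matmul = one_hot @ table
# ...is the same as simply looking up row `action` of the table:
via_lookup = table[action]

assert np.allclose(via_matmul, via_lookup)
```

The lookup form is what embedding layers implement, since it avoids materializing the one-hot vector.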

soloist96 commented 4 years ago

Thank you, Eric. Yes, I want to go from an action to an embedding, and I know how the embedding table works. My confusion is about how to use the RLlib framework to learn this embedding. For NLP problems, people can use CBOW or GloVe to train on a corpus and get the embeddings. However, in the RL framework, I am not clear on how we get those embeddings for actions.

Are we expected to incorporate this embedding matrix into the policy neural network so that its weights get learned? Thanks a lot.

ericl commented 4 years ago

Yes, it should be part of your model weights and trained via backprop.
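As a hedged sketch of the "part of your model weights" idea: a parametric-action model can keep the embedding table as a learnable parameter and score actions by a dot product between the model's output ("intent") vector and each available action's embedding, masking out unavailable actions. This mirrors the pattern in RLlib's parametric-actions example; all names below are illustrative, and the tensors would be trainable framework tensors in a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 5, 3                               # N actions, embedding size M
action_emb = rng.standard_normal((N, M))  # embedding table; a model weight updated by backprop
intent = rng.standard_normal(M)           # the model's output vector for the current observation

# Logits are dot products between the intent vector and each action's
# embedding; unavailable actions are masked to -inf before the softmax.
logits = action_emb @ intent
avail = np.array([1, 1, 0, 1, 0], dtype=bool)  # availability mask from the env
masked_logits = np.where(avail, logits, -np.inf)
```

Because the logits flow into the policy loss, gradients reach `action_emb` and the embeddings are learned end to end, which is the advantage over a fixed fully connected output layer.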

soloist96 commented 4 years ago

Yeah, I agree! Thanks a lot!

nMack429 commented 4 years ago

@soloist96 Interesting discussion. Did you find an easy way of doing the suggested embedding training technique?

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 3 years ago

Hi again! The issue will be closed because there has been no activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!