wouterkool / attention-learn-to-route

Attention based model for learning to solve different routing problems
MIT License

Masking in SHA #52

Open shagharabbani opened 1 year ago

shagharabbani commented 1 year ago

Hi,

Would it be possible to apply masking only in the decoder's single-head attention (SHA)? As far as I can tell, masking is currently applied in both the MHA and the SHA in the decoder.

Best, Shaghayegh

wouterkool commented 1 year ago

Hi @shagharabbani, I think this would definitely be possible, but it is currently not implemented. I'm also not completely sure why you'd want that, but feel free to try it!
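For anyone wanting to experiment with this, below is a minimal numpy sketch (not the repo's actual PyTorch code) of one decoder step, illustrating the variant the issue asks about: the multi-head "glimpse" attention is left *unmasked*, and the visited-node mask is applied only to the final single-head compatibility logits before the softmax. All names (`context`, `nodes`, `mask`, the identity projections, the `tanh` clipping constant) are illustrative assumptions, not the repository's API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, n_nodes, n_heads = 8, 5, 2
dh = d // n_heads

context = rng.standard_normal(d)             # decoder context (query)
nodes = rng.standard_normal((n_nodes, d))    # node embeddings (keys/values)
mask = np.array([False, True, False, True, False])  # True = node already visited

# --- MHA "glimpse": left UNMASKED here (the variant asked about) ---
# Identity projections are used for brevity; the real model learns W_q, W_k, W_v.
glimpse = np.zeros(d)
for h in range(n_heads):
    q = context[h * dh:(h + 1) * dh]
    K = nodes[:, h * dh:(h + 1) * dh]
    V = K
    attn = softmax(q @ K.T / np.sqrt(dh))    # no mask applied in the glimpse
    glimpse[h * dh:(h + 1) * dh] = attn @ V

# --- final single-head attention: mask applied only to these logits ---
logits = 10.0 * np.tanh(glimpse @ nodes.T / np.sqrt(d))  # clipped compatibilities
logits[mask] = -np.inf                                   # masking happens only here
probs = softmax(logits)                                  # zero prob. on visited nodes
```

Since `exp(-inf) == 0`, masked nodes still get exactly zero selection probability, so the sampled tour remains feasible; the only behavioral difference from the original model is that the glimpse can attend to visited nodes.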