Hi,
The SimpleKT paper states that the model uses ordinary dot-product attention, but in the code in this repository I found that the implementation uses multi-head attention. Do I understand correctly that what is actually used here is dot-product attention (i.e., an attention module without trainable weights) run several times in parallel, once per head? Thank you for your help.
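To make sure I'm describing the distinction I have in mind, here is a minimal sketch (not taken from this repository; all names and shapes are illustrative) of plain scaled dot-product attention, which has no trainable parameters, versus the usual multi-head variant that adds learned Q/K/V/output projections and then applies the parameter-free attention in each head:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Plain dot-product attention: no trainable parameters,
    # just softmax(Q K^T / sqrt(d)) V.
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.matmul(F.softmax(scores, dim=-1), v)

class MultiHeadDotProductAttention(torch.nn.Module):
    # Typical multi-head attention: learned linear projections split the
    # model dimension into n_heads subspaces, and the parameter-free
    # dot-product attention above runs in each subspace in parallel.
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.w_q = torch.nn.Linear(d_model, d_model)
        self.w_k = torch.nn.Linear(d_model, d_model)
        self.w_v = torch.nn.Linear(d_model, d_model)
        self.w_o = torch.nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        b = q.size(0)
        # Project, then reshape so each head attends over its own subspace.
        q = self.w_q(q).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(k).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(v).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        out = scaled_dot_product_attention(q, k, v, mask)
        out = out.transpose(1, 2).contiguous().view(b, -1, self.n_heads * self.d_head)
        return self.w_o(out)
```

So my question is whether SimpleKT's heads work like the second class above (with learned projections per head), or whether the heads really are just the parameter-free dot-product attention repeated in parallel.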