Closed · abebe9849 closed this issue 3 years ago
Hi,
Did you ensure that the x passed to the attention layer [i.e., the x passed to attn1(x)] does not contain any NaNs?
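For reference, a minimal sketch of such a check, assuming a PyTorch model (`assert_no_nans` is a hypothetical helper, not part of this repo):

```python
import torch

def assert_no_nans(x: torch.Tensor, name: str = "input") -> None:
    # Fail early if the tensor contains NaNs, so the error points at
    # the data rather than at a layer deep inside the model.
    if torch.isnan(x).any():
        raise ValueError(f"{name} contains NaNs")

# e.g. before the forward pass / before attn1(x):
# assert_no_nans(batch_x, "batch_x")
```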
I made sure the input did not contain any NaNs, and the model ran without any problems. Thank you very much.
I applied this great model to a regression task, but the values become NaN in the model.transformer part.
Did this happen during your implementation? If anyone has used it on their own data, please let me know.
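This is not from the original thread, but one common way to narrow down where NaNs first appear in a PyTorch model is to register a forward hook on every submodule (`register_nan_hooks` is a hypothetical helper, sketched here under that assumption):

```python
import torch
import torch.nn as nn

def register_nan_hooks(model: nn.Module):
    # Attach a forward hook to every submodule that reports any module
    # whose output contains NaNs, to localize where they first appear.
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                print(f"NaN detected in output of: {name}")
        return hook

    handles = [
        module.register_forward_hook(make_hook(name))
        for name, module in model.named_modules()
    ]
    return handles  # call h.remove() on each handle when done debugging
```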
These are the hyperparameters: