Status: Closed. BrunoBelucci closed this 1 year ago.
Fix #189. Instantiate the dropout layer in `__init__` and keep a `dropout_p` (probability) attribute that can be passed to `F.scaled_dot_product_attention` when using flash attention.
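A minimal sketch of the pattern described above, assuming a standard multi-head self-attention module (the class and parameter names here are illustrative, not the repository's actual code): the `nn.Dropout` module is created once in `__init__` for the manual attention path, while the raw probability is kept as `dropout_p` so it can be forwarded to the fused `F.scaled_dot_product_attention` kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Illustrative module: dropout instantiated in __init__, with the
    probability kept as an attribute for scaled_dot_product_attention."""

    def __init__(self, embed_dim: int, num_heads: int,
                 dropout: float = 0.1, use_flash: bool = True):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)
        # Instantiate the dropout module once, here, rather than in forward()
        self.dropout = nn.Dropout(dropout)
        # Keep the raw probability so it can be passed to the fused kernel
        self.dropout_p = dropout
        self.use_flash = use_flash

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        if self.use_flash:
            # Dropout is applied inside the fused kernel; pass the stored
            # probability, and disable it outside of training mode.
            y = F.scaled_dot_product_attention(
                q, k, v,
                dropout_p=self.dropout_p if self.training else 0.0,
            )
        else:
            # Manual path: use the nn.Dropout module created in __init__
            att = (q @ k.transpose(-2, -1)) / (self.head_dim ** 0.5)
            att = att.softmax(dim=-1)
            att = self.dropout(att)
            y = att @ v
        y = y.transpose(1, 2).reshape(B, T, C)
        return self.out(y)
```

Keeping both the module and the scalar avoids reading a private attribute of `nn.Dropout` and works on either attention path.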