peleiden / daluke

A Danish-speaking language model with entity-aware self-attention
MIT License
9 stars 0 forks source link

Investigate gradient locking/unlocking #136

Open asgerius opened 3 years ago

asgerius commented 3 years ago

Based on some of the new plots, I have a feeling that gradient locking/unlocking may not be working entirely as intended, both with regards to what parameters and also locking/unlocking them.