soma2000-lang closed this issue 1 year ago.
Hi, if you take a look at XLMRobertaClassifier, you'll find that we are reusing roberta_kernel_initializer in XLMRoberta.
The reason is simple: XLMRoberta inherits from Roberta (try looking around in the XLM Roberta Backbone). Also, just to be clear, roberta_kernel_initializer is only a function that initializes kernel weights from a random normal distribution, so it could just as well be implemented separately for XLMRoberta.
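To illustrate why reusing the helper is harmless, here is a minimal pure-Python sketch. It assumes a plain normal distribution with mean 0 and a stddev of 0.02; the thread does not state the library's actual default or whether it uses a truncated normal. The function bodies and the name xlm_roberta_kernel_initializer_standalone are hypothetical stand-ins, not the library's code. The sketch shows both options: reusing the RoBERTa helper, or defining a separate one for XLM-RoBERTa.

```python
import random


def roberta_kernel_initializer(stddev=0.02):
    """Hypothetical stand-in for the shared helper: returns a callable that
    fills a 2-D kernel with samples from a normal distribution with mean 0
    and standard deviation `stddev` (0.02 is an assumed default)."""

    def initializer(shape, seed=None):
        rng = random.Random(seed)  # fresh seeded RNG for reproducible weights
        rows, cols = shape
        return [[rng.gauss(0.0, stddev) for _ in range(cols)] for _ in range(rows)]

    return initializer


# Option 1: reuse the RoBERTa helper directly, since XLM-RoBERTa derives from RoBERTa.
xlm_roberta_kernel_initializer = roberta_kernel_initializer


# Option 2: a separate, self-contained definition for XLM-RoBERTa (hypothetical name).
def xlm_roberta_kernel_initializer_standalone(stddev=0.02):
    def initializer(shape, seed=None):
        rng = random.Random(seed)
        rows, cols = shape
        return [[rng.gauss(0.0, stddev) for _ in range(cols)] for _ in range(rows)]

    return initializer


kernel = xlm_roberta_kernel_initializer()((4, 3), seed=0)
print(len(kernel), len(kernel[0]))  # 4 3
```

With the same seed, both options produce identical weights, so reusing the helper changes nothing numerically; a separate definition would only matter if XLM-RoBERTa ever needed different initialization. In an actual Keras model, a built-in such as keras.initializers.RandomNormal(mean=0.0, stddev=0.02) could play this role.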
@shivance Thanks for the clarification!
@shivance, thanks for helping out here! :)
@abheesht17 @mattdangerw I think this needs to be fixed