Closed riyajatar37003 closed 2 months ago
Hi,
We reuse the dropout layers implemented in Hugging Face's transformers, which are applied to the attention probabilities and to the hidden states of each transformer layer. See modeling_bert.py and modeling_roberta.py in the transformers source code for details.
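To make the two dropout sites concrete, here is a minimal, simplified sketch (not the actual transformers source; class names and sizes are hypothetical) of where they sit in a BERT-style layer: one dropout on the attention probabilities inside self-attention, and one on the hidden states just before the residual add and LayerNorm.

```python
import torch
import torch.nn as nn

class SketchSelfAttention(nn.Module):
    """Simplified self-attention showing dropout site 1: attention probs."""
    def __init__(self, hidden_size=64, num_heads=4, attn_dropout=0.1):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(attn_dropout)  # site 1

    def forward(self, x):
        b, t, _ = x.shape
        def split(h):  # (b, t, hidden) -> (b, heads, t, head_dim)
            return h.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.query(x)), split(self.key(x)), split(self.value(x))
        scores = q @ k.transpose(-1, -2) / self.head_dim ** 0.5
        probs = scores.softmax(dim=-1)
        probs = self.dropout(probs)  # <-- dropout on attention probabilities
        ctx = (probs @ v).transpose(1, 2).reshape(b, t, -1)
        return ctx

class SketchSelfOutput(nn.Module):
    """Simplified output block showing dropout site 2: hidden states."""
    def __init__(self, hidden_size=64, hidden_dropout=0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.LayerNorm = nn.LayerNorm(hidden_size)
        self.dropout = nn.Dropout(hidden_dropout)  # site 2

    def forward(self, hidden_states, residual):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)  # <-- dropout on hidden states
        return self.LayerNorm(hidden_states + residual)  # residual + LayerNorm

x = torch.randn(2, 8, 64)
attn = SketchSelfAttention()
out = SketchSelfOutput()
y = out(attn(x), x)
print(y.shape)
```

In the real modeling_bert.py these correspond to the dropout calls in the self-attention forward (on `attention_probs`) and in the output blocks (on `hidden_states` before the residual add), with rates set by the config's `attention_probs_dropout_prob` and `hidden_dropout_prob`.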
Thanks, got it.
Hi,
Where exactly is dropout being applied? Can anyone point me to the code/file?
thanks