rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167
Other
34.28k stars 4.2k forks

Dropout - activated value #428

Closed d-kleine closed 4 weeks ago

d-kleine commented 4 weeks ago

I have noticed a small detail missing on p. 78 in section 3.5.2 of the book, which explains the concept of dropout. There are no errors in the text or code, but because the chosen dropout rate is 0.5, I find it hard for readers to understand the concept of scaling the remaining elements in the matrix after dropout, especially when the dropout rate is not 0.5, for example 0.3. The provided explanation for calculating the activated value is specific to the 1s in the matrix and to dropout = 0.5.

My suggestion for improving the explanation would be to provide the formula for calculating each activated value in the matrix after dropout. The formula for the values of the remaining activated neurons after applying dropout with probability $$p$$ is:

$$ \text{activated value} = \text{original value} \times \frac{1}{1 - p} $$

Where:

- $$p$$ is the dropout probability
- the original value is the element's value before dropout is applied
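A minimal PyTorch sketch of this formula, using an assumed dropout rate of p = 0.3 on a matrix of ones (both values chosen here for illustration, not taken from the book):

```python
# Illustrates the suggested formula: after dropout with probability p,
# each surviving element is scaled by 1 / (1 - p).
import torch

torch.manual_seed(123)

p = 0.3
dropout = torch.nn.Dropout(p)  # in training mode, scales survivors by 1/(1-p)
example = torch.ones(6, 6)     # original values are all 1.0

out = dropout(example)
print(out)
# Every surviving entry equals 1.0 * 1/(1 - 0.3) ≈ 1.4286; dropped entries are 0.
```

With p = 0.5 this reduces to the factor 1/0.5 = 2 given in the book, so the formula generalizes the printed example rather than changing it.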

rasbt commented 4 weeks ago

Thanks for the suggestion! If I understand correctly, you mean replacing / extending

To compensate for the reduction in active elements, the values of the remaining elements in the matrix are scaled up by a factor of 1/0.5 = 2.

by the actual formula? Sure, that's a good idea. I'll add it to my notes in case there's a 2nd edition in a few years. Thanks!

d-kleine commented 4 weeks ago

Yes, exactly, to make the calculation of the activated values more generally understandable.