rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167
Other
34.28k stars 4.2k forks

Dropout - activated value #428

Closed d-kleine closed 4 weeks ago

d-kleine commented 4 weeks ago

I have noticed a small detail missing on p. 78 in section 3.5.2 of the book, which explains the concept of dropout. There are no errors in the text or code, but because the chosen dropout rate is 0.5, I find it hard for readers to understand the concept of scaling the remaining elements in the matrix after dropout, especially when the dropout rate is not 0.5, for example 0.3. The provided explanation for calculating the activated value is specific to the 1s in the matrix and to dropout = 0.5.

My suggestion for improving the explanation would be to provide the formula for calculating each activated value in the matrix after dropout. The formula for the values of the remaining activated neurons after applying dropout with probability $$p$$ is:

$$ \text{activated value} = \text{original value} \times \frac{1}{1 - p} $$

Where:

- $$p$$ is the dropout probability
- the original value is the element's value before dropout is applied
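A minimal PyTorch sketch of this formula, using an assumed dropout rate of p = 0.3 on a matrix of ones (both values chosen here for illustration, not taken from the book):

```python
# Illustrates the suggested formula: after dropout with probability p,
# each surviving element is scaled by 1 / (1 - p).
import torch

torch.manual_seed(123)

p = 0.3
dropout = torch.nn.Dropout(p)  # in training mode, scales survivors by 1/(1-p)
example = torch.ones(6, 6)     # original values are all 1.0

out = dropout(example)
print(out)
# Every surviving entry equals 1.0 * 1/(1 - 0.3) ≈ 1.4286; dropped entries are 0.
```

With p = 0.5 this reduces to the factor 1/0.5 = 2 given in the book, so the formula generalizes the printed example rather than changing it.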

rasbt commented 4 weeks ago

Thanks for the suggestion! If I understand correctly, you mean replacing / extending

To compensate for the reduction in active elements, the values of the remaining elements in the matrix are scaled up by a factor of 1/0.5 = 2.

by the actual formula? Sure, that's a good idea. I'll add it to my notes in case there's a 2nd edition in a few years. Thanks!

d-kleine commented 4 weeks ago

Yes, exactly, to make the calculation of the activated values more generally understandable.