hippietrail opened this issue 1 year ago
The sigmoid function always outputs a value between 0 and 1. I don't grok the whole system deeply enough to know whether that has a direct bearing on the range each nn.as[] value must fall in.
I think the activations will always be between 0 and 1 because they are set in nn_forward() as the mean of the sigmoids. Since the maximum output of sigmoid is 1, if you have N inputs all at their maximum value you get N * 1, and dividing by N to take the mean leaves 1.
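To make that concrete, here's a minimal standalone C sketch of my reasoning. It is not the library's actual code; sigmoidf and the averaging step are just my assumptions about what nn_forward() does, but the point stands for anything built from sigmoid outputs:

```c
#include <stdio.h>
#include <math.h>

// Standard logistic sigmoid: output is always strictly between 0 and 1.
float sigmoidf(float x)
{
    return 1.0f / (1.0f + expf(-x));
}

int main(void)
{
    // Pretend these are the pre-activation values feeding one neuron.
    float xs[] = { -100.0f, -1.0f, 0.0f, 1.0f, 100.0f };
    size_t n = sizeof(xs) / sizeof(xs[0]);

    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += sigmoidf(xs[i]);

    // Even if every sigmoid saturates at 1, the mean is N*1/N = 1,
    // so the resulting activation can never leave the 0..1 range.
    printf("mean of sigmoids = %f\n", sum / n);
    return 0;
}
```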
This implies that your training data and your loss function will have to map their native range to the 0..1 range.
I guess it's the same in the backpropagation code.
That wouldn't be the case when the activation function isn't sigmoid, though. ReLU's range is 0 to infinity, for instance, though that might depend on the implementation I suppose.
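For comparison, a quick sketch of a textbook ReLU (my own definition, not taken from this repo) shows the output is unbounded above, unlike sigmoid:

```c
#include <stdio.h>

// Textbook ReLU: 0 for negative inputs, identity otherwise,
// so the output range is [0, +infinity) rather than (0, 1).
float reluf(float x)
{
    return x > 0.0f ? x : 0.0f;
}

int main(void)
{
    printf("%f %f %f\n", reluf(-5.0f), reluf(0.5f), reluf(1000.0f));
    return 0;
}
```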
I think the answer is yes, and the latest ML episode published to YouTube, which covers ReLU, seems to address this: "I was wrong about Machine Learning! Here is what I learnt..."
I started out thinking activations were only valid between 0.0 and 1.0, but then I wasn't sure where I got that idea from, and it would disallow approximating functions like tan(), for instance.

I was playing with a sin() approximation and was seeing half of my graph flat at the bottom, so I thought I had a bug in my unfortunately messy code. But then I tried normalizing the values from -1..+1 to 0..1 and it works much better.
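In case it helps anyone else, this is roughly what I mean by normalizing (the helper names are hypothetical, not from the repo): map the sin() targets from -1..+1 into 0..1 before training, then map the network's 0..1 output back afterwards:

```c
#include <stdio.h>
#include <math.h>

// Hypothetical helpers: squeeze a value in [-1, 1] into [0, 1] for training,
// and expand a 0..1 network output back to the original range.
float to_unit(float y)   { return (y + 1.0f) * 0.5f; }
float from_unit(float a) { return a * 2.0f - 1.0f; }

int main(void)
{
    // Build (x, target) pairs for a sin() approximation with targets in 0..1.
    for (int i = 0; i <= 4; ++i) {
        float x      = i * 1.57f;            // roughly 0 .. 2*pi
        float target = to_unit(sinf(x));     // what the net is trained against
        printf("x=%.2f  sin=%.3f  target=%.3f  back=%.3f\n",
               x, sinf(x), target, from_unit(target));
    }
    return 0;
}
```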
I notice that all of the current code only deals with binary 0/1 values or pixel brightness from 0 to 1, and even in adder.c the numbers are represented bit by bit, with each bit being a float between 0 and 1. This reinforces but doesn't confirm my theory. Is this a specified limitation or requirement of the code?