Open · snapo opened 10 months ago
Hi, thanks for the test. I didn't run tests myself, but your conclusion about needing a large number of neurons seems correct. The reason is not specific to RLS; it applies to all single-layer feed-forward networks. The advantage of this RLS method is its ability to converge and learn the weights quickly, with few iterations. The disadvantage is that it's a single-layer network, so it has all the issues single-layer networks have: they require a large number of parameters to model patterns with many features. I explained why this is the case in my GPT video at the 11:15 mark. Hope that helps: https://youtu.be/l-CjXFmcVzY?t=675
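For readers unfamiliar with the method, here is a minimal sketch of the RLS weight update being discussed, applied to a single linear layer: each sample is seen exactly once, and the weights converge in that single pass. The variable names and the toy task are illustrative assumptions, not taken from the repository.

```python
import numpy as np

def rls_init(dim, delta=1e-2):
    """RLS state: weight vector w and inverse-correlation matrix P."""
    return np.zeros(dim), np.eye(dim) / delta

def rls_step(w, P, x, y):
    """One RLS update (forgetting factor = 1)."""
    Px = P @ x
    k = Px / (1.0 + x @ Px)      # gain vector
    w = w + k * (y - w @ x)      # correct by the prediction error
    P = P - np.outer(k, Px)      # shrink uncertainty along x
    return w, P

# One-shot demo: recover a hidden linear rule in a single pass.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5, 3.0])
w, P = rls_init(4)
for _ in range(300):
    x = rng.normal(size=4)
    w, P = rls_step(w, P, x, w_true @ x)
```

After one pass, `w` matches `w_true` to within the small bias introduced by the `delta` regularizer; this one-pass exactness is what makes the method attractive, and the parameter-count issue discussed here is orthogonal to it.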
I will play around with it a little more, because I really find it interesting, and if one finds a way to stack multiple layers it might be a very good and quick learner. What might help you: I think the brain has, like microprocessors, one or multiple clocks, whose ticks are the spikes you see when you measure it.
In binary terms this would mean adding some kind of binary clock to the input, like 0,1,0,1,0,1 per datapoint (for example a 128-bit counter, a value so large it would never wrap around).
I saw that this helps if you try to predict a decaying sinusoidal wave that has spikes at specific points. The example you have can predict everything that is periodic-even in the sequence, but it can't predict uneven things, like every 3rd value, as it moves across time. If one adds the binary timer, it is able to predict the uneven positions as well.
Again, thanks for all your hard work.
First of all, a lot of thanks for publishing your code. In your video you mention that it is unknown how many layers / neurons should be sufficient, so I ran some very basic tests on RLS. From my findings, I estimate that to get 100% accuracy you need about 1.1x as many neurons as there are possible outcomes.
For the test I chose the majority-bit function: with 3-12 input bits, if more bits are 1 the output is 1, and if more bits are 0 the output should be 0 too. I ran it for different neuron counts and different majority-gate sizes, all one-shot (training data seen only once), as mentioned in the video and the GitHub description.
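For reference, a minimal sketch of such a majority-bit test might look like the following: enumerate all 2^n bit patterns, project them through a fixed random hidden layer, and fit only the output weights with one RLS pass. This is my own reconstruction under stated assumptions (tanh random features, 5 input bits, a hidden size of roughly 1.1x the pattern count), not the author's adjusted code.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

n_bits = 5                                   # odd, so no ties
X = np.array(list(product([0, 1], repeat=n_bits)), dtype=float)
y = np.where(X.sum(axis=1) > n_bits / 2, 1.0, -1.0)  # majority label

n_hidden = int(1.1 * len(X)) + 1             # ~1.1x the 2^n patterns
W_h = rng.normal(size=(n_hidden, n_bits))    # fixed random projection
b_h = rng.normal(size=n_hidden)
H = np.tanh(X @ W_h.T + b_h)                 # hidden activations

# One-shot RLS over every pattern, each seen exactly once
w = np.zeros(n_hidden)
P = np.eye(n_hidden) / 1e-4
for h, t in zip(H, y):
    Ph = P @ h
    k = Ph / (1.0 + h @ Ph)
    w += k * (t - w @ h)
    P -= np.outer(k, Ph)

acc = np.mean(np.sign(H @ w) == y)
print(f"one-shot accuracy with {n_hidden} neurons: {acc:.2%}")
```

This is consistent with the scaling observed above: once the hidden layer has at least as many neurons as there are distinct patterns, one RLS pass can essentially interpolate the training set, so accuracy saturates.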
Here is the adjusted code I used:
Here is the output with the accuracies:
Considering using it for a large language model (just for fun), which has a practically unlimited number of measurable outputs, it would be impossible to use this network, as it would need on the order of 10^80 neurons. It would learn all languages and code perfectly, but we would be unable to run it anywhere with current compute.
Am I doing something wrong, or is this the expected behaviour? Did you also run some tests on it?
Update: I ran a test to see if epochs matter, and statistically they don't: 1 epoch is enough, but the number of neurons has to be at least 10% greater than the number of different possible outputs. I ran the test from 1 to 512 epochs.
Best regards