snapo opened 1 year ago
Thanks for the addition! You can increase the performance further by passing the data through a non-linear layer of random weights, e.g. a ReLU. I have added the following lines to the beginning of the code, which improves the performance of Method 1 to 0.91 on the test set. The more nodes, the better the performance (with 1000 nodes you can reach 95%), but the runtime grows steeply because the "xs" de-correlation step scales quadratically with the number of nodes. The risk of over-fitting also increases with more nodes.
def activate(w, x):
    # random linear projection followed by a non-linearity
    linear = w.T @ x
    # out = linear  # identity, i.e. no non-linearity
    out = np.maximum(linear, 0)  # ReLU
    return out

nodes = 500
w = np.random.randn(xs.shape[0], nodes)  # random projection weights
xs = activate(w, xs)
X_train = activate(w, X_train.T).T
X_test = activate(w, X_test.T).T
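For context, here is a minimal self-contained sketch of how such a random ReLU feature expansion can feed a linear classifier. The pseudo-inverse fit on one-hot targets is my assumption of what "Method 1" does, and scikit-learn's small load_digits set merely stands in for the actual MNIST arrays; neither comes from the repository's code.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in data (8x8 digits) instead of the real MNIST arrays
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

def activate(w, x):
    return np.maximum(w.T @ x, 0)  # random projection + ReLU

nodes = 500
w = np.random.randn(X_train.shape[1], nodes)    # random projection weights
H_train = activate(w, X_train.T).T              # (n_train, nodes)
H_test = activate(w, X_test.T).T                # (n_test, nodes)

Y_train = np.eye(10)[y_train]                   # one-hot targets
beta = np.linalg.pinv(H_train) @ Y_train        # least-squares readout (assumed "Method 1")
acc = ((H_test @ beta).argmax(axis=1) == y_test).mean()
print(f"test accuracy: {acc:.2%}")

This fixed-random-projection-plus-linear-readout setup is essentially an extreme learning machine, which is why adding nodes helps up to the point where over-fitting and the cost of the readout fit take over.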
P.S. You can also increase the performance of Method 3 by increasing the embed_size.
Very interesting behaviour: when we split the data with

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.94, random_state=42)

i.e. use only 6% of the (one-hot encoded) data for learning and 94% as the test set, we still get 81% accuracy. This is an absolutely amazing result!
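For anyone who wants to reproduce the shape of this experiment, here is a small self-contained sketch. Only the test_size=0.94 split comes from the comment above; the pseudo-inverse linear readout on one-hot labels is my guess at Method 1, and load_digits again stands in for the real MNIST data.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
# fit on only 6% of the data, evaluate on the remaining 94%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.94, random_state=42)

Y_train = np.eye(10)[y_train]               # one-hot encode the labels
beta = np.linalg.pinv(X_train) @ Y_train    # linear readout fit on the 6%
acc = ((X_test @ beta).argmax(axis=1) == y_test).mean()
print(f"accuracy on the 94% hold-out: {acc:.2%}")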
It seems Method 1 > Method 3 > Method 2.
I ran several different tests and they all came to the same conclusion: Method 1 is the best of the three.
It is also, as far as I know, a record: 81% accuracy on MNIST within 15 seconds on a single CPU thread! I have never seen this before (in Python, not C)...
I will probably also try it with CIFAR-10 and see how well it does...
Result: