Open mfouesneau opened 2 years ago
At a glance, library usage seems good to me! One way to check is to establish a baseline with some other method (kernel regression, a neural network, etc.) to see what loss values are reasonable. For example, it seems that `y` will have a mean of about 20: with `x_6` uniform on [-40, 40], `x_6 - 4` is uniform on [-44, 36], and the expected absolute value is roughly 1/2 · (44/2) + 1/2 · (36/2) = 20. The scale of the outputs is therefore pretty large, so it's not obvious to me that the loss values are abnormal. Another angle is to try increasing the training set size - it's hard to say whether 600 training points is enough for the model to learn this function well.
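A quick Monte-Carlo check of that scale (a sketch; the uniform range and the |x_6 - 4| target are taken from the discussion in this thread):

```python
import numpy as np

# With x_6 ~ U(-40, 40) and y = |x_6 - 4|, the mean of y is about 20,
# so seemingly large MSE values are not necessarily a red flag.
rng = np.random.default_rng(0)
x6 = rng.uniform(-40, 40, size=1_000_000)
print(np.abs(x6 - 4).mean())  # ~20.2
```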
```python
import numpy as np
import matplotlib.pyplot as plt

# Sort test points along feature 6 so fill_between draws a contiguous band.
args = np.argsort(x_test[:, 6])
y_mean = np.reshape(y_test_ntk.mean, (-1,))[args]
y_std = np.sqrt(np.diag(y_test_ntk.covariance))[args]

plt.plot(X[:, 6], y, 'k.', alpha=0.1, rasterized=True)  # training data
plt.fill_between(
    np.reshape(x_test[args, 6], (-1,)),
    y_mean - 3 * y_std,   # +/- 3 standard deviations of the posterior
    y_mean + 3 * y_std,
    color='red', alpha=0.2)
plt.xlabel('x_6')
plt.ylabel('y')
```
The thing is that changing the layer width from 50 nodes to 5,000 hardly changes the output; I would expect at least some change.

I tried 10,000 points, and I only gained a factor of 2 on the loss. Is there any guidance on what an appropriate training set looks like?

I get memory errors if I try 100,000 points in my dataset, even with the batching trick:
```python
import neural_tangents as nt

# Compute the kernel in batches to reduce peak memory usage.
kernel_fn = nt.batch(kernel_fn,
                     device_count=0,
                     batch_size=1_000)
```
Note that in your example you are doing inference with an infinitely-wide neural network (`kernel_fn`), so the width doesn't matter in this case. Also, the plot does look like the learned function mimics |x_6 - 4| (at least it's not doing anything obviously wrong: it has the right shape and kink location), so I'm inclined to think that it's working as intended?...
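A quick way to see the width-independence (a sketch; assumes the same `stax` combinators used to build your model):

```python
import numpy as np
from neural_tangents import stax

# The width passed to stax.Dense only affects the finite-width functions
# (init_fn, apply_fn); kernel_fn is already the infinite-width limit.
_, _, kernel_fn_50 = stax.serial(stax.Dense(50), stax.Relu(), stax.Dense(1))
_, _, kernel_fn_5000 = stax.serial(stax.Dense(5000), stax.Relu(), stax.Dense(1))

x = np.random.normal(size=(8, 7))
print(np.allclose(kernel_fn_50(x, x, 'ntk'),
                  kernel_fn_5000(x, x, 'ntk')))  # True: identical kernels
```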
Re the training set, I think it's constructed correctly; I'm just not sure how to reason about the generalization we should expect from it (per your plot, it seems to be at least OK-ish?...).
And yes, 100K is too much for most GPUs.
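To make that concrete, a back-of-the-envelope sketch: batching tiles the computation, but the full train-train kernel still has to be materialized, and at 100K points that matrix alone is tens of gigabytes:

```python
# Memory footprint of a dense 100,000 x 100,000 kernel matrix.
n = 100_000
print(n * n * 4 / 1e9)  # ~40 GB in float32
print(n * n * 8 / 1e9)  # ~80 GB in float64
```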
You're right; it seems to be doing OK, but with serious overfitting.

Is there a paper to read to get a feeling for appropriate network architectures? My understanding is that adding layers will not change anything unless a "layer" is already something complex, right?
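For what it's worth, a quick check with the analytic kernel (a sketch reusing the `stax` API above; the layer sizes are arbitrary) suggests that depth, unlike width, does change the infinite-width kernel:

```python
import numpy as np
from neural_tangents import stax

_, _, k_1hidden = stax.serial(stax.Dense(50), stax.Relu(), stax.Dense(1))
_, _, k_2hidden = stax.serial(stax.Dense(50), stax.Relu(),
                              stax.Dense(50), stax.Relu(), stax.Dense(1))

x = np.random.normal(size=(8, 7))
print(np.allclose(k_1hidden(x, x, 'ntk'),
                  k_2hidden(x, x, 'ntk')))  # False: depth alters the kernel
```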
Dear team, great package - I'm very excited to use it. However, I tried a simple case and failed miserably to get decent performance.

I generate a multi-dimensional dataset with a relatively simple target feature:
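Something along these lines - a sketch reconstructed from details elsewhere in the thread (600 training points, inputs uniform on [-40, 40], target mimicking |x_6 - 4|); the number of features and the test-set size are guesses:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10                                    # assumption: only column 6 matters
X = rng.uniform(-40, 40, size=(600, n_features))   # 600 training points (per thread)
y = np.abs(X[:, 6] - 4)                            # simple kinked target in feature 6
x_test = rng.uniform(-40, 40, size=(2_000, n_features))
y_test = np.abs(x_test[:, 6] - 4)
```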
And I followed your examples:
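Roughly the standard regression recipe from the library's examples - a sketch, where the layer widths and the `diag_reg` value are placeholders rather than my exact settings:

```python
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width fully-connected network; kernel_fn is the analytic kernel.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(50), stax.Relu(), stax.Dense(1))

# Closed-form predictions of an ensemble of infinitely wide networks
# trained to convergence with gradient descent on MSE.
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, X, y[:, None], diag_reg=1e-4)
y_test_ntk = predict_fn(x_test=x_test, get='ntk', compute_cov=True)
```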
Visual inspection shows terrible predictions, and the loss values are large.
I varied the network in many ways and fiddled with `learning_rate` and `diag_reg`, but hardly anything changed. I'm sure I am doing something wrong, but I cannot see what it is. Any obvious mistake?
Thanks for your help.