See response email coming shortly. Basically, there were a couple of issues leading to the problems you saw:
1. You had `num_dims=18` hard-coded into the kernel, but it needs to match the dimensionality of the data (see the model sketch below).
2. In general, because you don't learn the inducing point locations with SKIP, we recommend much higher learning rates and fewer training iterations. The experiments in the paper were run for at most 30 iterations at a learning rate of 0.1 (see the training loop sketch below).
3. I also included a comment about how to initialize hyperparameter values (an illustrative snippet follows the training loop below).
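To make point 1 concrete, here is a minimal sketch of a SKIP model that derives `num_dims` from the training data instead of hard-coding it. The kernel composition mirrors the standard GPyTorch SKIP setup (a `ProductStructureKernel` over a one-dimensional `GridInterpolationKernel`); names like `SKIPGPModel` and `train_x` are placeholders:

```python
import torch
import gpytorch

class SKIPGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Derive the dimension count from the data rather than
        # hard-coding a value like 18.
        num_dims = train_x.size(-1)
        self.covar_module = gpytorch.kernels.ProductStructureKernel(
            gpytorch.kernels.ScaleKernel(
                gpytorch.kernels.GridInterpolationKernel(
                    gpytorch.kernels.RBFKernel(), grid_size=100, num_dims=1
                )
            ),
            num_dims=num_dims,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```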
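For point 2, a sketch of a training loop at the recommended settings, assuming the model above and placeholder data `train_x`/`train_y` (learning rate 0.1, at most 30 iterations):

```python
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SKIPGPModel(train_x, train_y, likelihood)
model.train()
likelihood.train()

# Higher learning rate and fewer iterations than a typical exact GP run,
# since the inducing point locations are not learned with SKIP.
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(30):  # at most 30 iterations, per the paper's experiments
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()
```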
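And for point 3, one way to set initial hyperparameter values before training. The attribute paths follow the kernel nesting in the model sketch above, and the particular numbers are illustrative, not recommendations:

```python
# Illustrative starting values only; tune for your data.
model.likelihood.noise = 0.1
# covar_module is ProductStructureKernel -> ScaleKernel -> GridInterpolationKernel -> RBFKernel
model.covar_module.base_kernel.outputscale = 1.0
model.covar_module.base_kernel.base_kernel.base_kernel.lengthscale = 0.5
```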
For reference, the reason the default value is 100 is that we use the same root decomposition elsewhere (e.g., for sampling), where we explicitly want to run to very fine convergence, not just get a kernel approximation.
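If (as I'm assuming) the setting in question is `gpytorch.settings.max_root_decomposition_size`, whose default is 100, a coarser value can be used during training where only a kernel approximation is needed, for example:

```python
# Assumption: the "100" above refers to gpytorch.settings.max_root_decomposition_size.
# A smaller value gives a coarser, faster approximation, which is fine for training
# but not for uses (like sampling) that need fine convergence.
with gpytorch.settings.max_root_decomposition_size(30):
    output = model(train_x)
    loss = -mll(output, train_y)
```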