train_rff_svc_for_mnist.py does not support GPU

C0ldstudy commented 3 years ago

I ran train_rff_svc_for_mnist.py following the README. When I run the script on the GPU, the result is very bad: for example, the score is 16.10%. I also cannot use the validation script, because it fails with the error: TypeError: rfflearn.gpu.SVC: Only rfflearn.cpu.SVC supported. All the other examples I tried work smoothly.

tiskw commented 3 years ago

Hi @C0ldstudy,

Thank you so much for your helpful issue report, and sorry for my late response (I finally had a chance to work on this issue this weekend!).

The new branch devel_issue6 is now available, and it fixes the problems you pointed out. If you have a chance, please try it. I will merge it into the main branch within 2 weeks, but feel free to let me know if you run into another problem on the new branch.

The following are the details of the errors and their solutions.

Bug 1: The accuracy of the GPU-trained model was very bad

I found that the default hyperparameters for the gradient descent training were not appropriate. In particular, achieving good performance on MNIST required a larger learning rate and stronger regularization (weight decay). I changed the default hyperparameters and now get a result competitive with CPU training (for a higher score, please specify more training epochs). The following is the result in my environment (CPU: Intel i5-9300H, RAM: 16GB, GPU: GTX1660Ti).

Program starts: args = {'--C': 1.0,
 '--cpus': -1,
 '--gamma': 'auto',
 '--help': False,
 '--input': '../../dataset/mnist',
 '--kdim': 1024,
 '--kernel': 'rbf',
 '--output': 'result.pickle',
 '--pcadim': 128,
 '--rtype': 'rff',
 '--seed': 111,
 '--stdev': 0.05,
 '--use_fft': False,
 'cpu': False,
 'gpu': True,
 'kernel': False}
Loading training data: 0.304882 [s]
Loading test data: 0.051620 [s]
Calculate PCA matrix: 0.607784 [s]
Epoch    0: Train loss = 2.0088e+00
Epoch   10: Train loss = 4.2311e-01
Epoch   20: Train loss = 3.5570e-01
Epoch   30: Train loss = 3.0776e-01
Epoch   40: Train loss = 2.9008e-01
Epoch   50: Train loss = 2.6589e-01
Epoch   60: Train loss = 2.4771e-01
Epoch   70: Train loss = 2.4421e-01
Epoch   80: Train loss = 2.2719e-01
Epoch   90: Train loss = 2.3216e-01
Epoch  100: Train loss = 2.1611e-01
Epoch  110: Train loss = 2.1149e-01
Epoch  120: Train loss = 2.0663e-01
Epoch  130: Train loss = 1.9935e-01
Epoch  140: Train loss = 1.9804e-01
Epoch  150: Train loss = 1.9009e-01
Epoch  160: Train loss = 1.8777e-01
Epoch  170: Train loss = 1.8264e-01
Epoch  180: Train loss = 1.8131e-01
Epoch  190: Train loss = 1.8022e-01
Epoch  200: Train loss = 1.7870e-01
Epoch  210: Train loss = 1.7725e-01
Epoch  220: Train loss = 1.7068e-01
Epoch  230: Train loss = 1.6717e-01
Epoch  240: Train loss = 1.6307e-01
Epoch  250: Train loss = 1.6820e-01
Epoch  260: Train loss = 1.5780e-01
Epoch  270: Train loss = 1.6357e-01
Epoch  280: Train loss = 1.6265e-01
Epoch  290: Train loss = 1.5841e-01
SVM learning: 222.397209 [s]
SVM prediction time for 1 image: 8.615327 [us]
Score = 97.16 [%]
Saving model: 0.051454 [s]
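
For readers who want to see where the learning rate and weight decay mentioned under Bug 1 enter the training, here is a minimal NumPy sketch of a linear SVM trained by minibatch gradient descent on random Fourier features. It is not the rfflearn implementation: the function names, the hyperparameter values (lr, weight_decay, epochs, batch_size), and the small synthetic dataset standing in for MNIST are all assumptions chosen for illustration.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features for the RBF kernel: sqrt(2/D) * cos(XW + b)."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def train_linear_svm_sgd(Z, y, n_classes, lr, weight_decay, epochs, batch_size, rng):
    """One-vs-rest hinge loss minimized by minibatch SGD with L2 weight decay."""
    n_samples, n_features = Z.shape
    # Targets in {-1, +1}: +1 in the column of the true class, -1 elsewhere.
    Y = -np.ones((n_samples, n_classes))
    Y[np.arange(n_samples), y] = 1.0
    A = np.zeros((n_features, n_classes))
    c = np.zeros(n_classes)
    for _ in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            margins = Y[idx] * (Z[idx] @ A + c)
            # Hinge subgradient: -y where the margin is violated, 0 otherwise.
            G = -Y[idx] * (margins < 1.0)
            # The weight decay term shrinks A on every step; lr scales both terms.
            A -= lr * (Z[idx].T @ G / len(idx) + weight_decay * A)
            c -= lr * G.mean(axis=0)
    return A, c

def make_blobs(n_per_class, rng):
    """Synthetic 3-class data (hypothetical stand-in for MNIST)."""
    centers = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
    X = np.concatenate([ctr + 0.5 * rng.standard_normal((n_per_class, 2))
                        for ctr in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

rng = np.random.default_rng(111)
X_train, y_train = make_blobs(100, rng)
X_test, y_test = make_blobs(50, rng)

kdim = 256                                   # number of random features
W = rng.standard_normal((2, kdim))           # random projection matrix
b = rng.uniform(0.0, 2.0 * np.pi, size=kdim) # random phase offsets

A, c = train_linear_svm_sgd(rff_features(X_train, W, b), y_train, n_classes=3,
                            lr=0.5, weight_decay=1e-3, epochs=30,
                            batch_size=32, rng=rng)
pred = np.argmax(rff_features(X_test, W, b) @ A + c, axis=1)
acc = float(np.mean(pred == y_test))
print(f"Test accuracy = {100.0 * acc:.2f} [%]")
```

Because each feature vector has a norm of roughly 1, the hinge subgradient is small, so a very small learning rate makes slow progress within a fixed number of epochs. That is one plausible reason a larger learning rate helps in a setup like this.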

Bug 2: Validation failed for a model trained on GPU

Simply put, the validation script valid_rff_svc_for_mnist.py did not expect a GPU-trained model (the gpu argument was intended for GPU inference with a CPU-trained model, but this was not stated clearly in the README). I've added code to handle GPU-trained models and confirmed that validating a GPU-trained model now succeeds. You can now validate your model with the following command, as mentioned in the README:

# Assume that your trained model is 'result.pickle'.
python3 valid_rff_svc_for_mnist.py gpu
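
The kind of fix described under Bug 2 can be sketched as a type dispatch: accept either a CPU-trained or a GPU-trained model and only reject genuinely unknown types. The classes CpuSVC and GpuSVC and the function prepare_for_gpu_inference below are hypothetical stand-ins, not the rfflearn API, and the actual change in devel_issue6 may look different.

```python
class CpuSVC:
    """Hypothetical stand-in for a CPU-trained model."""
    def __init__(self, weights):
        self.weights = weights

class GpuSVC:
    """Hypothetical stand-in for a GPU model (trained on GPU or converted)."""
    def __init__(self, weights):
        self.weights = weights

def prepare_for_gpu_inference(model):
    """Return a model usable for GPU inference, whatever device trained it."""
    if isinstance(model, GpuSVC):
        # Already a GPU model: use it as-is instead of raising TypeError.
        return model
    if isinstance(model, CpuSVC):
        # CPU-trained model: wrap its parameters in the GPU class.
        return GpuSVC(model.weights)
    raise TypeError(f"{type(model).__name__}: unsupported model type")

gpu_trained = GpuSVC([0.1, 0.2])
cpu_trained = CpuSVC([0.3, 0.4])
same = prepare_for_gpu_inference(gpu_trained)
converted = prepare_for_gpu_inference(cpu_trained)
print(type(same).__name__, type(converted).__name__)
```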

tiskw commented 3 years ago

The branch devel_issue6 has been merged into main.

tiskw / random-fourier-features
