Hi @C0ldstudy,
Thank you so much for your helpful issue report, and sorry for my late response (I finally had a chance to work on this issue this weekend!).
The new branch devel_issue6 has been released, and the problems you pointed out are fixed in it. If you have a chance, please try it. I will merge it into the main branch within 2 weeks, but feel free to let me know if you run into any other problems on the new branch.
The following are the details of the errors and their solutions.
Bug 1: The accuracy of the GPU-trained model was very bad
I found that the hyperparameters of the gradient descent training were not appropriate. In particular, a larger learning rate and stronger regularization (weight decay) were necessary to achieve better performance on MNIST. I changed the default hyperparameters and now get a result competitive with CPU training (for a higher score, please specify more training epochs). The following is a result in my environment (CPU: Intel i5-9300H, RAM: 16GB, GPU: GTX1660Ti).
Program starts: args = {'--C': 1.0,
'--cpus': -1,
'--gamma': 'auto',
'--help': False,
'--input': '../../dataset/mnist',
'--kdim': 1024,
'--kernel': 'rbf',
'--output': 'result.pickle',
'--pcadim': 128,
'--rtype': 'rff',
'--seed': 111,
'--stdev': 0.05,
'--use_fft': False,
'cpu': False,
'gpu': True,
'kernel': False}
Loading training data: 0.304882 [s]
Loading test data: 0.051620 [s]
Calculate PCA matrix: 0.607784 [s]
Epoch 0: Train loss = 2.0088e+00
Epoch 10: Train loss = 4.2311e-01
Epoch 20: Train loss = 3.5570e-01
Epoch 30: Train loss = 3.0776e-01
Epoch 40: Train loss = 2.9008e-01
Epoch 50: Train loss = 2.6589e-01
Epoch 60: Train loss = 2.4771e-01
Epoch 70: Train loss = 2.4421e-01
Epoch 80: Train loss = 2.2719e-01
Epoch 90: Train loss = 2.3216e-01
Epoch 100: Train loss = 2.1611e-01
Epoch 110: Train loss = 2.1149e-01
Epoch 120: Train loss = 2.0663e-01
Epoch 130: Train loss = 1.9935e-01
Epoch 140: Train loss = 1.9804e-01
Epoch 150: Train loss = 1.9009e-01
Epoch 160: Train loss = 1.8777e-01
Epoch 170: Train loss = 1.8264e-01
Epoch 180: Train loss = 1.8131e-01
Epoch 190: Train loss = 1.8022e-01
Epoch 200: Train loss = 1.7870e-01
Epoch 210: Train loss = 1.7725e-01
Epoch 220: Train loss = 1.7068e-01
Epoch 230: Train loss = 1.6717e-01
Epoch 240: Train loss = 1.6307e-01
Epoch 250: Train loss = 1.6820e-01
Epoch 260: Train loss = 1.5780e-01
Epoch 270: Train loss = 1.6357e-01
Epoch 280: Train loss = 1.6265e-01
Epoch 290: Train loss = 1.5841e-01
SVM learning: 222.397209 [s]
SVM prediction time for 1 image: 8.615327 [us]
Score = 97.16 [%]
Saving model: 0.051454 [s]
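For readers who want to see where the learning rate and weight decay enter this kind of training, here is a minimal NumPy sketch. It is not the rfflearn implementation: the function names, the toy data, and the values of `lr`, `weight_decay`, `gamma` and `epochs` are all assumptions chosen for illustration. It maps inputs to random Fourier features and fits a one-vs-rest linear classifier with hinge loss by full-batch gradient descent.

```python
import numpy as np

def rff_features(X, W, b):
    # z(x) = sqrt(2/D) * cos(x W + b) approximates an RBF kernel feature map.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def train_hinge_gd(Z, y, n_classes, lr, weight_decay, epochs):
    # Full-batch (sub)gradient descent on a one-vs-rest hinge loss.
    # weight_decay adds an L2 penalty on the weight matrix A (not on the bias c).
    n, d = Z.shape
    A = np.zeros((d, n_classes))
    c = np.zeros(n_classes)
    T = -np.ones((n, n_classes))      # targets: +1 for the true class, -1 otherwise
    T[np.arange(n), y] = 1.0
    for _ in range(epochs):
        scores = Z @ A + c
        active = T * scores < 1.0     # entries that violate the margin
        G = -(active * T) / n         # subgradient of the mean hinge loss w.r.t. scores
        A -= lr * (Z.T @ G + weight_decay * A)
        c -= lr * G.sum(axis=0)
    return A, c

# Toy data (an assumption for this sketch): three well-separated 2-D blobs.
rng = np.random.default_rng(111)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 60)
X = centers[y] + 0.5 * rng.normal(size=(180, 2))

# Random projection for an RBF kernel exp(-gamma * ||x - x'||^2).
gamma = 0.5
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(2, 256))
b = rng.uniform(0.0, 2.0 * np.pi, size=256)
Z = rff_features(X, W, b)

A, c = train_hinge_gd(Z, y, n_classes=3, lr=0.2, weight_decay=1e-3, epochs=300)
acc = np.mean(np.argmax(Z @ A + c, axis=1) == y)
print(f"train accuracy = {acc:.3f}")
```

With a learning rate that is too small, the scores may not reach the margin within the epoch budget, which is one plausible way to end up with a poor score like the one reported in this issue.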
Bug 2: Failed to validate a model trained on GPU
Simply put, the validation script valid_rff_svc_for_mnist.py did not support GPU-trained models (the gpu argument was intended for GPU inference with a CPU-trained model, but this was not stated clearly in the README). I've added code to handle GPU-trained models, and I have successfully validated one. You can now validate your model with the following command, as mentioned in the README:
# Assume that your trained model is 'result.pickle'.
python3 valid_rff_svc_for_mnist.py gpu
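The kind of type dispatch described above can be sketched as follows. This is only an illustration, not the code in the branch: `CpuSVC`, `GpuSVC`, `from_cpu`, `load_model` and `select_model` are hypothetical stand-ins, and the real rfflearn classes are not used here.

```python
import pickle

class CpuSVC:
    """Hypothetical stand-in for a CPU-trained model."""
    def __init__(self, weights):
        self.weights = list(weights)

class GpuSVC:
    """Hypothetical stand-in for a model that runs inference on a GPU."""
    def __init__(self, weights):
        self.weights = list(weights)

    @classmethod
    def from_cpu(cls, cpu_model):
        # Build a GPU inference model from the parameters of a CPU-trained one.
        return cls(cpu_model.weights)

def load_model(path):
    # Load a pickled model such as 'result.pickle' (not called in this sketch).
    with open(path, "rb") as f:
        return pickle.load(f)

def select_model(model, use_gpu):
    # A GPU-trained model is used as-is instead of being rejected.
    if isinstance(model, GpuSVC):
        return model
    # A CPU-trained model is converted only when GPU inference is requested.
    if isinstance(model, CpuSVC):
        return GpuSVC.from_cpu(model) if use_gpu else model
    raise TypeError(f"unsupported model type: {type(model).__name__}")

gpu_trained = GpuSVC([0.1, 0.2])
cpu_trained = CpuSVC([0.3, 0.4])
print(type(select_model(gpu_trained, use_gpu=True)).__name__)
print(type(select_model(cpu_trained, use_gpu=True)).__name__)
```

Accepting both model types in one place keeps the command line unchanged for users, whichever device the model was trained on.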
The branch devel_issue6 has been merged into the main branch.
I ran train_rff_svc_for_mnist.py following the README file. But when I try to run the script on a GPU, the result is very bad. For example, the score is 16.10%, and I cannot use the validation script because it fails with the error: TypeError: rfflearn.gpu.SVC: Only rfflearn.cpu.SVC supported.
When I try the other examples, everything works smoothly.