Open zhoubin-me opened 10 months ago
notice you are using debug build. add '--release' to your command line and try again.
Also try a smaller learning rate like 1e-4; the default value is 1e-3.
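Combining both suggestions, the invocation might look like this (the `--learning-rate` flag name is an assumption about the example's CLI and may differ across candle versions; check `--help`):

```shell
# Sketch: run the MNIST CNN example with optimizations and a smaller
# learning rate. '--learning-rate' is assumed from the example's CLI.
cargo run --release --example mnist-training --features="cuda" -- cnn --learning-rate 1e-4
```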
Thanks @howard0su, running in release mode does improve things quite a lot:
Finished release [optimized] target(s) in 1.24s
Running `target/release/examples/mnist-training cnn`
train-images: [60000, 784]
train-labels: [60000]
test-images: [10000, 784]
test-labels: [10000]
1 train loss 0.48736 test acc: 95.24%, duration 3.841444833s
2 train loss 0.21030 test acc: 96.69%, duration 3.068839951s
3 train loss 0.18876 test acc: 96.20%, duration 3.133457776s
4 train loss 0.17360 test acc: 96.54%, duration 3.069877785s
5 train loss 0.16307 test acc: 96.64%, duration 3.127763016s
6 train loss 0.16272 test acc: 97.08%, duration 3.082288287s
7 train loss 0.15377 test acc: 96.73%, duration 3.131595679s
The test accuracy difference may be due to data preprocessing. However, GPU utilization is still too high compared to PyTorch.
I was running example training
cargo run --example mnist-training --features="cuda" cnn
and measuring the duration of one epoch, then comparing against a PyTorch script with the same model.
This means both accuracy and runtime fall short of PyTorch. In addition, PyTorch only uses 20-25% GPU utilization, while candle uses 80-90%.
Any clue for improvement?