Open piaoger opened 7 years ago
Hi,
I used a few different datasets, but imagenet is the main one. To train you just need a reasonably varied set of images, even only 100 photos.
Unfortunately, rust-llvm recently regressed at SLP auto-vectorization, so the currently released matrix multiplication is suddenly much slower. I've got a fix that relies on loop auto-vectorization instead, but I haven't released it yet.
I'll get back to you this weekend once I've fixed the performance issues.
Thanks for your response.
When I tested model training for the 2x factor, I provided the "TRAINING_FOLDER"/"PARAMETER_FILE" params and also changed "FACTOR" to 2, but nothing seemed to happen on my MacBook Pro.
Besides the performance issue, did I miss something?
I've updated the dependencies, so you might want to rebase.
Changing FACTOR should be all that is required, although it means the built-in res/*.rsr files have to be removed or replaced, as they are only for FACTOR=3. This is what my training command looks like:
cargo run --release -- train -r -v D:/ML/set14/original D:/test.rsr D:/ML/Imagenet
For performance the right env flags are required:
RUSTFLAGS=-C target-cpu=native
MATMULFLAGS=arch_sandybridge
or if your cpu is new enough you can use arch_haswell
If you are on nightly you can also use prefetch and ftz_daz:
MATMULFLAGS=arch_sandybridge, ftz_daz, prefetch
see matrixmultiply_mt.
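Putting the flags above together, a full invocation might look like the sketch below. The folder paths are placeholders for your own data, not the actual paths from this thread, and ftz_daz/prefetch only apply on nightly:

```shell
# Illustrative setup; adjust arch_sandybridge to arch_haswell if your CPU supports it.
export RUSTFLAGS="-C target-cpu=native"
# ftz_daz and prefetch are nightly-only extras; drop them on stable.
export MATMULFLAGS="arch_sandybridge, ftz_daz, prefetch"
echo "$MATMULFLAGS"
# cargo run --release -- train -r -v <VALIDATION_FOLDER> <PARAMETER_FILE> <TRAINING_FOLDER>
```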
Regarding nothing happening, that's worrying. I've had one other report of that happening on OSX: https://github.com/millardjn/rusty_sr/issues/3. Unfortunately I can't test on OSX.
Could you tell me what gets printed to stdout when running an upscale task and when running train?
Upscaling would normally print: Upscaling using imagenet neural net parameters... Writing file... Done
Training would normally print: Loading paths for D:/ML/Imagenet ...
and then training error for each batch.
When first running train, it can take a while to start if there are a lot of files in the training_folder.
If you have time could you clone matrixmultiply_mt and run:
cargo test
and if you have nightly:
cargo bench
to check if it completes fine? It's possible there is a deadlock there.
This should help pinpoint what's happening. Thanks.
I updated deps with "cargo update" and tried again. Still nothing happens, and CPU usage is close to 0.0%. So I cloned matrixmultiply_mt and ran "cargo test" and "cargo bench".
running 61 tests
test mat_mul_f32::m0004 ... bench: 584 ns/iter (+/- 139)
test mat_mul_f32::m0005 ... bench: 1,023 ns/iter (+/- 229)
test mat_mul_f32::m0006 ... bench: 1,048 ns/iter (+/- 224)
test mat_mul_f32::m0007 ... bench: 1,114 ns/iter (+/- 122)
test mat_mul_f32::m0008 ... bench: 1,133 ns/iter (+/- 539)
test mat_mul_f32::m0009 ... bench: 2,041 ns/iter (+/- 698)
test mat_mul_f32::m0012 ... bench: 2,273 ns/iter (+/- 827)
test mat_mul_f32::m0016 ... bench: 3,801 ns/iter (+/- 1,836)
test mat_mul_f32::m0032 ... bench: 18,220 ns/iter (+/- 8,100)
test mat_mul_f32::m0064 ... bench: 85,042 ns/iter (+/- 8,352)
test mat_mul_f32::m0127 ... bench: 234,191 ns/iter (+/- 17,730)
test mat_mul_f32::m0256 ... bench: 1,323,189 ns/iter (+/- 99,209)
test mat_mul_f32::m0512 ... bench: 10,057,554 ns/iter (+/- 4,495,104)
test mat_mul_f32::mix128x10000x128 ...
Also, in further bench runs I found that CPU usage is really low (0.0%) from mat_mul_f32::m0064 onward.
I also ran "cargo test" and "cargo bench" on my Ubuntu machine; both work fine. It seems to be a Mac OSX-only issue.
Below is the benchmark:
running 61 tests
test mat_mul_f32::m0004 ... bench: 667 ns/iter (+/- 61)
test mat_mul_f32::m0005 ... bench: 1,247 ns/iter (+/- 162)
test mat_mul_f32::m0006 ... bench: 1,246 ns/iter (+/- 111)
test mat_mul_f32::m0007 ... bench: 1,345 ns/iter (+/- 160)
test mat_mul_f32::m0008 ... bench: 1,463 ns/iter (+/- 193)
test mat_mul_f32::m0009 ... bench: 2,449 ns/iter (+/- 308)
test mat_mul_f32::m0012 ... bench: 2,662 ns/iter (+/- 356)
test mat_mul_f32::m0016 ... bench: 4,690 ns/iter (+/- 623)
test mat_mul_f32::m0032 ... bench: 22,270 ns/iter (+/- 2,925)
test mat_mul_f32::m0064 ... bench: 232,781 ns/iter (+/- 49,175)
test mat_mul_f32::m0127 ... bench: 467,294 ns/iter (+/- 113,538)
test mat_mul_f32::m0256 ... bench: 1,326,382 ns/iter (+/- 425,530)
test mat_mul_f32::m0512 ... bench: 5,773,158 ns/iter (+/- 1,125,111)
test mat_mul_f32::mix128x10000x128 ... bench: 5,171,025 ns/iter (+/- 363,278)
test mat_mul_f32::mix16x4 ... bench: 3,555 ns/iter (+/- 277)
test mat_mul_f32::mix32x2 ... bench: 3,522 ns/iter (+/- 273)
test mat_mul_f32::mix97 ... bench: 345,694 ns/iter (+/- 47,583)
test mat_mul_f32::skew1024x01 ... bench: 140,752 ns/iter (+/- 24,186)
test mat_mul_f32::skew1024x02 ... bench: 154,346 ns/iter (+/- 30,854)
test mat_mul_f32::skew1024x03 ... bench: 148,338 ns/iter (+/- 19,113)
test mat_mul_f32::skew1024x04 ... bench: 150,207 ns/iter (+/- 29,046)
I will go back to rusty_sr on Linux :)
Thank you very much for doing this testing, it's good to know where the problem is!
I've updated matrixmultiply_mt to use the parking_lot crate, which supplies alternative Condvar and Mutex implementations. Apparently they are implemented differently and are more broadly compatible, so hopefully it helps.
If it still doesn't work on OSX then turning off multithreading might be an option in the short term:
MATMULFLAGS= ... , no_multithreading
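As one concrete (illustrative) combination of the flags mentioned in this thread, the single-threaded fallback could be set like this:

```shell
# Illustrative: arch flag plus the single-threaded fallback.
export MATMULFLAGS="arch_sandybridge, no_multithreading"
echo "$MATMULFLAGS"
```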
I'll ask around and see if anyone knows of open OSX issues. The only one I know of has already been fixed: https://github.com/jemalloc/jemalloc/issues/895.
Where is the matrixmultiply_mt update? The latest update to matrixmultiply_mt is 17 hours old.
For rusty_sr itself, I still have a question about how to use it:
I started a new 2x-factor training with the above arguments a couple of hours ago and it's running :) Please let me know if I got anything wrong so I can restart training with the right arguments.
Crates.io "Last Updated" is wrong, not sure why. Version 0.1.4 is the new one.
That training setup looks correct :).
The PARAMETER_FILE argument is where the weights learned by the neural network get saved (every 100 steps, when the validation PSNR gets printed). You can then use them when upscaling later via -c or --custom:
./rusty_sr --custom ./test.rsr ./input.png ./output.png
If you want your new parameters/weights to be used by default, you'll have to put test.rsr in the /res folder and then change a few parts of the code. If you train on a few different datasets, you can include them all in /res and reprogram how the -p / --parameters arguments work on lines:
Nice to see your Rust impl. I'd like to use it for 2x-factor upscaling. Can you provide more details about how to train the model? Or do you have a sample dataset or model for quick setup?