millardjn / rusty_sr

Deep learning superresolution in pure rust
203 stars 21 forks

more details about train? #9

Open piaoger opened 6 years ago

piaoger commented 6 years ago

Nice to see your Rust implementation. I'd like to use it for 2x-factor upscaling. Can you provide more details about how to train the model? Or do you have a sample dataset or model for a quick setup?

millardjn commented 6 years ago

Hi,

I used a few different datasets, but imagenet is the main one. To train you just need a reasonably varied set of images; even 100 photos will do.

Unfortunately, sometime recently rustc/LLVM got worse at SLP auto-vectorization, so the currently released matrix multiplication is suddenly much slower. I've got a fix that relies on loop auto-vectorization instead, but I haven't released it yet.

I'll get back to you this weekend once I've fixed the performance issues.

piaoger commented 6 years ago

Thanks for your response.
Thanks for your response.
When I tested model training for the 2x factor, I provided the "TRAINING_FOLDER"/"PARAMETER_FILE" params and also changed "FACTOR" to 2, but it seems nothing happened on my MacBook Pro. Besides the performance issue, am I missing something?

millardjn commented 6 years ago

I've updated the dependencies, so you might want to rebase.

Changing FACTOR should be all that is required, although it means the built-in res/*.rsr files have to be removed or replaced, as they are only for FACTOR=3. This is what my training command looks like:
cargo run --release -- train -r -v D:/ML/set14/original D:/test.rsr D:/ML/Imagenet
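For anyone following along, here is the same invocation with the positional arguments labelled; the labels are my reading of this thread (the value after -v is the validation folder, then the parameter file, then the training folder), and the paths below are placeholders.

```
# train [-r] -v <VALIDATION_FOLDER> <PARAMETER_FILE> <TRAINING_FOLDER>
cargo run --release -- train -r -v ./validation_images ./my_params.rsr ./training_images
```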

For performance the right environment flags are required: RUSTFLAGS=-C target-cpu=native and MATMULFLAGS=arch_sandybridge, or arch_haswell if your CPU is new enough.

If you are on nightly you can also use prefetch and ftz_daz: MATMULFLAGS=arch_sandybridge, ftz_daz, prefetch (see matrixmultiply_mt).
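Roughly, the environment variables and the training command combine into something like the following. This exact combination is an untested sketch: the paths are placeholders, the arch flag should match your CPU, and ftz_daz/prefetch need nightly as noted above.

```
RUSTFLAGS="-C target-cpu=native" \
MATMULFLAGS="arch_sandybridge, ftz_daz, prefetch" \
cargo run --release -- train -r -v ./validation_images ./my_params.rsr ./training_images
```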

millardjn commented 6 years ago

Regarding nothing happening, that's worrying. I've had one other report of that happening on OSX: https://github.com/millardjn/rusty_sr/issues/3. Unfortunately I can't test on OSX.

Could you tell me what gets printed to stdout when running an upscale task and when running train?

Upscaling would normally print:
Upscaling using imagenet neural net parameters... Writing file... Done
Training would normally print:
Loading paths for D:/ML/Imagenet ...
followed by the training error for each batch.

When first running train it can take a while to start if there are a lot of files in the training_folder.

If you have time, could you clone matrixmultiply_mt and run cargo test, and if you have nightly, cargo bench, to check whether they complete fine? It's possible there is a deadlock there.
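Roughly (assuming matrixmultiply_mt lives under my GitHub account; adjust the URL if you are using a fork):

```
git clone https://github.com/millardjn/matrixmultiply_mt
cd matrixmultiply_mt
cargo test             # stable toolchain is fine for the tests
cargo +nightly bench   # only if you have a nightly toolchain installed
```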

This should help pinpoint what's happening. Thanks.

piaoger commented 6 years ago

I updated the deps with "cargo update" and tried again. Still nothing happened, and CPU usage was close to 0.0%. So I cloned matrixmultiply_mt and ran "cargo test" and "cargo bench".

Also, I found the CPU usage is really low (0.0%) from mat_mul_f32::m0064 in my further benches.

piaoger commented 6 years ago

I also ran "cargo test" and "cargo bench" on my Ubuntu machine, and both of them work fine. It seems that it's only a Mac OSX issue.

Below is the benchmark:

running 61 tests
test mat_mul_f32::m0004            ... bench:         667 ns/iter (+/- 61)
test mat_mul_f32::m0005            ... bench:       1,247 ns/iter (+/- 162)
test mat_mul_f32::m0006            ... bench:       1,246 ns/iter (+/- 111)
test mat_mul_f32::m0007            ... bench:       1,345 ns/iter (+/- 160)
test mat_mul_f32::m0008            ... bench:       1,463 ns/iter (+/- 193)
test mat_mul_f32::m0009            ... bench:       2,449 ns/iter (+/- 308)
test mat_mul_f32::m0012            ... bench:       2,662 ns/iter (+/- 356)
test mat_mul_f32::m0016            ... bench:       4,690 ns/iter (+/- 623)
test mat_mul_f32::m0032            ... bench:      22,270 ns/iter (+/- 2,925)
test mat_mul_f32::m0064            ... bench:     232,781 ns/iter (+/- 49,175)
test mat_mul_f32::m0127            ... bench:     467,294 ns/iter (+/- 113,538)
test mat_mul_f32::m0256            ... bench:   1,326,382 ns/iter (+/- 425,530)
test mat_mul_f32::m0512            ... bench:   5,773,158 ns/iter (+/- 1,125,111)
test mat_mul_f32::mix128x10000x128 ... bench:   5,171,025 ns/iter (+/- 363,278)
test mat_mul_f32::mix16x4          ... bench:       3,555 ns/iter (+/- 277)
test mat_mul_f32::mix32x2          ... bench:       3,522 ns/iter (+/- 273)
test mat_mul_f32::mix97            ... bench:     345,694 ns/iter (+/- 47,583)
test mat_mul_f32::skew1024x01      ... bench:     140,752 ns/iter (+/- 24,186)
test mat_mul_f32::skew1024x02      ... bench:     154,346 ns/iter (+/- 30,854)
test mat_mul_f32::skew1024x03      ... bench:     148,338 ns/iter (+/- 19,113)
test mat_mul_f32::skew1024x04      ... bench:     150,207 ns/iter (+/- 29,046)

I will go back to rusty_sr on Linux :)

millardjn commented 6 years ago

Thank you very much for doing this testing; it's good to know where the problem is!

I've updated matrixmultiply_mt to use the parking_lot crate, which supplies alternative Condvar and Mutex implementations. Apparently they are implemented differently and are more broadly compatible, so hopefully that helps.

If it still doesn't work on OSX then turning off multithreading might be an option in the short term: MATMULFLAGS= ... , no_multithreading
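For example (a sketch only; keep whichever arch and other flags you were already using in place of arch_sandybridge):

```
MATMULFLAGS="arch_sandybridge, no_multithreading" cargo build --release
```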

I'll ask around and see if anyone knows of open OSX issues. The only one I know of has already been fixed: https://github.com/jemalloc/jemalloc/issues/895.

piaoger commented 6 years ago

Where is the matrixmultiply_mt update? The latest update to matrixmultiply_mt is 17 hours old.

For rusty_sr itself, I still have some questions about how to use it:

  1. To train a new 2x-factor model, I downloaded the train/test datasets from the SRCNN project. Does that make sense? And can you provide more information about the PARAMETER argument: what is it for? Homepage: http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html; VALIDATION_FOLDER: SRCNN/Test/Test14; TRAINING_FOLDER: SRCNN/Training
  2. My command line: cargo run --release -- train -v ../SRCNN/Test/Test14 ./test.rsr ./SRCNN/Training
  3. Changed FACTOR from 3 to 2.

I started a new 2x-factor training run with the above arguments a couple of hours ago and it's running :) Please let me know if anything is wrong so that I can restart training with the right arguments.

millardjn commented 6 years ago

Crates.io's "Last Updated" is wrong; I'm not sure why. Version 0.1.4 is the new one.

That training setup looks correct :).

The PARAMETER_FILE argument is where the weights learned by the neural network get saved (every 100 steps, when the validation PSNR gets printed). You can then use them when upscaling later using -c or --custom:
./rusty_sr --custom ./test.rsr ./input.png ./output.png
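With the SRCNN setup from your previous comment, the round trip would look roughly like this (a sketch; input.png and output.png are placeholder file names):

```
# Train; weights are written to ./test.rsr every 100 steps
cargo run --release -- train -v ../SRCNN/Test/Test14 ./test.rsr ./SRCNN/Training
# Later, upscale using the learned weights
./rusty_sr --custom ./test.rsr ./input.png ./output.png
```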

If you want your new parameters/weights to be used by default you'll have to put test.rsr in the /res folder and then change a few parts of the code. If you train on a few different datasets you can include them all in /res and reprogram how the -p/--parameters argument works on lines: