Isn't it a disk cache issue? For example, the first time is slow and the second time is fast. Do you have enough RAM to not use swap?
I reserved 28 GB of free memory. If the "-resume models/cunet/art/noise3_model.t7" parameter is not used, training takes only about 4 minutes per epoch; otherwise an epoch takes 30 minutes and GPU utilization is very low. I don't know whether this is a WSL2 problem; I may need to test it in a non-virtual environment.
In addition, if I train my own cunet model and resume with "-resume models/my_cunet/noise3_model.t7", it works normally. The problem only occurs with the original "cunet/art/noise3_model.t7".
I got it. It is a model loading issue in train.lua. All models in the models directory use cunn (torch's implementation) instead of cudnn for the convolution layers, for compatibility reasons. train.lua uses the loaded model as it is, so cudnn is not used.
Replacing https://github.com/nagadomi/waifu2x/blob/44503fb4c013d4aa7fc1434a5ade2f5a7c85a263/train.lua#L529 with
model = w2nn.load_model(settings.resume, settings.backend == "cudnn", "ascii")
will probably fix it.
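For anyone curious what the difference is, here is a minimal sketch of the idea behind that call. The load_for_training helper and the cudnn.convert step are my own illustration of converting a cunn-serialized model to cudnn, assuming w2nn.load_model does something along these lines; only the w2nn.load_model line above is the actual fix.

-- Minimal sketch (not waifu2x's actual implementation): load a model that was
-- serialized with cunn layers and convert its layers to cudnn before training.
require 'cunn'
local has_cudnn, cudnn = pcall(require, 'cudnn')

local function load_for_training(path, backend)
   -- the pretrained models in models/ are serialized in ascii format with cunn layers
   local model = torch.load(path, "ascii")
   if backend == "cudnn" and has_cudnn then
      -- swap nn.SpatialConvolution etc. for their cudnn counterparts
      cudnn.convert(model, cudnn)
   end
   return model:cuda()
end

Without that conversion, training keeps running on the slower cunn kernels, which would explain the 4-minute vs 30-minute per-epoch difference reported above.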
Thanks, it worked for me.
I pushed the above change to the master branch. I haven't tested it.
When I removed the "resume" parameter, the training speed returned to normal.