vl_imreadjpeg error under GPU model.

jstudy commented 8 years ago

When I run the AlexNet training code with the latest version and cuDNN 5.0, the training speed suddenly drops from 260 Hz to 60 Hz. I checked the code carefully and found that the slowdown comes from the vl_imreadjpeg function, whose speed dropped from almost 700 Hz to 70 Hz. I have not been able to figure out why.

vedaldi commented 8 years ago

Hi, this may be a case of images being implicitly cached in RAM by the file system. The first few images would then be loaded much faster.

We get good performance by moving the images to an SSD or, even better, to a ramdisk.
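
One way to check whether file-system caching explains the numbers is to read the same set of files twice and compare the rates. The sketch below is only an illustration in Python, not MatConvNet code: the function names are hypothetical, and it times raw file reads without JPEG decoding, so it isolates the storage side of the problem.

```python
import time
from pathlib import Path

def read_rate(paths):
    """Read every file fully; return (files per second, total bytes read)."""
    start = time.perf_counter()
    total = 0
    for p in paths:
        total += len(Path(p).read_bytes())
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    return len(paths) / max(elapsed, 1e-9), total

def compare_passes(paths):
    """Read the same files twice; the second pass may be served from RAM."""
    first, nbytes = read_rate(paths)
    second, _ = read_rate(paths)
    return {"first_pass_hz": first, "second_pass_hz": second, "bytes": nbytes}

# Usage (hypothetical directory):
#   paths = sorted(Path("/data/imagenet/train").rglob("*.JPEG"))[:2000]
#   print(compare_passes(paths))
```

If the second pass is much faster than the first, the files were probably not cached before the first read, which would be consistent with a fast start followed by a drop once training reaches images that have to come from disk.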

jstudy commented 8 years ago

Thanks for your reply, Vedaldi. How fast does it run when all the images are on an SSD? And how much space is needed when all the rescaled images (256x256) are loaded into a ramdisk?

vedaldi commented 8 years ago

Hi, after rescaling, ImageNet is about 50 GB (in JPEG format). The speed depends a lot on your hardware, but with a good server we can read 1-2K images per second from an SSD.
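
To put these figures in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the ILSVRC 2012 training set (1,281,167 images) and attributes the whole 50 GB to it, which slightly overestimates the per-image size if the figure also covers validation images. The function names are hypothetical.

```python
# Assumption: ILSVRC 2012 training set size.
NUM_TRAIN_IMAGES = 1_281_167
# "About 50 GB" after rescaling, taken from the comment above.
DATASET_BYTES = 50e9

def mean_image_kb(dataset_bytes=DATASET_BYTES, n=NUM_TRAIN_IMAGES):
    """Average size of one rescaled JPEG, in kilobytes."""
    return dataset_bytes / n / 1e3

def epoch_minutes(images_per_second, n=NUM_TRAIN_IMAGES):
    """Time to read one full epoch at a given read rate, in minutes."""
    return n / images_per_second / 60.0

def required_mb_per_second(images_per_second):
    """Sustained storage throughput needed to feed that read rate."""
    return images_per_second * mean_image_kb() / 1e3

if __name__ == "__main__":
    print(f"mean image size: {mean_image_kb():.1f} KB")
    for rate in (70, 700, 1000, 2000):
        print(f"{rate:5d} img/s: {epoch_minutes(rate):6.1f} min/epoch, "
              f"{required_mb_per_second(rate):5.1f} MB/s")
```

Under these assumptions each image is roughly 39 KB, reading one epoch takes about 21 minutes at 1000 images per second, and it takes around 5 hours at the 70 Hz reported at the top of the thread.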

jstudy commented 8 years ago

Thanks, Vedaldi. I ran into a new problem when using preprocess-imagenet.sh. The error is: line 10: gm: command not found. It might be caused by ' done | ${gm} batch -echo on -feedback on -'. Why does this happen? I also tried "convert_some_im", which works, but 'convert_some_gm' does not.
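
The message "gm: command not found" is what a shell prints when no executable named gm is on the PATH. gm is the GraphicsMagick command-line tool, while convert belongs to ImageMagick, which would fit with the ImageMagick variant working and the GraphicsMagick one failing. A small Python sketch for checking which of the two tools can be found (the helper names are hypothetical):

```python
import shutil

def find_tools(names=("gm", "convert")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

def missing_tools(names=("gm", "convert")):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]

if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If gm is reported as missing, installing GraphicsMagick or adding its directory to the PATH would be the first thing to try.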

jstudy commented 8 years ago

Hi, Vedaldi. Regarding ImageNet, is the code able to evaluate 10 patches as presented by Krizhevsky et al. in their paper? I found a variable 'opts.numAugments' in cnn_imagenet_get_batch.m, but I have not figured out how to use it.

vedaldi commented 8 years ago

Hi, this is not done by default, but would be trivial to add on top.

My suggestion, however, is to launch a network on a slightly larger image (i.e. without cropping it), exploiting the fact that in MCN everything is convolutional, and add an average pooling layer at the very end (or simply average the multiple predictions by hand). This would give you a much more efficient version of cropping and averaging.
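
The idea can be illustrated with a toy one-dimensional example in Python. This is only a sketch with hypothetical names, not MatConvNet code: a single linear filter stands in for the network, and every possible crop is evaluated. In this setting, scoring each crop separately and averaging gives exactly the same number as sliding the filter over the whole input and average-pooling the resulting score map.

```python
def score_crop(crop, weights):
    """Score one fixed-size crop with a linear 'classifier' (a dot product)."""
    return sum(c * w for c, w in zip(crop, weights))

def crop_and_average(signal, weights):
    """Classify every crop separately, then average the predictions."""
    k = len(weights)
    crops = [signal[i:i + k] for i in range(len(signal) - k + 1)]
    return sum(score_crop(c, weights) for c in crops) / len(crops)

def convolve_and_pool(signal, weights):
    """Slide the same weights over the full input, then average-pool the score map."""
    k = len(weights)
    score_map = [
        sum(signal[i + j] * weights[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]
    return sum(score_map) / len(score_map)

if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6]
    weights = [1, 0, -1]
    print(crop_and_average(signal, weights), convolve_and_pool(signal, weights))
```

With a single layer the two routes cost the same. In a deep network the convolutional route becomes cheaper because overlapping crops share their intermediate feature maps instead of recomputing them, although strides and padding mean the result matches crop averaging only approximately.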

jstudy commented 8 years ago

Hi, Vedaldi. I trained an AlexNet using the newest MatConvNet version 20, cuDNN 5, one Titan X, MATLAB 2016a, and Windows. I pre-processed all the images to 256x256 using bilinear interpolation. The settings I changed are listed below; all other parameters were left at their defaults.
opts.modelType = 'alexnet' ;
opts.networkType = 'simplenn' ;
opts.batchNormalization = true ;
opts.weightInitMethod = 'gaussian' ;
opts.train.cudnn = true ;
opts.train.prefetch = true ;
Training ran for 20 epochs in total and reached a validation top-1 error of 44.8% and a top-5 error of 21.7%. This is worse than what you reported (matconvnet-alex, 2012: 41.8, 19.2). Could you give me some suggestions on how to improve my results?

vlfeat / matconvnet

vl_imreadjpeg error under GPU model. #604