vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

How to speed up while reading data from disk? #387

Closed — layumi closed this issue 8 years ago

layumi commented 8 years ago

The main problem might be that my data is 2-channel (.flo) data rather than JPEG, and at 1.5 TB it is too big to fit in memory. So I store the data paths in imdb.images and read the data from disk in the getBatch function. But reading the data is quite slow. Is there any advice on how to accelerate this step? I have tried writing a mex file, adding parfor, and adding OpenMP, but none of these works well. My code to get a new batch is below. (I actually need to read 1280 files when the batch size is 128, because I use 10 frames per sample for training.)

```matlab
function [im, labels] = getBatch(imdb, batch)
% --------------------------------------------------------------------
imlist = imdb.images.data(:, batch) ;   % selected video-clip dirs (I save paths, not data)
batch_size = numel(batch) ;
im = ones(224, 224, 2, 10, batch_size) ;  % preallocate

% ---- pick a random run of 10 consecutive frames from each clip ----
for i = 1:batch_size
    p = imlist{i} ;
    file = dir(p) ;                      % count files in the clip dir
    s = size(file, 1) - 9 ;              % last valid start index
    rr = randperm(s) ;
    selected = rr(1) ;                   % start frame of the clip
    Q = 2 ;
    while (selected == 1 || selected == 2)
        selected = rr(Q) ;               % get rid of '.' and '..'
        Q = Q + 1 ;
    end
    imv = zeros(240, 320, 2, 10) ;
    for k = 1:10                         % read 10 consecutive frames from the given path
        imv(:,:,:,k) = imresize( ...
            readFlowFile_fast(strcat(p, file(selected + k - 1).name)), [240, 320]) ;
    end
    im(:,:,:,:,i) = random_cut(imv) ;
end
im = single(reshape(im, 224, 224, 20, numel(batch))) ;
labels = imdb.images.label(1, batch) ;
```
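Since the cost here is dominated by opening ~1280 small files per batch, one option (a sketch, not from this thread) is to pack each clip's frames into a single binary file offline, so a 10-frame run becomes one seek plus one `fread`. `readFlowFile_fast` is the poster's own helper; the packed layout, file names, and the 240x320x2 single-precision frame size are assumptions for illustration.

```matlab
% --- offline packing: run once per clip ---
% Writes every frame of a clip, already resized, into one flat binary file.
function packClip(clipDir, outFile)
    files = dir(fullfile(clipDir, '*.flo')) ;
    fid = fopen(outFile, 'w') ;
    for k = 1:numel(files)
        f = imresize(readFlowFile_fast(fullfile(clipDir, files(k).name)), [240, 320]) ;
        fwrite(fid, single(f), 'single') ;   % column-major, frame after frame
    end
    fclose(fid) ;
end

% --- at training time: read frames t..t+9 with a single seek + read ---
function imv = readRun(packedFile, t)
    frameBytes = 240 * 320 * 2 * 4 ;         % bytes per single-precision frame
    fid = fopen(packedFile, 'r') ;
    fseek(fid, (t - 1) * frameBytes, 'bof') ;
    raw = fread(fid, 240 * 320 * 2 * 10, 'single') ;
    fclose(fid) ;
    imv = reshape(raw, 240, 320, 2, 10) ;    % matches fwrite's column-major order
end
```

The reshape is valid because MATLAB's `fwrite` and `fread` both use column-major order, so the bytes round-trip without any permutation.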

bazilas commented 8 years ago

Hi,

You could use prefetching, and also pre-process the data offline (e.g. resize in advance). Storing your data on a ramdisk also gives faster loading.
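The prefetching pattern could look roughly like the sketch below, assuming the data has first been converted to JPEG so that MatConvNet's `vl_imreadjpeg` applies. A call with the `'Prefetch'` option starts asynchronous reads on background threads and returns immediately; a later call for the same file list only blocks until those reads finish. The function name `getBatchPrefetch`, the `nextBatch` argument, and the thread count are illustrative assumptions, not this repo's exact training-loop interface.

```matlab
function [im, labels] = getBatchPrefetch(imdb, batch, nextBatch)
% Fetch the current batch; kick off background reads for the next one.
files = imdb.images.data(batch) ;
data  = vl_imreadjpeg(files, 'NumThreads', 4) ;   % blocks until ready
if ~isempty(nextBatch)
    % Start asynchronous reads for the next batch; returns immediately.
    vl_imreadjpeg(imdb.images.data(nextBatch), 'NumThreads', 4, 'Prefetch') ;
end
im = single(cat(4, data{:})) ;
labels = imdb.images.label(1, batch) ;
```

This hides disk latency behind GPU compute: while the network processes batch `t`, the worker threads are already reading batch `t+1`.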

Check also Karel's comments in #336.

Vasilis

layumi commented 8 years ago

Thank you @bazilas.

1. I have moved imresize into preprocessing, converted my data from .flo (float32, 2 channels) to .jpg (uint8, 3 channels), and used vl_imreadjpeg with multiple threads. But it still does not work well: getting a batch still costs 10+ seconds.
2. So I googled ramdisk (under Ubuntu it is the /dev/shm directory). I think it is a very good idea for most datasets! However, my PC has 128 GB of memory, which cannot hold 1.5 TB of data, so I would have to swap data in and out. Considering the swap cost, I did not try this method. Another disadvantage is having to preprocess the data into a binary format.
3. Finally, I decided to buy a 2 TB SSD with my scholarship money... I hope it can accelerate the getBatch function.
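For readers wondering how 2-channel float flow fits into a 3-channel uint8 JPEG: a common approach (used by two-stream networks, and only a guess at what was done here) is to clip the flow to a fixed range, rescale to 0..255, and leave the third channel empty. The bound of 20 and the helper name are assumptions; note the quantization is lossy.

```matlab
% Sketch: store a 2-channel float32 flow field as a 3-channel uint8 JPEG.
function writeFlowJpeg(flo, outFile)
    bound = 20 ;                                        % clip range, an assumed value
    u8 = uint8(255 * (max(min(flo, bound), -bound) + bound) / (2 * bound)) ;
    img = cat(3, u8(:,:,1), u8(:,:,2), ...              % u, v in channels 1-2
              zeros(size(u8,1), size(u8,2), 'uint8')) ; % empty third channel
    imwrite(img, outFile, 'Quality', 90) ;
end
```

At read time the inverse mapping `single(img) / 255 * 2 * bound - bound` on the first two channels recovers the flow, up to the clipping and 8-bit quantization error.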