twitter-archive / torch-dataset

An extensible and high-performance method of reading, sampling and processing data for Torch
Apache License 2.0

batchSize cannot be set larger than 256 in "torch-dataset" #24

Closed chienlinhuang1116 closed 8 years ago

chienlinhuang1116 commented 8 years ago

Hi, I am using "torch-distlearn" for speech recognition. First, I convert the inputs and labels and save them to a Torch file "train.t7" using:

-- Dummy features (x) and labels (y); real data would replace the fill() values.
local vbTrainSet = { x = torch.FloatTensor(2593280, 680), y = torch.FloatTensor(2593280) }
vbTrainSet.x:fill(1)
vbTrainSet.y:fill(2)
-- vbTrainSet is a plain Lua table, so it is saved directly (calling :float() on the table itself would fail).
torch.save('train.t7', vbTrainSet)
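
As a quick sanity check, the saved table can be reloaded and the tensor sizes inspected:

-- Optional: reload the saved table and confirm the tensor shapes.
local t = torch.load('train.t7')
print(t.x:size())   -- 2593280 x 680
print(t.y:size())   -- 2593280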

Then I use "torch-dataset" to read the input. What I observe is that batchSize seems to equal the number of running threads or jobs: when I set "--batchSize 256", roughly 256 CPU threads are busy, but there are only 32 CPU cores in my GPU machine. Is batchSize related to the number of threads or jobs?

local trainingDataset = Dataset('train.t7', {partition = 1, partitions = 1 })
local getTrainingBatch, numTrainingBatches = trainingDataset.sampledBatcher({
   samplerKind = 'linear',
   batchSize = 256,
   inputDims = {680},
   verbose = true,
   cuda = true,
   processor = function(res, processorOpt, input)
      input:copy(res)
      return true
   end,
})
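
For completeness, this is roughly how I consume the batcher (a sketch following the torch-dataset examples; the batch.input and batch.target field names are my assumption):

-- Rough training loop over the batcher (batch field names assumed, not verified here).
for i = 1, numTrainingBatches() do
   local batch = getTrainingBatch()
   -- batch.input  : batchSize x 680 tensor (a CudaTensor here, since cuda = true)
   -- batch.target : the corresponding labels
   -- ... forward/backward pass on batch.input and batch.target ...
end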

In addition, for some reason we cannot set batchSize larger than 256 in "torch-dataset". For example, the following error occurs when setting "batchSize = 512". Do you know the reason?

/home/chienh/torch/install/bin/luajit: /home/chienh/torch/install/share/lua/5.1/dataset/Reader.lua:52: ERROR: (/home/chienh/torch-ipc-master/src/map.c, 107): (11, Resource temporarily unavailable)
stack traceback:
        [C]: in function 'map'
        /home/chienh/torch/install/share/lua/5.1/dataset/Reader.lua:52: in function 'Reader'
        ...e/chienh/torch/install/share/lua/5.1/dataset/Dataset.lua:67: in function 'sampledBatcher'
        vbt.lua:77: in main chunk
        [C]: in function 'dofile'
        ...ienh/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00405800

Thank you.

zakattacktwitter commented 8 years ago

Hi,

Get the latest version of the torch-ipc package (https://github.com/twitter/torch-ipc) and try setting poolSize to something small, like 128 or so.
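
Something like the sketch below (I'm assuming poolSize is passed in the sampledBatcher options and that it defaults to batchSize when unset, which would explain why ~256 reader threads get spawned and why 512 fails with "Resource temporarily unavailable"):

local getTrainingBatch, numTrainingBatches = trainingDataset.sampledBatcher({
   samplerKind = 'linear',
   batchSize = 512,   -- can now go past 256
   poolSize = 128,    -- caps the number of reader threads independently of batchSize (assumed option)
   inputDims = {680},
   verbose = true,
   cuda = true,
   processor = function(res, processorOpt, input)
      input:copy(res)
      return true
   end,
})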

Thanks, Zak

chienlinhuang1116 commented 8 years ago

Thank you Zak, it was resolved by setting "poolSize".