twitter-archive / torch-ipc

A set of primitives for parallel computation in Torch
Apache License 2.0

Is there any issue when we use large data inputs? #23

Closed chienlinhuang1116 closed 8 years ago

chienlinhuang1116 commented 8 years ago

Hi, we saved our data in Torch tensor format as 'train.t7' and loaded it using torch-dataset.

local trainingDataset = Dataset('train.t7', {partition = 1, partitions = 1})

If the size of 'train.t7' is 500GB or more, is there any memory limitation when using torch-dataset? Is there any issue when we define 'batchSize' or use 'torch-ipc'?
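
For concreteness, a minimal sketch of how the partition options and a batcher's batchSize could fit together; the partition count, batchSize, inputDims, and the processor body are hypothetical, and the sampledBatcher call follows the general shape of torch-dataset's documented API:

local Dataset = require 'dataset.Dataset'

-- Hypothetical: this process owns partition 1 of 8, so it only indexes
-- its slice of train.t7 rather than the whole 500GB file.
local dataset = Dataset('train.t7', {partition = 1, partitions = 8})

local getBatch, numBatches = dataset.sampledBatcher({
   batchSize = 32,        -- hypothetical; one batch is assembled at a time
   inputDims = {1024},    -- hypothetical per-sample dimensions
   samplerKind = 'linear',
   processor = function(res, processorOpt, input)
      -- decode one sample from res into the preallocated slot 'input';
      -- the exact decoding depends on how train.t7 was written
      input:copy(torch.deserialize(res))
      return true
   end,
})

for i = 1, numBatches() do
   local batch = getBatch()
   -- batch.input / batch.target hold one batch for training
end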

Thank you.

zakattacktwitter commented 8 years ago

There are a couple of limits.

1) The size of your computer's memory.
2) The number of open file handles you can have. This is particularly troublesome on OSX, where the limit is low. Torch's require system opens tons of files, so if you spawn hundreds of threads (via ipc.map or a large batch size) you could run out of file handles.
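
As a rough illustration of keeping both limits in check, a sketch (the worker count and partition scheme are made up) that spawns only a handful of ipc.map threads, each opening just its own partition of the data:

local ipc = require 'libipc'

-- Hypothetical: 4 workers, each indexing 1/4 of train.t7, instead of
-- hundreds of threads all holding open files and memory at once.
local nWorkers = 4

local workers = ipc.map(nWorkers, function(path, nPartitions, mapid)
   -- each map thread has its own Lua state, so require inside the worker
   local Dataset = require 'dataset.Dataset'
   -- mapid is the 1-based thread index that ipc.map appends to the arguments
   local dataset = Dataset(path, {partition = mapid, partitions = nPartitions})
   -- build a batcher and iterate over this worker's slice here
   return mapid
end, 'train.t7', nWorkers)

-- join() blocks until every worker returns and hands back their results
print(workers:join())

If the file-handle limit still bites on OSX, the per-process limit can usually be raised with ulimit -n before launching the job.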

chienlinhuang1116 commented 8 years ago

Thank you :)