twitter-archive / torch-dataset

An extensible and high performance method of reading, sampling and processing data for Torch
Apache License 2.0

How to read data using torch-dataset? #25

Closed by chienlinhuang1116 8 years ago

chienlinhuang1116 commented 8 years ago

Hi, I have 8 samples. Each sample has a 4x2 input matrix x and a length-4 target vector y.

local trainSet = {x = torch.FloatTensor(8,4,2), y = torch.FloatTensor(8,4)}
for s = 1, 8 do
    for b = 1, 4 do
        trainSet.y[s][b] = 0
        for d = 1, 2 do
            trainSet.x[s][b][d] = 1
        end
    end
end
torch.save('train.t7', trainSet)

When I load this file with torch-dataset, the number of samples is reported as 32 instead of 8. Do you have any idea why?

local Dataset = require 'dataset.Dataset'

local trainingDataset = Dataset('train.t7', {partition = 1, partitions = 1})
local getTrainingBatch, numTrainingBatches = trainingDataset.sampledBatcher({
   samplerKind = 'linear',
   batchSize = 1,
   inputDims = {4,2},
   verbose = true,
   cuda = true,
   processor = function(res, processorOpt, input)
      input:copy(res)
      return true
   end,
})
print(numTrainingBatches())  -- prints 32 instead of the expected 8

Thank you very much.

zakattacktwitter commented 8 years ago

Hi,

You could fix the issue yourself. Look at line 69 of IndexTensor.lua. It's an apply call on the labels Tensor, and apply visits every element, which would explain a count of 8 x 4 = 32 rather than 8. The correct code would traverse only the outermost dimension of the labels Tensor.
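A minimal sketch of the difference, assuming the index counts items by visiting the labels Tensor with apply. This is not the code from IndexTensor.lua; the variable names perElement and perSample are made up for illustration, and the tensor just mirrors the y from the question.

```lua
require 'torch'

-- Same shape as the targets in the question: 8 samples, 4 values each.
local y = torch.FloatTensor(8, 4):zero()

-- apply calls the function once per element, so this counts 8 * 4 items.
-- Returning nothing from the callback leaves the tensor unchanged.
local perElement = 0
y:apply(function(v)
   perElement = perElement + 1
end)

-- The size of the outermost dimension gives one item per sample.
local perSample = y:size(1)

print(perElement, perSample)  -- 32 and 8
```

A fix along these lines would iterate over `1, labels:size(1)` (or over rows via `labels[i]`) instead of over individual elements.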

Thanks, Zak

chienlinhuang1116 commented 8 years ago

Thank you Zak, the problem is resolved :)