twitter-archive / torch-dataset

An extensible and high performance method of reading, sampling and processing data for Torch
Apache License 2.0

How can I get the FloatTensor values instead of the binary content when using IndexDirectory.lua? #34

Open chienlinhuang1116 opened 8 years ago

chienlinhuang1116 commented 8 years ago

Hi,

I have a bunch of files on the local disk and just want to construct an Index/Dataset based on all the files that are present. IndexDirectory.lua supports this, but to make it work I had to change line 77 of Reader.lua to res[i] = torch.load(item.url). I made that change because, with IndexDirectory.lua, the returned value is the raw binary content of each file rather than a Torch FloatTensor. How can I get the same kind of return value as with IndexTensor.lua when using IndexDirectory.lua?

Thank you

zakattacktwitter commented 8 years ago

Pass a processor function as an option to sampledBatcher. That function receives the binary content of the file, which you can turn into a tensor via torch.MemoryFile.

In general, you should never need to modify the getter functions. The processor function is where any custom work on the file data belongs.
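A minimal sketch of such a processor, decoding the bytes with torch.MemoryFile as suggested above. It assumes that res arrives as a Lua string holding a file written by torch.save in its default binary mode, and that the stored tensor has exactly as many elements as the batch slot described by inputDims; both are assumptions, not something the library guarantees.

```lua
local torch = require 'torch'

-- Sketch of a processor for dataset.sampledBatcher (assumption: `res` is a
-- Lua string with the raw bytes of a file written by torch.save in binary
-- mode, and `input` is this sample's slot in the mini-batch).
local function processor(res, opt, input)
   -- Required locally in case the processor runs in a fresh Lua state
   -- (an assumption about how the batcher executes processors).
   local torch = require 'torch'
   -- MemoryFile expects a NUL-terminated CharStorage, so append a trailing
   -- '\0' (torch.deserialize does the same internally) before copying in.
   local storage = torch.CharStorage():string(res .. '\0')
   local file = torch.MemoryFile(storage, 'r')
   file:binary()                -- match torch.save's default format
   local x = file:readObject()  -- e.g. a FloatTensor
   file:close()
   -- copy() needs the same number of elements on both sides, so the stored
   -- tensor must match the size implied by inputDims.
   input:copy(x)
   return true
end
```

torch.deserialize(res) wraps essentially the same MemoryFile steps, so either form should work inside a processor.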

(Sent by email on Thursday, May 19, 2016, in reply to Chien-Lin Huang 黃建霖.)

chienlinhuang1116 commented 8 years ago

Hi Zak,

You are right: I can use torch.deserialize to get the FloatTensor values. However, I cannot control inputDims when using IndexDirectory.lua. For example, I have a lot of files, each holding a tensor of size {2000, 600}, and I would like to read them as samples of size {600}, like this:

local getBatch, numBatches = dataset.sampledBatcher({
  samplerKind = 'linear',
  batchSize = 1,
  inputDims = {600},
  processor = function(res, opt, input)
      local x = torch.deserialize(res)
      input:copy(x)
      return true
  end,
})

However, torch.deserialize gives me the whole {2000, 600} FloatTensor instead of a tensor of size {600}. Do you have any idea how to handle this?

Thank you, Chien-Lin