twitter-archive / torch-dataset

An extensible and high performance method of reading, sampling and processing data for Torch
Apache License 2.0

Calling Torch library within Processor generates Seg Fault #23

Closed: JimmyWhitaker closed this issue 8 years ago

JimmyWhitaker commented 8 years ago

When I call the signal library from within a processor function, I get a segmentation fault. There seems to be a memory conflict when calling a Torch package that wraps a C function.

As a quick example, here is a modified version of the provided CIFAR10 example:

...
-- Create a batched permutation sampler
local getTrainingBatch, numTrainingBatches = trainingDataset.sampledBatcher({
   samplerKind = 'permutation',
   cuda = opt.cuda,
   batchSize = opt.batchSize,
   inputDims = { 3, 32, 32 },
   verbose = true,
   processor = function(res, processorOpt, input)
      -- This function is not a closure, it is run in a clean Lua environment
      local image = require 'image'
      local signal = require 'signal'
      -- Turn the res string into a ByteTensor (containing the PNG file's contents)
      local bytes = torch.ByteTensor(#res)
      bytes:storage():string(res)
      -- Decompress the PNG bytes into a Tensor
      local pixels = image.decompressPNG(bytes)

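      -- Flatten in place (resize returns the same tensor, so flat aliases pixels)
      local flat = pixels:resize(3072)
      -- Call into the signal package; this is the call that precedes the crash
      local spect = signal.spectrogram(flat, 10, 10)
      pixels:resize(3,32,32) -- Set everything back to normal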

      -- Copy the pixels tensor into the mini-batch
      input:copy(pixels)
      return true
   end,
})
...

There also seems to be a race condition involved, because sometimes I get the segmentation fault and other times I get:

*** Error in `/home/ubuntu/torch/install/bin/luajit': double free or corruption (fasttop): 0x00007f8994004200 ***

Any ideas on how to fix this?

JimmyWhitaker commented 8 years ago

I wrote my own multi-threaded dataloader, and it runs into the same issue. I think the issue has something to do with the underlying FFT library and not this package.

JimmyWhitaker commented 8 years ago

Just answering my own issue here. It looks like the problem is that fftw3, the library underlying the signal package, is not thread-safe. Here is the fftw3 issue. It is supposedly fixed in 3.3.5, but I haven't tested it yet. Since this does not pertain to the torch-dataset package, this issue can be closed.
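
For anyone who cannot upgrade yet, one possible workaround is to serialize the calls into the signal package with a mutex shared across worker threads. The following is only a minimal sketch for a custom loader built on the torch `threads` package, and it has not been tested against torch-dataset itself. The pool size, the job count, the random input tensor, and names such as `workerThreads` and `mutexId` are illustrative assumptions.

```lua
local threads = require 'threads'

local nThreads, nJobs = 4, 16   -- illustrative values, not from the original report

-- One mutex for the whole process; workers re-open it through its numeric id
local mutex = threads.Mutex()
local mutexId = mutex:id()

local pool = threads.Threads(nThreads, function()
   -- Runs once in each worker's own Lua state
   require 'torch'
   signal = require 'signal'           -- global inside the worker
   workerThreads = require 'threads'   -- global inside the worker
end)

local done = 0
for i = 1, nJobs do
   pool:addjob(
      function()
         -- Worker side: attach to the shared mutex by id
         local lock = workerThreads.Mutex(mutexId)
         local x = torch.randn(3072)   -- stand-in for the flattened image
         lock:lock()
         -- pcall ensures the mutex is released even if the call raises an error
         local ok, result = pcall(signal.spectrogram, x, 10, 10)
         lock:unlock()
         lock:free()
         if not ok then error(result) end
         return true
      end,
      function()
         -- Main-thread callback, run when a job has finished
         done = done + 1
      end
   )
end

pool:synchronize()
pool:terminate()
mutex:free()
print(done .. ' jobs finished')
```

Holding the lock around the whole `signal.spectrogram` call means only one thread is inside the FFT library at a time, so that step loses its parallelism, while decoding and copying can still run concurrently.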

zakattacktwitter commented 8 years ago

Glad you figured it out!