Open danieleewww opened 8 years ago
Hi,
Could you give us a small code sample to reproduce the problem please?
From the error message, the only thing we can see is that there is a memory allocation error while creating a new thread. Do you create a lot of threads? Do you use Lua 5.2 or LuaJIT?
I used LuaJIT. The code comes from "git clone https://github.com/qureai/ultrasound-nerve-segmentation-using-torchnet.git"; main.lua is pasted below:
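For anyone unsure which interpreter their `th` is running, here is a minimal check. It relies only on the fact that LuaJIT defines a global `jit` table while plain Lua does not. The helper name `interpreter_info` is made up for this illustration.

```lua
-- 'jit' is a global table that only LuaJIT defines; plain Lua leaves it nil.
-- interpreter_info is a hypothetical helper name, not part of torch or threads.
local function interpreter_info()
   if type(jit) == 'table' then
      -- jit.version, jit.os and jit.arch are standard fields of LuaJIT's jit library
      return string.format('%s (%s/%s)', jit.version, jit.os, jit.arch)
   end
   -- _VERSION is e.g. "Lua 5.2" on the reference interpreter
   return _VERSION
end

print(interpreter_info())
```

Saving this to a file and running it with `th` should report the same interpreter that runs main.lua.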
--[[ Main file --]]
require 'torch'
require 'paths'
require 'optim'
require 'nn'
require 'cunn'
require 'cudnn'
tnt = require 'torchnet'
torch.setnumthreads(1) -- speed up
torch.setdefaulttensortype('torch.FloatTensor')
-- command line instructions reading
local cmd = torch.CmdLine()
cmd:text()
cmd:text('Torch-7 context encoder training script')
cmd:text()
cmd:text('Options:')
cmd:option('-dataset','./data/train.h5','Training dataset to be used')
cmd:option('-model','./models/unet.lua','Path of the model to be used')
cmd:option('-trainSize',100,'Size of the training dataset to be used, -1 if complete dataset has to be used')
cmd:option('-valSize',25,'Size of the validation dataset to be used, -1 if complete validation dataset has to be used')
cmd:option('-trainBatchSize',32,'Size of the batch to be used for training')
cmd:option('-valBatchSize',32,'Size of the batch to be used for validation')
cmd:option('-savePath','./data/saved_models/','Path to save models')
cmd:option('-optimMethod','sgd','Algorithm to be used for learning - sgd | adam')
cmd:option('-maxepoch',250,'Epochs for training')
cmd:option('-cvParam',2,'Cross validation parameter used to segregate data based on patient number')
--- Main execution script
function main(opt)
   opt.trainSize = opt.trainSize==-1 and nil or opt.trainSize
   opt.valSize = opt.valSize==-1 and nil or opt.valSize
   -- loads the data loader
   require 'dataloader'
   local dl = DataLoader(opt)
   local trainDataset = dl:GetData('train',opt.trainSize)
   local valDataset = dl:GetData('val',opt.valSize)
   opt.trainDataset = trainDataset
   opt.valDataset = valDataset
   opt.dataset = paths.basename(opt.dataset,'.h5')
   print(opt)
   require 'machine'
   local m = Machine(opt)
   m:train()
end
local opt = cmd:parse(arg or {}) -- Table containing all the above options
main(opt)
After googling around, it looks like your issue could be a problem with x64 OSX and LuaJIT. Does the following code run properly and print "Hello from thread"?
print("Loading threads")
local threads = require "threads"
print("Loading done")
print("Creating new thread with no init")
local pool = threads.Threads(
   1)
print("Created")
print("Creating new thread with init")
local pool = threads.Threads(
   1,
   function()
      print("Hello from thread")
   end)
print("Created")
It did not print "Hello from thread". The traceback:
th> local pool = threads.Threads(
..> 1)
[string "local pool = threads.Threads(..."]:1: attempt to index global 'threads' (a nil value)
stack traceback:
   [string "local pool = threads.Threads(..."]:1: in main chunk
   [C]: in function 'xpcall'
   /Users/py3/torch/install/share/lua/5.1/trepl/init.lua:670: in function 'repl'
   .../py3/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
   [C]: at 0x010062ad10
@danieleewww You need to write the code to a file and run it from the terminal using th test_threads.lua. In the th prompt each input is run as its own chunk, so the local variable threads is out of scope by the time the next input runs; that is why it is reported as a nil global.
OK, this was printed out:
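The scoping effect behind that traceback can be reproduced in plain Lua, without torch. The sketch below uses `loadstring` (Lua 5.1/LuaJIT) or `load` (Lua 5.2+) to stand in for two separate REPL inputs. The names `demo_threads` and `demo_threads_global` are made up for this illustration.

```lua
-- loadstring exists on Lua 5.1/LuaJIT; on Lua 5.2+ load accepts strings instead.
local compile = loadstring or load

-- Simulate two separate REPL inputs: each string is compiled as its own chunk.
-- demo_threads is a hypothetical name standing in for the 'threads' module.
local first = compile('local demo_threads = { loaded = true }')
local second = compile('return demo_threads')

first()
-- The local from the first chunk is gone, so the second chunk looks up a
-- global called demo_threads, which was never set.
local seen_by_second = second()
print(seen_by_second)

-- Without 'local' the value becomes a global that later chunks can see.
compile('demo_threads_global = { loaded = true }')()
local visible = compile('return demo_threads_global ~= nil')()
print(visible)
```

At the prompt, the equivalent workaround is to drop `local`, although running the whole script from a file as suggested keeps the test closer to how main.lua is executed.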
Daniel-iMac:torch py3$ th test_threads.lua
Loading threads
Loading done
Creating new thread with no init
Created
Creating new thread with init
Hello from thread
Created
So thread creation seems to be working properly. I got your code running, but I cannot reproduce your crash on x64 Ubuntu. Someone with an OSX machine will need to investigate this further, I guess.
Trace:
Daniel-iMac:ultrasound-nerve-segmentation-using-torchnet py3$ th main.lua
Setting up data loader using ./data/train.h5
Data loader setup done!
{
  savePath : "./data/saved_models/"
  valSize : 25
  dataset : "train"
  valDataset :
    {
      dataset :
        {
          load : function: 0x14948c38
          list : LongTensor - size: 1198
        }
      replacement : false
      perm : LongTensor - size: 25
      size : 25
      sampler : function: 0x148e5630
    }
  optimMethod : "sgd"
  valBatchSize : 32
  cvParam : 2
  maxepoch : 250
  trainSize : 100
  trainDataset :
    {
      dataset :
        {
          load : function: 0x14c11890
          list : LongTensor - size: 4437
        }
      replacement : false
      perm : LongTensor - size: 100
      size : 100
      sampler : function: 0x10a43f08
    }
  trainBatchSize : 32
  model : "./models/unet.lua"
}
THREAD FATAL ERROR: could not create lua state
This was on OS X 10.11.6.
Any suggestions on this?