lfz / DSB2017

The solution of team 'grt123' in DSB2017
MIT License

Training the detector #94

Open Carl-Lei opened 6 years ago

Carl-Lei commented 6 years ago

When training under Python 3.6, I get the following error:

Traceback (most recent call last):
  File "main.py", line 349, in <module>
    main()
  File "main.py", line 168, in main
    train(train_loader, net, loss, epoch, optimizer, get_lr, args.save_freq, save_dir)
  File "main.py", line 180, in train
    for i, (data, target, coord) in enumerate(data_loader):
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 451, in __iter__
    return _DataLoaderIter(self)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 247, in __init__
    self._put_indices()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 295, in _put_indices
    indices = next(self.sample_iter, None)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 138, in __iter__
    for idx in self.sampler:
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 51, in __iter__
    return iter(torch.randperm(len(self.data_source)).tolist())
TypeError: 'float' object cannot be interpreted as an integer

I can't tell which variable this 'float' refers to. How should I fix this?

lfz commented 6 years ago

You can't debug while the model is wrapped in DataParallel. Remove the DataParallel wrapper first.


Carl-Lei commented 6 years ago

I commented out net = DataParallel(net) on line 99, but it still fails with the same error.

Carl-Lei commented 6 years ago

@lfz

Carl-Lei commented 6 years ago

This problem seems to be solved. The cause is __len__ in the DataBowl3Detector class:

def __len__(self):
    if self.phase == 'train':
        return len(self.bboxes)/(1-self.r_rand)

Is this supposed to return an integer? @lfz
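Yes: `__len__` must return a non-negative `int`. Under Python 3, `/` is true division and always yields a `float`, even when the quotient is a whole number. The sampler line in the traceback, `torch.randperm(len(self.data_source))`, calls `len()` on the dataset, and `len()` itself raises the TypeError before any model code runs, which is also why removing DataParallel changed nothing. A minimal sketch without PyTorch, using a hypothetical stand-in class `FloatLenDataset` and a hypothetical helper `len_error` (the values 8 and 0.5 are illustrative only), reproduces the same message:

```python
class FloatLenDataset:
    """Hypothetical stand-in for DataBowl3Detector in the 'train' phase."""

    def __init__(self, n_boxes, r_rand):
        self.bboxes = [None] * n_boxes  # placeholder boxes
        self.r_rand = r_rand

    def __len__(self):
        # Python 3 "/" is true division: the result is always a float
        return len(self.bboxes) / (1 - self.r_rand)


def len_error(dataset):
    """Return the TypeError message from len(dataset), or None if len() works."""
    try:
        len(dataset)  # the sampler evaluates len(self.data_source) the same way
    except TypeError as exc:
        return str(exc)
    return None


ds = FloatLenDataset(n_boxes=8, r_rand=0.5)
print(ds.__len__())   # 16.0: a whole number, but still a float
print(len_error(ds))  # 'float' object cannot be interpreted as an integer
```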

Carl-Lei commented 6 years ago

@lfz Why does this line raise an error? With input_size=(128,128,128) and stride=4, how can this modulo check evaluate to False?

Traceback (most recent call last):
  File "D:/mydsb/dsb_test/training/detector/main.py", line 349, in <module>
    main()
  File "D:/mydsb/dsb_test/training/detector/main.py", line 168, in main
    train(train_loader, net, loss, epoch, optimizer, get_lr, args.save_freq, save_dir)
  File "D:/mydsb/dsb_test/training/detector/main.py", line 180, in train
    for i, (data, target, coord) in enumerate(data_loader):
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 272, in __next__
    return self._process_next_batch(batch)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "D:\mydsb\dsb_test\training\detector\data.py", line 96, in __getitem__
    label = self.label_mapping(sample.shape[1:], target, bboxes)
  File "D:\mydsb\dsb_test\training\detector\data.py", line 273, in __call__
    assert(int(input_size[i])% stride == 0)
AssertionError
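Note what the assertion actually receives: `__getitem__` calls `self.label_mapping(sample.shape[1:], target, bboxes)`, so `input_size` inside `__call__` is presumably the spatial shape of the sample that was just produced, not the configured crop size. The assertion can therefore only fail if some sample reaching `label_mapping` has a dimension that is not a multiple of 4. A hedged sketch of the same check, as a hypothetical helper `check_divisible` with made-up shapes, that reports which axis breaks it:

```python
def check_divisible(input_size, stride=4):
    """Return the axes whose size is not a multiple of stride.

    Mirrors the check in label_mapping: assert(int(input_size[i]) % stride == 0).
    """
    return [i for i, size in enumerate(input_size) if int(size) % stride != 0]


# sample.shape[1:] for a correctly cropped sample (channel axis dropped)
print(check_divisible((128, 128, 128)))  # []
# A hypothetical mis-cropped sample: axis 2 breaks the assertion
print(check_divisible((128, 128, 126)))  # [2]
```

While debugging, printing `sample.shape` just before the `label_mapping` call in `data.py` should show which sample and which axis deviate from 128.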

shenlinyao commented 6 years ago
def __len__(self):
    if self.phase == 'train':
        return len(self.bboxes)/(1-self.r_rand)

Has this been solved? I have the same problem. I changed / to //, but that doesn't seem to work either.
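Changing `/` to `//` is not enough because `1-self.r_rand` is a float, and in Python 3 floor division with a float operand still returns a float (just rounded down). A small sketch with hypothetical values for the number of boxes and `r_rand`:

```python
n_boxes = 10  # hypothetical number of bounding boxes
r_rand = 0.3  # hypothetical value of self.r_rand

true_div = n_boxes / (1 - r_rand)    # float
floor_div = n_boxes // (1 - r_rand)  # still a float: one operand is a float

print(type(true_div).__name__, type(floor_div).__name__)  # float float
print(floor_div)  # 14.0
```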

Carl-Lei commented 6 years ago

@shenlinyao I just forced it to an int: return int(len(self.bboxes)/(1-self.r_rand))
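A minimal sketch of the patched method on a hypothetical stand-in class `PatchedLenDataset` (the real `DataBowl3Detector` carries more state, and its non-train branches are not shown in this thread, so the fallback below is an assumption):

```python
class PatchedLenDataset:
    """Hypothetical stand-in for DataBowl3Detector with the int() fix applied."""

    def __init__(self, n_boxes, r_rand, phase="train"):
        self.bboxes = [None] * n_boxes  # placeholder boxes
        self.r_rand = r_rand
        self.phase = phase

    def __len__(self):
        if self.phase == "train":
            # Cast the true-division result to int so len() accepts it
            return int(len(self.bboxes) / (1 - self.r_rand))
        # Hypothetical fallback for other phases (not taken from the repository)
        return len(self.bboxes)


print(len(PatchedLenDataset(n_boxes=10, r_rand=0.3)))  # 14
```

`int()` truncates toward zero, so an epoch may be one sample shorter than the exact quotient; `math.ceil` would round up instead if that matters.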

DaLei001 commented 5 years ago

@Carl-Lei I'm hitting the same "assert(int(input_size[i])% stride == 0) AssertionError" problem. Did you manage to solve it?

lihaossu commented 5 years ago

@DaLei001 What should the path 'luna_segment':'/work/DataBowl3/luna/seg-lungs-LUNA16/' in config point to? Does it need some extra data, or can I just create an empty file?

chenggangdu commented 4 years ago

@DaLei001 What should the path 'luna_segment':'/work/DataBowl3/luna/seg-lungs-LUNA16/' in config point to? Does it need some extra data, or can I just create an empty file?

It's a folder from LUNA.
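`seg-lungs-LUNA16` is the lung-segmentation folder distributed with the LUNA16 dataset, so an empty placeholder is unlikely to be enough, assuming the LUNA preprocessing loads mask files from this path (check the repository's prep code to confirm). A hedged sketch of a sanity check, using a hypothetical helper `check_data_dir` and a config dict modeled on the entry quoted above:

```python
import os


def check_data_dir(path):
    """Return 'missing', 'empty' or 'ok' for a configured data directory."""
    if not os.path.isdir(path):
        return "missing"
    if not os.listdir(path):
        return "empty"
    return "ok"


# Hypothetical config entry mirroring the one quoted in this thread
config = {"luna_segment": "/work/DataBowl3/luna/seg-lungs-LUNA16/"}
print(check_data_dir(config["luna_segment"]))
```

Running a check like this before preprocessing makes a wrong or empty path fail early instead of deep inside the data pipeline.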