deepblue0822 closed this issue 6 years ago
Can you reduce the batch size further?
OK, I will try. Thank you.
I can't figure out how to reproduce that error.
Normally, a StopIteration error comes from the loader. If you changed the dataloader, please check it carefully first.
I don't recommend using pretrainAPN because it is not fully implemented. Just comment out the pretrainAPN part and try again.
Please don't reply to this message by e-mail. Instead, please use the GitHub issue tracker so that others can help as well.
Thanks
Jeongtae, Lee
-----Original Message-----
From: "deepblue0822" <notifications@github.com>
To: "jeong-tae/RACNN-pytorch" <RACNN-pytorch@noreply.github.com>
Cc: "jeong-tae" <make8286@naver.com>; "Comment" <comment@noreply.github.com>
Sent: 2018-06-29 (Fri) 12:32:33
Subject: Re: [jeong-tae/RACNN-pytorch] There is someyhing error ,and I changed logits,, = net(images) into logits, cc, aa= net(images) as it used to make error,but now it has the same error (#3)
When I reduce the batch size to 2, the error is:
Traceback (most recent call last):
  File "trainer2.py", line 310, in <module>
    train()
  File "trainer2.py", line 61, in train
    apn_iter, apn_epoch, apn_steps = pretrainAPN(trainset, trainloader)
  File "trainer2.py", line 206, in pretrainAPN
    images, labels = next(batch_iterator)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 276, in __next__
    raise StopIteration
StopIteration
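The StopIteration in the traceback above is raised by next(batch_iterator) once the iterator has handed out every batch the loader holds. Below is a minimal sketch of one common way to handle it, assuming the training loop simply wants to start a new pass over the data: catch StopIteration and rebuild the iterator. The helper name next_batch and the stand-in list loader are hypothetical and not part of the repository; any re-iterable object, such as a torch DataLoader, should behave the same way.

```python
def next_batch(loader, iterator=None):
    """Return (batch, iterator), restarting the iterator when it runs out.

    `loader` is any re-iterable object (hypothetical stand-in for a DataLoader).
    `iterator` is the iterator returned by a previous call, or None on first use.
    """
    if iterator is None:
        iterator = iter(loader)
    try:
        return next(iterator), iterator
    except StopIteration:
        # The current pass over the data is finished: start a new one.
        iterator = iter(loader)
    try:
        return next(iterator), iterator
    except StopIteration:
        # A fresh iterator that is already empty means the loader has no batches.
        raise ValueError("loader yields no batches; check the dataset and batch size") from None


if __name__ == "__main__":
    # Stand-in loader with two (images, labels) batches; real code would pass a DataLoader.
    toy_loader = [("images0", 0), ("images1", 1)]
    batch_iterator = None
    for step in range(5):
        (images, labels), batch_iterator = next_batch(toy_loader, batch_iterator)
        print(step, images, labels)
```

If the loader yields no batches at all (for example, when the dataset is smaller than the batch size and incomplete batches are dropped), the helper raises ValueError instead of looping forever, which points back at the dataloader configuration.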
OK, thank you very much. I tried reducing the batch_size and it worked, but the accuracy is very low, and I think my memory may not be enough. I will try to do something to improve it. Anyway, thank you very much.
I hope you get good results. Even in my experiments, I can't use a batch size larger than 4. If you find any errors or improvements, please let me know and share them.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "trainer2.py", line 305, in <module>
    train()
  File "trainer2.py", line 94, in train
    logits, cc, aa= net(images)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/Desktop/ww/RACNN-pytorch-master/models/RACNN.py", line 55, in forward
    conv5_4_A = self.b2.features:-1
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/batchnorm.py", line 49, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1194, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f10b8d49dd8>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
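Since the largest batch size that fits depends on the GPU, one option is to retry a training step with a smaller batch whenever the out-of-memory RuntimeError shown above is raised. The following is only a sketch under assumptions: run_with_smaller_batches and fake_step are hypothetical names, fake_step stands in for a real forward/backward pass, and the check relies on the error message containing "out of memory", as it does in this traceback.

```python
def run_with_smaller_batches(step, batch_size, min_batch_size=1):
    """Call step(batch_size), halving batch_size after each out-of-memory error.

    Returns (batch_size_that_worked, result_of_step). Any other error, or an
    out-of-memory error at min_batch_size, is re-raised unchanged.
    """
    while True:
        try:
            return batch_size, step(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated failure: do not hide it
            if batch_size <= min_batch_size:
                raise  # nothing smaller left to try
            batch_size = max(min_batch_size, batch_size // 2)


def fake_step(batch_size):
    """Hypothetical stand-in for one training step; pretends batches above 4 do not fit."""
    if batch_size > 4:
        raise RuntimeError("cuda runtime error (2) : out of memory")
    return "trained with batch size {}".format(batch_size)


if __name__ == "__main__":
    size, result = run_with_smaller_batches(fake_step, 16)
    print(size, result)
```

In a real run, memory held by the failed step may need to be released before retrying, so treat this as a starting point for finding a workable batch size rather than a fix for the error itself.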