I updated. Can you check whether this error is solved?
I have checked this update, but there are still problems. Do you know what the problem is? Also, is your mailbox (make8286@naver.com) available? I suggest we use e-mail or another channel so we can communicate in a timely way.
```
[] pre_apn_epoch[13], || pre_apn_iter 19980 || pre_apn_loss: 0.0925 || Timer: 0.1519sec
[] Swtich optimize parameters to Class
Traceback (most recent call last):
  File "/home/alex/Experiment/RACNN-pytorch/trainer.py", line 306, in <module>
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f065e408c50>>
Traceback (most recent call last):
  File "/home/alex/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
    self._shutdown_workers()
  File "/home/alex/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
    self.worker_result_queue.get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
  File "/home/alex/.local/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

Process finished with exit code 1
```
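A common workaround for shutdown crashes like this `ConnectionRefusedError` (a sketch, not code from this repo; the dataset here is a placeholder) is to disable the DataLoader's worker subprocesses:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; in the repo this would be the actual training set.
trainset = TensorDataset(torch.randn(8, 3, 224, 224),
                         torch.zeros(8, dtype=torch.long))

# num_workers=0 loads batches in the main process, so there are no worker
# sockets left to refuse when the _DataLoaderIter is garbage-collected.
trainloader = DataLoader(trainset, batch_size=4, shuffle=True, num_workers=0)
```

This trades some loading speed for stability, which is usually acceptable while debugging.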
Yes, the mail address is valid, but I can't respond to issues in a timely manner; I am working on another project right now. This repo is just a hobby of mine.
Well, thank you very much. Looking forward to your update.
Have you solved this problem? I encountered the same problem.
I used VGG16 instead, but the loss does not converge. How are your experiment results now? @LXYTSOS A sketch of what I mean is below.
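For reference, a minimal sketch of swapping in a VGG16 backbone (the torchvision model and the layer-freezing trick are assumptions, not this repo's actual code):

```python
import torch
import torchvision.models as models

# Use torchvision's pretrained VGG16 convolutional stack as the feature
# extractor. Freezing the earliest conv layers is a common first step
# when the loss refuses to converge after a backbone swap.
vgg16 = models.vgg16(pretrained=True)
backbone = vgg16.features

for param in list(backbone.parameters())[:8]:  # freeze the first conv blocks
    param.requires_grad = False

features = backbone(torch.randn(1, 3, 224, 224))  # -> shape (1, 512, 7, 7)
```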
I got another problem: `RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58`
Try lowering the batch size, and make sure you have enough CUDA memory. A sketch of that advice follows.
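A minimal sketch of that advice (the values are illustrative, not from trainer.py):

```python
import torch

batch_size = 2  # lower this until the model fits in GPU memory

# empty_cache() only returns cached, unused blocks to the driver; it will
# not help if live tensors (e.g. retained graphs) already fill the card.
torch.cuda.empty_cache()

# Inspect actual usage to confirm where the memory goes.
print(torch.cuda.memory_allocated() / 1024 ** 2, "MiB allocated")
```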
I set the batch size to one. If I use only one GPU, I get this error at "Swtich optimize parameters to APN", and with multiple GPUs, no matter how many I use, I get this error after one training batch. Check this: #7
@fxle I fixed the code and... sorry to tell you this: for now, only CUDA computation is supported for training, not CPU. To calculate the loss I have to know the tensor's type, but in some cases there is no prediction, so there is no chance to check the input type. If you use a GPU, it will be fine.
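A sketch of one way to sidestep that type check (this is not the repo's actual loss; `pairwise_ranking_loss` and its inputs are assumptions made for illustration):

```python
import torch

def pairwise_ranking_loss(preds, margin=0.05):
    """Toy inter-scale ranking loss. `preds` is a list of
    (coarse, fine) confidence tensor pairs; when it is empty, return a
    zero loss on whatever device is available instead of inspecting a
    missing input tensor to decide between CPU and CUDA."""
    if len(preds) == 0:
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        return torch.zeros(1, device=device)
    losses = [torch.clamp(coarse - fine + margin, min=0.0)
              for coarse, fine in preds]
    return torch.stack(losses).mean()
```

Choosing the device from `torch.cuda.is_available()` (or from a model parameter) removes the dependency on a prediction tensor that may not exist.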
I set batch_size=1, and the following issues came up. Can you tell me how to solve them?
```
[] pre_apn_epoch[13], || pre_apn_iter 19980 || pre_apn_loss: 0.1223 || Timer: 0.1521sec
[] Swtich optimize parameters to Class
Traceback (most recent call last):
  File "/home/alex/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 3265, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('/home/alex/Datas/code/RACNN-pytorch/trainer.py', wdir='/home/alex/Datas/code/RACNN-pytorch')
  File "/usr/local/pycharm-2018.2.4/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/usr/local/pycharm-2018.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)
  File "/home/alex/Datas/code/RACNN-pytorch/trainer.py", line 311, in <module>
    train()
  File "/home/alex/Datas/code/RACNN-pytorch/trainer.py", line 136, in train
    test(testloader, iteration)
  File "/home/alex/Datas/code/RACNN-pytorch/trainer.py", line 292, in test
    test_apn_losses = torch.stack(test_apn_losses).mean()
TypeError: expected Tensor as element 0 in argument 0, but got list
```
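This `TypeError` means `torch.stack` received a Python list as its first element, so `test_apn_losses` is most likely a list of per-batch lists rather than a flat list of tensors. A sketch of a fix (the variable name follows the traceback, but the accumulator's shape is an assumption):

```python
import torch

# Assumed shape of the accumulator in test(): a list of per-batch lists of
# 0-dim loss tensors. This stand-in reproduces the failing structure.
test_apn_losses = [[torch.tensor(0.1), torch.tensor(0.2)],
                   [torch.tensor(0.3)]]

# Flatten before stacking so torch.stack sees tensors, not lists.
flat = [loss for batch in test_apn_losses for loss in batch]
test_apn_loss = torch.stack(flat).mean()
print(test_apn_loss)  # tensor(0.2000)

# If the inner elements were plain Python floats instead of tensors:
# test_apn_loss = torch.tensor(flat).mean()
```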