2019-07-22 16:58:23,530 maskrcnn_benchmark.trainer INFO: eta: 14:28:19 iter: 240 loss: 3.1854 (3.6463) loss_densebox_cls: 0.8195 (1.3604) loss_densebox_reg: 1.7176 (1.6256) loss_reg_weights: 0.6483 (0.6603) time: 0.5786 (0.5804) data: 0.0209 (0.0257) lr: 0.006533 max mem: 6045
Traceback (most recent call last):
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/queues.py", line 241, in _feed
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/root/anaconda2/envs/mb/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 315, in reduce_storage
RuntimeError: unable to open shared memory object in read-write mode
Traceback (most recent call last):
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/resource_sharer.py", line 149, in _serve
send(conn, destination_pid)
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/resource_sharer.py", line 50, in send
reduction.send_handle(conn, new_fd, pid)
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/reduction.py", line 176, in send_handle
with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
File "/root/anaconda2/envs/mb/lib/python3.6/socket.py", line 460, in fromfd
nfd = dup(fd)
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "tools/train_net.py", line 183, in <module>
main()
File "tools/train_net.py", line 176, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 82, in train
arguments,
File "/root/wangcheng/FCOS/maskrcnn_benchmark/engine/trainer.py", line 57, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/root/anaconda2/envs/mb/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 576, in __next__
idx, batch = self._get_batch()
File "/root/anaconda2/envs/mb/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 553, in _get_batch
success, data = self._try_get_batch()
File "/root/anaconda2/envs/mb/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 511, in _try_get_batch
data = self.data_queue.get(timeout=timeout)
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/root/anaconda2/envs/mb/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 276, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
return recvfds(s, 1)[0]
File "/root/anaconda2/envs/mb/lib/python3.6/multiprocessing/reduction.py", line 155, in recvfds
raise EOFError
EOFError
I am training on 2 classes, but after some iterations this error happens.
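The `OSError: [Errno 24] Too many open files` suggests the DataLoader workers exhausted the per-process file-descriptor limit (PyTorch's default `file_descriptor` sharing strategy opens one descriptor per shared tensor passed between workers). A minimal sketch for inspecting and raising that limit from within the training script, using only the standard-library `resource` module:

```python
import resource

# The soft limit is what "[Errno 24] Too many open files" is hitting;
# on many systems it defaults to 1024, which DataLoader workers can
# exhaust quickly when sharing tensors over file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft} hard={hard}")

# Raise the soft limit up to the hard limit for this process
# (raising the hard limit itself would require root privileges).
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

Alternatively, PyTorch's documented `torch.multiprocessing.set_sharing_strategy('file_system')`, called before creating the DataLoader, avoids consuming a descriptor per shared tensor at the cost of placing files in shared memory. Whether either is sufficient here depends on the dataset and number of workers, so treat this as a troubleshooting starting point rather than a confirmed fix.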