When I run train.py on the COCO dataset with the ResNet-101 model, it gets stuck for a long time.

lji72 commented 5 years ago

/home/liuji/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:97: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Restoring parameters from /home/liuji/light_head_rcnn/data/imagenet_weights/res101.ckpt

^CTraceback (most recent call last):
  File "train.py", line 264, in <module>
    train(args)
  File "train.py", line 186, in train
    blobs_list = prefetch_data_layer.forward()
  File "/home/liuji/light_head_rcnn/lib/utils/dpflow/prefetching_iter.py", line 78, in forward
    if self.iter_next():
  File "/home/liuji/light_head_rcnn/lib/utils/dpflow/prefetching_iter.py", line 65, in iter_next
    e.wait()
  File "/home/liuji/anaconda3/envs/tensorflow/lib/python3.6/threading.py", line 551, in wait
    signaled = self._cond.wait(timeout)
  File "/home/liuji/anaconda3/envs/tensorflow/lib/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Hello, I am running into this problem. Could you give a more detailed solution? Thanks.

XingLiuJia commented 5 years ago

Hello, can you send me the COCO dataset? Thank you. Email: 1195333426@qq.com

mbruchalski1 commented 5 years ago

I am having the same problem. The dataset, json, and odgt files all look good. I am able to run the test code, but I cannot tell what is causing this problem, because the error message is not detailed. Does anyone have a solution for this issue, or is the code only meant for evaluation and not for training?

masotrix commented 5 years ago

I solved it by changing "nr_dataflow" in config.py (in the experiment folder you train from, according to README.md) from 16 to 2 for the single-GPU case, because train_batch_per_gpu=2 (so 8 GPUs x 2 images = 16, and 1 GPU x 2 images = 2). Hope this helps you ✌️
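The arithmetic above can be sketched as a small check. This is only an illustration of the rule of thumb in this comment: the helper names `expected_nr_dataflow` and `check_nr_dataflow` are hypothetical and are not part of light_head_rcnn, and it is an assumption that the hang comes from a mismatch between "nr_dataflow" and the number of images consumed per step.

```python
def expected_nr_dataflow(num_gpus: int, train_batch_per_gpu: int) -> int:
    """Images consumed per training step across all GPUs.

    Hypothetical helper: mirrors the rule of thumb from this thread,
    nr_dataflow = number of GPUs x train_batch_per_gpu.
    """
    if num_gpus < 1 or train_batch_per_gpu < 1:
        raise ValueError("num_gpus and train_batch_per_gpu must be >= 1")
    return num_gpus * train_batch_per_gpu


def check_nr_dataflow(nr_dataflow: int, num_gpus: int, train_batch_per_gpu: int) -> bool:
    """Return True when nr_dataflow matches the rule of thumb above."""
    return nr_dataflow == expected_nr_dataflow(num_gpus, train_batch_per_gpu)


if __name__ == "__main__":
    # Values taken from this thread: the setting of 16 corresponds to
    # 8 GPUs with train_batch_per_gpu=2.
    print(expected_nr_dataflow(num_gpus=8, train_batch_per_gpu=2))
    # Single-GPU case described in this comment.
    print(expected_nr_dataflow(num_gpus=1, train_batch_per_gpu=2))
    # The setting of 16 on a single GPU does not match the rule of thumb.
    print(check_nr_dataflow(16, num_gpus=1, train_batch_per_gpu=2))
```

If the check returns False for your setup, setting "nr_dataflow" in config.py to the value from `expected_nr_dataflow` is the change that worked here.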

zengarden / light_head_rcnn, issue #59