FengLoveBella opened 6 years ago
This seems to be an issue with the workers. Setting workers to 1 may fix it. (I'd also suggest changing the data input to image files instead of HDF5; multiple workers seem to cause problems with HDF5 files.)
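Besides setting workers to 1, a common workaround is to open the HDF5 file lazily inside each worker process rather than in `__init__`, since a handle created in the parent process is not safe to use across forked DataLoader workers. Below is a minimal sketch of that pattern; the class name, `open_fn` parameter, and the dict stand-in for the file are illustrative (not from this repository) — in the real dataset you would pass something like `lambda p: h5py.File(p, 'r')`:

```python
class LazyH5Dataset:
    """Sketch: defer opening the HDF5 file until first access in each process."""

    def __init__(self, h5_path, open_fn):
        self.h5_path = h5_path
        self.open_fn = open_fn   # e.g. lambda p: h5py.File(p, 'r')
        self._h5 = None          # do NOT open here: a forked worker would share it

    def _file(self):
        # Opens on first access, so each DataLoader worker gets its own handle.
        if self._h5 is None:
            self._h5 = self.open_fn(self.h5_path)
        return self._h5

    def __getitem__(self, key):
        return self._file()[key]

# Stand-in "opener" (a plain dict) so the sketch runs without h5py installed:
ds = LazyH5Dataset('train.h5', open_fn=lambda p: {'data/x.npy': [1, 2, 3]})
print(ds['data/x.npy'])  # → [1, 2, 3]
```

With this pattern the file is never opened in the parent process, so `num_workers > 1` no longer hands a shared h5py handle to every forked worker.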
@lijiaman Yes, I set workers to 1 and it works now, but the total loss is extremely high (about 2,000,000) and shows no trend of decreasing. Is that normal?
@lijiaman I followed your code and encountered this bug, and I am not sure whether it is a bug in the training dataset or in the training network. I am looking forward to your reply.
@zhoufengbuaa Hi, I recently needed to reproduce CASENet, and when I ran this repository I hit the same problem (the loss is extremely high, as if the learning rate were much too high). Have you resolved it? I hope you can help me. Looking forward to your reply.
Hi, have you tested it? How does it perform?
When I run your code, I encounter the following error:
config: Namespace(batch_size=1, checkpoint_folder='./checkpoint', cls_num=20, epochs=150, lr=1e-07, lr_steps=[10000, 20000, 30000, 40000], momentum=0.9, multigpu=False, pretrained_model='', print_freq=1, resume_model='', start_epoch=0, weight_decay=0.0005, workers=16)
('train_dataset len', 42490)
Totally new layer:score_edge_side1
Totally new layer:score_edge_side2
Totally new layer:score_edge_side3
Totally new layer:score_cls_side5
Totally new layer:ce_fusion
label_name:
label_name:
Traceback (most recent call last):
  File "/home/fengzhou/CASENet/main.py", line 129, in <module>
    main()
  File "/home/fengzhou/CASENet/main.py", line 84, in main
    global_step = model_play.train(args, train_loader, model, optimizer, epoch, curr_lr, win_feats5, win_fusion, viz, global_step)
  File "/home/fengzhou/CASENet/train_val/model_play.py", line 31, in train
    for i, (img, target) in enumerate(train_loader):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
KeyError: 'Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/fengzhou/CASENet/dataloader/SBD_data.py", line 61, in __getitem__
    np_data = self.h5_f['data/' + labelname.replace('/', '').replace('bin', 'npy')]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/group.py", line 166, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/h5o.c:3407)
KeyError: "Unable to open object (Bad object header version number)"'
Did you encounter this before? Thank you very much. @lijiaman