msracver / FCIS

Fully Convolutional Instance-aware Semantic Segmentation
MIT License

Error while training with coco dataset: h5py unable to open file #88

Open PardoAlejo opened 6 years ago

PardoAlejo commented 6 years ago

I'm trying to run: python experiments/fcis/fcis_end2end_train_test.py --cfg experiments/fcis/cfgs/resnet_v1_101_coco_fcis_end2end_ohem.yaml

It trains until batch [1000] and then I get the following error:

Epoch[0] Batch [1000] Speed: 2.92 samples/sec Train-RPNAcc=0.873100, RPNLogLoss=0.307023, RPNL1Loss=0.167374, FCISAcc=0.716729, FCISAccFG=0.000708, FCISLogLoss=2.082165, FCISL1Loss=0.089456, FCISMaskLoss=0.632843,
Exception in thread Thread-71:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "experiments/fcis/../../fcis/../lib/utils/PrefetchingIter.py", line 60, in prefetch_func
    self.next_batch[i] = self.iters[i].next()
  File "experiments/fcis/../../fcis/core/loader.py", line 99, in next
    self.get_batch_parallel()
  File "experiments/fcis/../../fcis/core/loader.py", line 161, in get_batch_parallel
    rst = self.parfetch(roidb)
  File "experiments/fcis/../../fcis/core/loader.py", line 183, in parfetch
    gt_masks = get_gt_masks(roidb[0]['cache_seg_inst'], data['im_info'][0,:2].astype('int'))
  File "experiments/fcis/../../fcis/../lib/mask/mask_transform.py", line 25, in get_gt_masks
    gt_masks = hkl.load(gt_mask_file)
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 616, in load
    h5f = file_opener(fileobj)
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 154, in file_opener
    h5f = h5.File(filename, mode)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 272, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 92, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/tmp/pip-4rPeHA-build/h5py/h5f.c:1930)
IOError: Unable to open file (File signature not found)

Can anyone help me with this?

liyi14 commented 6 years ago

Maybe you can try printing the path before the file is read and check whether the file that triggers this error has a problem. It seems that the program fails to read this file, which may be because the file doesn't exist or because you don't have permission to access it. Setting a fixed random seed in TrainDataLoader may help you locate that file.
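As a rough sketch of the check suggested above, the following script walks a cache directory and reports every .hkl file that is missing, unreadable, empty, or does not start with the HDF5 file signature. The function names, the `cache_dir` argument, and the assumption that the signature sits at byte offset 0 (that is, the file has no HDF5 user block) are illustrative and not taken from the FCIS code.

```python
import os

# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def check_cache_file(path):
    """Return a short status string for one cached mask file."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        head = f.read(len(HDF5_SIGNATURE))
    if head != HDF5_SIGNATURE:
        return "bad signature"
    return "ok"


def find_bad_cache_files(cache_dir, ext=".hkl"):
    """Walk cache_dir (hypothetical location) and list (path, status) for files that fail the check."""
    bad = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            if name.endswith(ext):
                path = os.path.join(root, name)
                status = check_cache_file(path)
                if status != "ok":
                    bad.append((path, status))
    bad.sort()
    return bad
```

Calling `find_bad_cache_files` on the directory that holds the cached masks returns a list of suspicious files. A "bad signature" result would be consistent with the "File signature not found" message in the traceback, although a file can pass this check and still be damaged further in.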

mldm4 commented 6 years ago

A solution is given in #11, but it did not work for me: all my images and hkl files (stored as cache) had size > 0. My solution was deleting the cache hkl files and launching the training again so they are created again, hopefully without error.
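A minimal sketch of that cleanup step, assuming the cached masks are .hkl files under a single directory: it lists them first and deletes them only when `dry_run` is turned off, so that the next training run has to regenerate them. The function name and the `cache_dir` argument are hypothetical; the actual cache location depends on your configuration.

```python
import os


def remove_cache_files(cache_dir, ext=".hkl", dry_run=True):
    """List cached files under cache_dir; delete them only if dry_run is False."""
    targets = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            if name.endswith(ext):
                targets.append(os.path.join(root, name))
    targets.sort()
    if not dry_run:
        for path in targets:
            os.remove(path)
    return targets
```

Running it once with the default `dry_run=True` shows what would be removed; calling it again with `dry_run=False` performs the deletion, after which training can be relaunched with the same command.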

wyx-2018 commented 5 years ago

"My solution was deleting the cache hkl files and launching the training again so they are created again hopefully without error": what does this mean?