matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

error when trying to load last trained model #907

Open bushra1100 opened 6 years ago

bushra1100 commented 6 years ago

I am having an issue that causes my PC to turn off during training. Following the tutorial, I am trying to load the last trained model to continue training, but the following error occurred.

error: Unrecognized arguments: --model=last

while running both of these commands (i am working on person data set)

python person.py train --dataset=/path/to/dataset --model=last --weights=coco
python person.py train --dataset=/path/to/dataset --model=last
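The "unrecognized arguments" message is what Python's argparse reports when a flag is not declared by the script. A minimal sketch of that behaviour follows, assuming person.py declares `--dataset` and `--weights` (both are accepted later in this thread) but not `--model`. The parser below is a hypothetical stand-in, not the actual person.py, and whether `--weights=last` is handled by that script is an assumption.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the script's CLI; only flags seen in this thread.
    parser = argparse.ArgumentParser(description="Sketch of a train command line")
    parser.add_argument("command", metavar="<command>")
    parser.add_argument("--dataset")
    parser.add_argument("--weights")
    return parser


parser = build_parser()

# parse_known_args collects undeclared flags instead of exiting with an error,
# which makes it easy to see which flag the parser does not know.
args, unknown = parser.parse_known_args(
    ["train", "--dataset=/path/to/dataset", "--model=last"]
)
print("not declared:", unknown)

# The same request expressed through a declared flag parses cleanly.
args2, unknown2 = parser.parse_known_args(
    ["train", "--dataset=/path/to/dataset", "--weights=last"]
)
print("weights:", args2.weights, "not declared:", unknown2)
```

With the stricter `parse_args`, the first call would stop with the error quoted above.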

bilgan commented 6 years ago

Try the actual path of the file instead of `last`. Maybe the last model is broken; try another one.

bushra1100 commented 6 years ago

@bilgan by the actual path you mean the weight files or the checkpoints file? like in balloon.py case "path/to/balloon_maskrcnn_weights_epoch16.h5" files are saved in logs. should I try and load the last saved h5 file or the "balloon-checkpoint.ipynb" file from ".ipynb_checkpoints" folder?

bilgan commented 6 years ago

.h5 file, but not last one

bushra1100 commented 6 years ago

OK. So if I trained for 16 epochs, then I have to start again from the .h5 file of the 15th, right?



bilgan commented 6 years ago

yes, that's right

bushra1100 commented 6 years ago

I tried to load second last and even third and fourth last weight files by this command:

python person.py train --dataset=E:/MASK/Mask_RCNN\samples\balloon\person_dataset\fused --weights=coco --logs=E:\MASK\Mask_RCNN\logs\person20180906T2305\mask_rcnn_person_002.h5

but it's giving me this error:

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'E:\MASK\Mask_RCNN\logs\person20180906T2305\mask_rcnn_person_002.h5\person20180907T0403'
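The path in this error looks like the `--logs` value with a new run folder appended, which suggests `--logs` is treated as a directory rather than as a weights file. A minimal sketch of that effect, assuming the script joins the `--logs` value with a per-run directory name; `run_log_dir` is a hypothetical helper, not a Mask_RCNN function.

```python
import ntpath  # Windows path rules, so the sketch behaves the same on any OS


def run_log_dir(logs_root, run_name):
    """Hypothetical stand-in: build the per-run directory under the logs root."""
    return ntpath.join(logs_root, run_name)


run_name = "person20180907T0403"

# Passing a weights file as the logs root yields a directory "inside" a file,
# which cannot exist on disk.
bad = run_log_dir(
    r"E:\MASK\Mask_RCNN\logs\person20180906T2305\mask_rcnn_person_002.h5",
    run_name,
)

# Passing the logs directory itself yields a valid location for the new run.
good = run_log_dir(r"E:\MASK\Mask_RCNN\logs", run_name)

print(bad)
print(good)
```

Under that assumption, the .h5 path belongs in `--weights`, and `--logs` (if given at all) should point at the logs directory.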



bushra1100 commented 6 years ago

I have changed my command and now it is working, but it started from the very first epoch. Why? And what should I do to resume from where I left off?

python person.py train --dataset=E:\MASK\Mask_RCNN\samples\balloon\person_dataset\fused --weights=E:\MASK\Mask_RCNN\logs\person20180906T2305\mask_rcnn_person_0006.h5

Loading weights E:\MASK\Mask_RCNN\logs\person20180906T2305\mask_rcnn_person_0006.h5
2018-09-10 01:36:11.781636: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Training network heads

Starting at epoch 0. LR=0.001

Checkpoint Path: E:\MASK\Mask_RCNN\logs\person20180910T0136\mask_rcnnperson{epoch:04d}.h5
Selecting layers to train
fpn_c5p5 (Conv2D)
fpn_c4p4 (Conv2D)
fpn_c3p3 (Conv2D)
fpn_c2p2 (Conv2D)
fpn_p5 (Conv2D)
fpn_p2 (Conv2D)
fpn_p3 (Conv2D)
fpn_p4 (Conv2D)
In model: rpn_model
rpn_conv_shared (Conv2D)
rpn_class_raw (Conv2D)
rpn_bbox_pred (Conv2D)
mrcnn_mask_conv1 (TimeDistributed)
mrcnn_mask_bn1 (TimeDistributed)
mrcnn_mask_conv2 (TimeDistributed)
mrcnn_mask_bn2 (TimeDistributed)
mrcnn_class_conv1 (TimeDistributed)
mrcnn_class_bn1 (TimeDistributed)
mrcnn_mask_conv3 (TimeDistributed)
mrcnn_mask_bn3 (TimeDistributed)
mrcnn_class_conv2 (TimeDistributed)
mrcnn_class_bn2 (TimeDistributed)
mrcnn_mask_conv4 (TimeDistributed)
mrcnn_mask_bn4 (TimeDistributed)
mrcnn_bbox_fc (TimeDistributed)
mrcnn_mask_deconv (TimeDistributed)
mrcnn_class_logits (TimeDistributed)
mrcnn_mask (TimeDistributed)
C:\Users\ETL\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/30
C:\Users\ETL\AppData\Local\Programs\Python\Python35\lib\site-packages\skimage\transform_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
C:\Users\ETL\AppData\Local\Programs\Python\Python35\lib\site-packages\skimage\transform_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
C:\Users\ETL\AppData\Local\Programs\Python\Python35\lib\site-packages\skimage\transform_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
1/10 [==>...........................] - ETA: 12:17 - loss: 0.4553 - rpn_class_loss: 0.0019 - rpn_bbox_loss: 0.0221 - mrcnn_class_loss: 0.0336 - mrcnn_bbox_loss: 0.1120 - mrcnn_mask_loss: 0.2858
C:\Users\ETL\AppData\Local\Programs\Python\Python35\lib\site-packages\skimage\transform_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15

bushra1100 commented 6 years ago

This issue is solved: load the .h5 file by giving its exact path in the logs folder, as suggested by @bilgan, then open model.py and set self.epoch to the desired epoch, i.e. the epoch where your training was interrupted.

For example: my training was interrupted at epoch 16, so I change self.epoch in model.py to self.epoch=15 (15 because the last file might be corrupt). Then I go to the logs folder in my Mask_RCNN directory and pass the path of the 15th-epoch file on the command line like this:

python person.py train --dataset=E:\MASK\Mask_RCNN\samples\balloon\person_dataset\fused --weights=E:\MASK\Mask_RCNN\logs\person20180906T2305\mask_rcnn_person_0015.h5
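Rather than typing the epoch number by hand, it could be read from the checkpoint filename. A minimal sketch, assuming checkpoints are named mask_rcnn_<name>_<4-digit epoch>.h5 as in the paths above; `epoch_from_checkpoint` is a hypothetical helper and not part of Mask_RCNN.

```python
import re

# Assumed naming scheme: mask_rcnn_<name>_<4-digit epoch>.h5
CHECKPOINT_RE = re.compile(r"mask_rcnn_[\w-]+?_(\d{4})\.h5$")


def epoch_from_checkpoint(path):
    """Return the epoch encoded in a checkpoint filename, or None if it does not match."""
    # Normalise Windows separators so the same pattern works for both path styles.
    match = CHECKPOINT_RE.search(path.replace("\\", "/"))
    return int(match.group(1)) if match else None


checkpoint = r"E:\MASK\Mask_RCNN\logs\person20180906T2305\mask_rcnn_person_0015.h5"
print(epoch_from_checkpoint(checkpoint))  # 15

# A filename that does not follow the 4-digit scheme is reported as None.
print(epoch_from_checkpoint("mask_rcnn_person_002.h5"))  # None
```

The returned value could then be used wherever the starting epoch is set, which keeps the epoch number in step with the weights file actually loaded.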

waleedka commented 5 years ago

error: Unrecognized arguments: --model=last

@bushra1100 That error is odd. Could it be that you changed the code on your side in such a way that removes the "last" option? Try it on coco.py or balloon.py to isolate the problem.