The README clearly says that you need to pass in the same config items that were used during training, which you seem to have missed.
If you did not change any config in training, you should not load the model ImageNet-R50-GroupNorm32-AlignPadding.npz at all, because it needs a different set of configs.
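As a rough illustration of "pass the same config items at prediction time as at training time", the sketch below builds both command lines from one shared list of overrides. It assumes the example scripts accept `--config KEY=VALUE ...`; that flag is not quoted anywhere in this thread, so check the example's README before relying on it. The helper `build_commands` and the `/path/to/...` values are hypothetical placeholders.

```python
import shlex


def build_commands(overrides, image, checkpoint):
    """Return (train_argv, predict_argv) that share one list of config overrides."""
    # Assumption: both scripts take "--config KEY=VALUE ..." as their last arguments.
    config_tail = ["--config", *overrides]
    train_argv = ["python", "train.py", *config_tail]
    predict_argv = [
        "python", "predict.py",
        "--predict", image,
        "--load", checkpoint,
        *config_tail,
    ]
    return train_argv, predict_argv


# Illustrative overrides only; use whatever was actually set during training.
overrides = ["MODE_MASK=False", "BACKBONE.NORM=GN"]
train_argv, predict_argv = build_commands(
    overrides, "/path/to/image.jpg", "/path/to/checkpoint"
)
print(shlex.join(train_argv))
print(shlex.join(predict_argv))
```

Keeping the overrides in one place makes it harder for the two invocations to drift apart.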
Sorry for not elaborating on my configuration. I think it's best to just paste everything I changed here:
_C.MODE_MASK = False # FasterRCNN or MaskRCNN
_C.DATA.BASEDIR = ".../data/training_data/COCO"
_C.BACKBONE.WEIGHTS = ".../data/weights/ImageNet-R50-GroupNorm32-AlignPadding.npz"
Btw I used absolute paths but shortened them above.
So I'm pretty sure my config was not changed between training and prediction.
But I see what you are saying. Is this (from the README):
MODE_FPN=True
FPN.NORM=GN
BACKBONE.NORM=GN
FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head
FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head
TRAIN.LR_SCHEDULE=[240000,320000,360000]
what you mean by needing a different set of configs? Minus the FPN stuff?
Since you load a GroupNorm backbone, at least you have to set BACKBONE.NORM=GN. Loading weights from one model to a different model will usually produce garbage outputs.
Whether you want to change other configs is up to you. But at least this will give you a valid training setting.
You can also start with other backbones in the model zoo that do not use GroupNorm.
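The mismatch described above can be caught before training with a small sanity check. The sketch below is only a heuristic: it looks for the "GroupNorm" substring in the weights file name, as in the file discussed in this thread, and does not inspect the file contents. The helper `check_backbone_norm` is hypothetical, and "FreezeBN" is assumed here to be the non-GN setting that was in effect.

```python
import os


def check_backbone_norm(weights_path, backbone_norm):
    """Return a warning string if the file name and norm setting disagree, else None."""
    name = os.path.basename(weights_path)
    needs_gn = "GroupNorm" in name      # heuristic based on the file name only
    uses_gn = backbone_norm == "GN"
    if needs_gn and not uses_gn:
        return f"{name} looks like a GroupNorm backbone; set BACKBONE.NORM=GN"
    if uses_gn and not needs_gn:
        return f"BACKBONE.NORM=GN, but {name} does not look like a GroupNorm backbone"
    return None


weights = ".../data/weights/ImageNet-R50-GroupNorm32-AlignPadding.npz"
print(check_backbone_norm(weights, "FreezeBN"))  # assumed non-GN setting: warns
print(check_backbone_norm(weights, "GN"))        # consistent: prints None
```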
Thank you so much for your help.
Whether you want to change other configs is up to you.
Despite this, if you're not very familiar with the models, it would be better to use one of the reasonable configs in the table instead of making up a new one.
When you pointed out the weights I was incorrectly using, I suddenly realized what "GN" meant, and the table also became very clear to me. Not sure if necessary for most, but it would be nice for newbies like me if that was mentioned in the README.
1. What you did:
(1) If you're using examples, what's the command you run:
python predict.py --predict ../data/training_data/COCO/train2014/COCO_train2014_000000000009.jpg --load ../data/tensorpack_logs/checkpoint
(2) If you're using examples, have you made any changes to the examples? Paste git status; git diff here:
I used the FasterRCNN example as of 242dc71cafb9642e68a2bfb58bcf6ad45ccbb35c, only changing the directories.
2. What you observed:
Logs from GPU cluster I trained on
Logs from my laptop
(2) Other observations, if any:
I ran prediction on many images from the COCO training dataset, but there are no results from the line:
in predict.py. I checked by making viz.py log a message if there was nothing in the prediction:
I removed this bit of code for logs.
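For readers who want a similar check, here is a minimal sketch of logging when a prediction comes back empty. It is not the code that was removed from this report; `report_predictions` is a hypothetical helper that only assumes the results arrive as a list-like object.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("predict_check")


def report_predictions(image_path, results):
    """Log how many detections were returned and warn when there are none."""
    count = len(results)
    if count == 0:
        logger.warning("no detections for %s", image_path)
    else:
        logger.info("%d detections for %s", count, image_path)
    return count


report_predictions("example.jpg", [])
```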
3. What you expected, if not obvious.
So I expected that running with the given pretrained model (ImageNet-R50-GroupNorm32-AlignPadding.npz in this case) would produce some predictions (even if bad) on the images it trained on for 24 hours. However, there seems to be no output whatsoever for any image I've tried on either computer.
4. Your environment:
GPU cluster
My laptop:
Although I trained for a day, I did notice that the logs said ~7 days was expected for training to complete. Is that really what's required to get any sort of predictions at all? I just want to make sure the example is working.