zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License

How can I train FCN on VOC? #98

Open oujieww opened 6 years ago

oujieww commented 6 years ago

I use pascal_aug as the training set and voc12 as the validation set, with the batch size left at None (meaning auto), but I got: RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 500 and 375 in dimension 2 at /storage2/oujie/DFN/PyTorch Encoding/pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1348

I think this is because the VOC images have different sizes, but I thought this code crops the images, so it should not have this kind of problem.

I also tried batch=1, which raises: raise TypeError((error_msg.format(type(batch[0])))) TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'PIL.Image.Image'>

Does anyone know what is going on? Please help me, thank you!

oujieww commented 6 years ago

Sorry about that, I just forgot to call "self._sync_transform( _img, _target)".
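For anyone hitting the same two errors: the default batching stacks all samples into one tensor, so every image/mask pair has to be converted from PIL and brought to the same size first (the "Got 500 and 375" message is two images with different shapes). Below is a minimal, dependency-free sketch of that idea. The name `sync_random_crop` is hypothetical and this is not the repo's `_sync_transform`; nested lists stand in for images, and 255 as the ignore label is an assumption.

```python
import random

def sync_random_crop(img, mask, crop_size, pad_value=0, ignore_label=255, rng=random):
    """Pad (if needed) and crop an image and its mask with the SAME offsets.

    img and mask are 2-D lists (rows of pixels) with identical height/width.
    """
    h, w = len(img), len(img[0])
    # Pad on the bottom/right so both dimensions reach at least crop_size.
    pad_h, pad_w = max(crop_size - h, 0), max(crop_size - w, 0)
    img = [row + [pad_value] * pad_w for row in img] + \
          [[pad_value] * (w + pad_w) for _ in range(pad_h)]
    mask = [row + [ignore_label] * pad_w for row in mask] + \
           [[ignore_label] * (w + pad_w) for _ in range(pad_h)]
    h, w = h + pad_h, w + pad_w
    # One random offset, shared by image and mask, keeps the labels aligned.
    top = rng.randint(0, h - crop_size)
    left = rng.randint(0, w - crop_size)

    def crop(a):
        return [row[left:left + crop_size] for row in a[top:top + crop_size]]

    return crop(img), crop(mask)

# Two samples with mismatched shapes (scaled-down stand-ins): 5x4 and 3x6.
a = sync_random_crop([[1] * 4 for _ in range(5)], [[0] * 4 for _ in range(5)], 3)
b = sync_random_crop([[2] * 6 for _ in range(3)], [[1] * 6 for _ in range(3)], 3)
# A sample smaller than the crop gets padded instead of failing.
c = sync_random_crop([[1, 1]], [[0, 0]], 3)
```

After this step every sample is crop_size x crop_size, so the samples can be stacked into a batch regardless of the original image sizes.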

But can you give me details about training FCN on VOC and VOC-aug, such as the learning rate and the other hyperparameters? Can I just use the default hyperparameters you set in "/experiments/segmentation/option.py" and only change the model name to fcn?

oujieww commented 6 years ago

I trained fcn-resnet50 (the model in the author's nn). The training set is pascal_aug train.txt (8449 images) and the validation set is seg11valid.txt, which is the same setting the original FCN used, and the hyperparameters are ____. I use 2 GPUs, and after epoch 31 my best mIoU is 56. Is there anything I missed? Can anyone give me some help?

    # model and dataset 
    parser.add_argument('--model', type=str, default='fcn',
                        help='model name (default: fcn)')
    parser.add_argument('--backbone', type=str, default='resnet50',
                        help='backbone name (default: resnet50)')
    parser.add_argument('--dataset', type=str, default='ade20k',
                        help='dataset name (default: ade20k)')
    parser.add_argument('--trainset', type=str, default='pascal_aug',
                        help='training set name (default: pascal_aug)')
    parser.add_argument('--validset', type=str, default='pascal_voc',
                        help='validation set name (default: pascal_voc)')
    parser.add_argument('--data-folder', type=str,
                        default=os.path.join(os.environ['HOME'], 'data'),
                        help='training dataset folder (default: \
                        $(HOME)/data)')
    parser.add_argument('--workers', type=int, default=4,
                        metavar='N', help='dataloader threads')
    parser.add_argument('--base-size', type=int, default=608,
                        help='base image size')
    parser.add_argument('--crop-size', type=int, default=576,
                        help='crop image size')
    # training hyper params
    parser.add_argument('--aux', action='store_true', default= False,
                        help='Auxiliary Loss')
    parser.add_argument('--se-loss', action='store_true', default= False,
                        help='Semantic Encoding Loss SE-loss')
    parser.add_argument('--epochs', type=int, default=None, metavar='N',
                        help='number of epochs to train (default: auto)')
    parser.add_argument('--start_epoch', type=int, default=0,
                        metavar='N', help='start epochs (default:0)')
    parser.add_argument('--batch-size', type=int, default=None,
                        metavar='N', help='input batch size for \
                        training (default: auto)')
    parser.add_argument('--test-batch-size', type=int, default=None,
                        metavar='N', help='input batch size for \
                        testing (default: same as batch size)')
    # optimizer params
    parser.add_argument('--lr', type=float, default=None, metavar='LR',
                        help='learning rate (default: auto)')
    parser.add_argument('--lr-scheduler', type=str, default='poly',
                        help='learning rate scheduler (default: poly)')
    parser.add_argument('--momentum', type=float, default=0.9,
                        metavar='M', help='momentum (default: 0.9)')
    parser.add_argument('--weight-decay', type=float, default=1e-4,
                        metavar='M', help='w-decay (default: 1e-4)')
    # cuda, seed and logging
    parser.add_argument('--no-cuda', action='store_true', default=
                        False, help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    # checking point
    parser.add_argument('--resume', type=str, default=None,
                        help='put the path to resuming file if needed')
    parser.add_argument('--checkname', type=str, default='default',
                        help='set the checkpoint name')
    parser.add_argument('--model-zoo', type=str, default=None,
                        help='evaluating on model zoo model')
    # finetuning pre-trained models
    parser.add_argument('--ft', action='store_true', default= False,
                        help='finetuning on a different dataset')
    parser.add_argument('--pre-class', type=int, default=None,
                        help='num of pre-trained classes \
                        (default: None)')
    # evaluation option
    parser.add_argument('--ema', action='store_true', default= False,
                        help='using EMA evaluation')
    parser.add_argument('--eval', action='store_true', default= False,
                        help='evaluating mIoU')
    parser.add_argument('--no-val', action='store_true', default= False,
                        help='skip validation during training')
    # test option
    parser.add_argument('--test-folder', type=str, default=None,
                        help='path to test image folder')
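The `poly` scheduler selected by `--lr-scheduler` above is commonly defined as lr = base_lr * (1 - iter / total_iters) ^ power. A minimal sketch of that formula follows, assuming power 0.9 (a common choice in segmentation papers, not checked against this repo's scheduler). The function name `poly_lr` and the numbers are illustrative only.

```python
def poly_lr(base_lr, cur_iter, total_iters, power=0.9):
    """Polynomial decay: starts at base_lr and reaches 0 at total_iters."""
    return base_lr * (1.0 - cur_iter / total_iters) ** power

# Illustrative values only; base_lr=0.01 is an assumption, not the repo default.
schedule = [poly_lr(0.01, it, 100) for it in (0, 50, 100)]
```

Since `--lr`, `--epochs` and `--batch-size` default to None (auto) in the options above, the base learning rate and the total number of iterations depend on whatever the script fills in for the chosen dataset.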

my462 commented 6 years ago

Why are you trying to train the FCN network? The FCN implementation in this package is very naive and is mainly used for the auxiliary loss, as you can see if you read the code yourself. It should give a lower mIoU than the results in the FCN paper.

oujieww commented 6 years ago

@my462 The original FCN uses VGG as the base network. In this paper the author uses res50-fcn as his baseline, and I found that another paper, DFN, uses res101-fcn4s as its baseline. In those papers the ResNet backbone does better than the VGG FCN (FCN paper). I am looking for a way to train a resnet-fcn, but I can never get a good result, so I need these details. Do you know how to train it and how to set the hyperparameters? Also, some papers say the VOC-augmented set has 10582, 1449 and 1456 images for train, val and test, but the one I downloaded only has 8449 for train... If you know something about this, can you tell me? XD

my462 commented 6 years ago

The VOC-augmented set can be downloaded from this link: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz

my462 commented 6 years ago

I didn't try the FCN model in this package, so I have no idea whether you can get a good result or not.

oujieww commented 6 years ago

@my462 thank you

oujieww commented 6 years ago

@my462 Hi bro, I downloaded it, but its train.txt has only 8498 images and its val.txt has 2857. I still have no idea why they have 10582 for train and 1449 for val.

my462 commented 6 years ago

You are right. I use val+train together, so I didn't notice the problem. The total number of images is 11355. The description on UCB's website is: "The SBD currently contains annotations from 11355 images taken from the PASCAL VOC 2011 dataset. These images were annotated on Amazon Mechanical Turk and the conflicts between the segmentations were resolved manually. For each image, we provide both category-level and instance-level segmentations and boundaries. The segmentations and boundaries provided are for the 20 object categories in the VOC 2011 challenge." So is there another VOC augmentation dataset to use?

oujieww commented 6 years ago

Yeah, I found those image counts in three papers; they use an aug-VOC set with 10582 images for train. Maybe this detail is important T_T. I hope someone knows about this.

oujieww commented 6 years ago

Another problem is that the SBD train.txt includes some images that are in the VOC val.txt~
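One way to deal with that overlap, sketched under assumptions: merge the VOC train IDs with both SBD lists (8498 + 2857 = 11355 IDs in total for SBD), then drop everything that appears in the VOC val list so no validation image leaks into training. The helper name and the toy IDs below are made up for illustration; with the real files each list would be read from the corresponding .txt, one image ID per line. Whether this reproduces exactly the 10582-image list from the papers is not verified here.

```python
def build_train_aug(voc_train, voc_val, sbd_train, sbd_val):
    """Union of all annotated IDs, minus anything used for VOC validation."""
    val = set(voc_val)
    merged = set(voc_train) | set(sbd_train) | set(sbd_val)
    return sorted(merged - val)

# Toy IDs (hypothetical), including one SBD train image that is also in VOC val.
voc_train = ["2007_000032", "2007_000039"]
voc_val = ["2007_000033", "2007_000042"]
sbd_train = ["2008_000002", "2007_000033"]   # overlaps with voc_val
sbd_val = ["2008_000003", "2007_000032"]     # overlaps with voc_train

train_aug = build_train_aug(voc_train, voc_val, sbd_train, sbd_val)
```

Using sets removes duplicates between the VOC and SBD lists automatically, and subtracting the val set last guarantees the two splits stay disjoint.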

my462 commented 6 years ago

https://github.com/JackieZhangdx/InstanceSegmentationList You can see the differences between the datasets at this link. Have a look!

oujieww commented 6 years ago

@my462 Oh, bro, nice!!! XD