Closed: LUOBO123 closed 1 year ago
You should tune these hyperparameters, since your config is not consistent with the one we provide.
Thank you for your reply. I am training on a custom dataset, so I changed these hyperparameters. I have now turned the learning rate down.
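For anyone else turning the learning rate down after shrinking the batch size, the linear scaling rule is one common starting point. The sketch below is only an illustration: `scale_lr` is a hypothetical helper, and the reference batch size of 2048 is an assumption that is not stated anywhere in this thread. Only `lr=0.0015`, `samples_per_gpu=5`, and the two GPUs come from the posted config.

```python
def scale_lr(reference_lr, reference_batch_size, samples_per_gpu, num_gpus):
    """Linear scaling rule: lr changes in proportion to the total batch size."""
    total_batch_size = samples_per_gpu * num_gpus
    return reference_lr * total_batch_size / reference_batch_size


# lr=0.0015, samples_per_gpu=5 and two GPUs are taken from the posted config.
# reference_batch_size=2048 is an ASSUMPTION, not a value given in this issue.
suggested_lr = scale_lr(0.0015, 2048, samples_per_gpu=5, num_gpus=2)
print(f"suggested lr: {suggested_lr:.2e}")
```

Under that assumption the suggested value is several orders of magnitude below 0.0015, which would be consistent with the advice to lower the learning rate. Treat it as a first guess to tune from, not a guaranteed fix.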
Closing as solved. If you run into further problems, feel free to reopen this issue.
OK
Branch
1.x branch (1.x version, such as v1.0.0rc2, or dev-1.x branch)
Prerequisite
Environment
Consistent with the official environment.
Describe the bug
Hello, I changed the input resolution to 416*416 to train on a custom dataset. After the network has trained for 49 epochs, the printed loss becomes NaN. What could be the reason for this?
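When a loss turns into NaN partway through training, it can help to find the exact step where it first stops being finite, so the checkpoint and batch just before it can be inspected. A minimal sketch in plain Python follows. `first_non_finite` is a hypothetical helper, and the loss values are made up for illustration; they are not taken from this run.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or inf loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Illustrative values only, not logs from this issue.
example_losses = [9.8, 9.1, 8.7, float("nan"), float("nan")]
print(first_non_finite(example_losses))
```

The same check can be applied to loss values parsed from a training log. A loss that climbs steadily before going non-finite often points to the learning rate, while a sudden jump may point to a single bad batch.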
Reproduces the problem - code sample
No response
Reproduces the problem - command or script
No response
Reproduces the problem - error message
No response
Additional information
These are my parameters. I train the model with two 2080 Ti cards.
model = dict(
    type='CAE',
    backbone=dict(
        type='CAEViT',
        arch='b',
        patch_size=16,
        init_values=0.1,
        qkv_bias=False),
    neck=dict(
        type='CAENeck',
        patch_size=16,
        embed_dims=768,
        num_heads=12,
        regressor_depth=4,
        decoder_depth=4,
        mlp_ratio=4,
        init_values=0.1),
    head=dict(
        type='CAEHead',
        tokenizer_path='cae_ckpt/dalle_encoder.pth',
        lambd=2),
    base_momentum=0.0)
data_source = 'ImageNet'
dataset_type = 'SingleViewDataset'
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline = [
    dict(type='RandomHorizontalFlip', p=0.5),
    dict(
        type='RandomResizedCropAndInterpolationWithTwoPic',
        size=416,
        second_size=208,
        interpolation='bicubic',
        second_interpolation='lanczos',
        scale=(0.08, 1.0)),
    dict(type='ToTensor'),
    dict(
        type='BEiTMaskGenerator',
        input_size=(26, 26),
        num_masking_patches=75,
        max_num_patches=None,
        min_num_patches=16)
]
prefetch = False
data = dict(
    samples_per_gpu=5,
    workers_per_gpu=8,
    train=dict(
        type='SingleViewDataset',
        data_source=dict(
            type='ImageNet',
            data_prefix='data_own/imagenet/train/n01440764/',
            ann_file='data_own/imagenet/meta/train.txt'),
        pipeline=[
            dict(type='RandomHorizontalFlip', p=0.5),
            dict(
                type='RandomResizedCropAndInterpolationWithTwoPic',
                size=416,
                second_size=208,
                interpolation='bicubic',
                second_interpolation='lanczos',
                scale=(0.08, 1.0)),
            dict(type='ToTensor'),
            dict(
                type='BEiTMaskGenerator',
                input_size=(26, 26),
                num_masking_patches=75,
                max_num_patches=None,
                min_num_patches=16)
        ],
        prefetch=False))
optimizer = dict(
    type='AdamW',
    lr=0.0015,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_options=dict(
        norm=dict(weight_decay=0.0),
        bias=dict(weight_decay=0.0),
        gamma=dict(weight_decay=0.0)))
optimizer_config = dict(grad_clip=dict(max_norm=3.0))
lr_config = dict(
    policy='StepFixCosineAnnealing',
    min_lr=1e-05,
    warmup='linear',
    warmup_iters=10,
    warmup_ratio=0.0001,
    warmup_by_epoch=True,
    by_epoch=False)
runner = dict(type='EpochBasedRunner', max_epochs=300)
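Since the input resolution was changed, it may be worth checking that the sizes in the config still agree with each other. The sketch below recomputes the patch grid from `size` and `patch_size`, the tokenizer grid from `second_size`, and the fraction of masked patches. `check_geometry` is a hypothetical helper, and `tokenizer_stride=8` is an assumption about how much the DALL-E encoder downsamples its input. All other numbers are copied from the config above.

```python
def check_geometry(size, second_size, patch_size, mask_grid,
                   num_masking_patches, tokenizer_stride=8):
    """Recompute grid sizes and the mask ratio from the config values."""
    patch_grid = size // patch_size                # ViT patches per side
    token_grid = second_size // tokenizer_stride   # tokenizer outputs per side (assumed stride)
    total_patches = mask_grid[0] * mask_grid[1]
    return {
        "patch_grid": patch_grid,
        "token_grid": token_grid,
        "grids_match": mask_grid == (patch_grid, patch_grid)
        and patch_grid == token_grid,
        "mask_ratio": num_masking_patches / total_patches,
    }


# Values taken from the posted config; tokenizer_stride=8 is an ASSUMPTION.
report = check_geometry(size=416, second_size=208, patch_size=16,
                        mask_grid=(26, 26), num_masking_patches=75)
for key, value in report.items():
    print(key, value)
```

Under these assumptions the grids are consistent at 26 x 26, but `num_masking_patches=75` covers only about 11% of the 676 patches. If that value was carried over from a smaller input resolution, the masking ratio is now much lower than it would have been there, which may be another hyperparameter worth revisiting alongside the learning rate.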