Xie-Muxi-BK opened this issue 1 year ago
We recommend using English, or English & Chinese, for issues so that we can have a broader discussion.
Not 100% sure it will resolve your issue, but it looks similar to the issue I encountered. Based on the base config you used in ../_base_/datasets/potsdam.py:
https://github.com/open-mmlab/mmsegmentation/blob/b600f7cb26829afa2c785af41755391626fbb446/configs/_base_/datasets/potsdam.py#L42-L52
Maybe you can try https://github.com/open-mmlab/mmsegmentation/issues/2777#issuecomment-1508144760 to see if there's any luck?
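For context, the referenced section of configs/_base_/datasets/potsdam.py defines the training dataloader. The sketch below shows the typical shape of that block in the 1.x base configs (reconstructed for illustration, so the exact values at that commit may differ; dataset_type, data_root and train_pipeline are defined earlier in the same file):

# Approximate shape of the referenced train_dataloader block (not a verbatim copy).
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='img_dir/train', seg_map_path='ann_dir/train'),
        pipeline=train_pipeline))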
Thank you very much for your advice. It works properly after that modification in this config, but when I use another config, it raises a different exception: ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
I used the simple PSPNet + Potsdam setup to make my problem easier to explain.
Before raising this issue, I had already tried changing
sampler=dict(type='InfiniteSampler', shuffle=True),
to
sampler=dict(type='DefaultSampler', shuffle=True),
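As an illustration (not taken from the original post), that swap is usually written as an override of the base dataloader in the downstream config:

# Hypothetical override in a user config: replace the base config's
# iteration-based InfiniteSampler with the epoch-based DefaultSampler.
train_dataloader = dict(
    sampler=dict(type='DefaultSampler', shuffle=True))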
But I did not change the sampler parameters in the given config (PSPNet + Potsdam), such as:
_base_ = [
    '../_base_/models/upernet_convnext.py', '../_base_/datasets/ade20k.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(size=crop_size)
checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-tiny_3rdparty_32xb128-noema_in1k_20220301-795e9634.pth'  # noqa
model = dict(
    data_preprocessor=data_preprocessor,
    backbone=dict(
        type='mmcls.ConvNeXt',
        arch='tiny',
        out_indices=[0, 1, 2, 3],
        drop_path_rate=0.4,
        layer_scale_init_value=1.0,
        gap_before_final_norm=False,
        init_cfg=dict(
            type='Pretrained', checkpoint=checkpoint_file,
            prefix='backbone.')),
    decode_head=dict(
        in_channels=[96, 192, 384, 768],
        num_classes=150,
    ),
    auxiliary_head=dict(in_channels=384, num_classes=150),
    test_cfg=dict(mode='slide', crop_size=crop_size, stride=(341, 341)),
)
optim_wrapper = dict(
    _delete_=True,
    type='AmpOptimWrapper',
    optimizer=dict(
        type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05),
    paramwise_cfg={
        'decay_rate': 0.9,
        'decay_type': 'stage_wise',
        'num_layers': 6
    },
    constructor='LearningRateDecayOptimizerConstructor',
    loss_scale='dynamic')
param_scheduler = [
    dict(
        type='LinearLR', start_factor=1e-6, by_epoch=False, begin=0, end=1500),
    dict(
        type='PolyLR',
        power=1.0,
        begin=1500,
        end=160000,
        eta_min=0.0,
        by_epoch=False,
    )
]
# By default, models are trained on 8 GPUs with 2 images per GPU
train_dataloader = dict(batch_size=2, sampler=None)
val_dataloader = dict(batch_size=1, sampler=None)
test_dataloader = val_dataloader
train_cfg = dict(
    _delete_=True,
    type='EpochBasedTrainLoop',
    max_epochs=10,
    val_interval=2)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
default_hooks = dict(
    # timer=dict(type='EpochTimerHook'),
    logger=dict(type='LoggerHook', interval=10, log_metric_by_epoch=True),
    # param_scheduler=dict(type='ParamSchedulerHook', convert_to_iter_based=False),
    checkpoint=dict(type='CheckpointHook', by_epoch=True, interval=1),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='SegVisualizationHook'))
log_processor = dict(by_epoch=True)
it raises:
File "/root/anaconda3/envs/XMX/lib/python3.9/site-packages/torch/nn/functional.py", line 2416, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
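For reference, this ValueError is raised by BatchNorm when a training batch yields only one value per channel. A minimal PyTorch sketch (not from the original thread) reproduces the same error with the shape shown in the traceback:

import torch
import torch.nn as nn

# In training mode, BatchNorm needs more than one value per channel to
# compute batch statistics. A single image reduced by global average pooling
# (as in PSPNet's pyramid pooling branch) gives a (1, C, 1, 1) tensor, the
# exact shape reported above.
bn = nn.BatchNorm2d(512).train()
pooled = torch.randn(1, 512, 1, 1)  # a trailing batch that holds one sample
bn(pooled)  # ValueError: Expected more than 1 value per channel when training ...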
I have the same problem. Have you solved it?
Hello, I had the same problem and I solved it by adding
train_dataloader = dict(sampler=dict(type='DefaultSampler', shuffle=True), drop_last=True)
to the configs.
But I am still not sure why this solves the problem.
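A likely explanation: with batch_size=2 and drop_last left at its default of False, the last batch of an epoch can contain a single image, and that lone sample reaches BatchNorm after global pooling. A minimal sketch of the override described above, assuming the same mmseg 1.x config style used earlier in the thread:

# Sketch of the fix: use DefaultSampler for epoch-based training and drop the
# last incomplete batch so no batch of size 1 reaches BatchNorm. drop_last is
# forwarded to PyTorch's DataLoader when the runner builds the dataloader.
train_dataloader = dict(
    batch_size=2,
    sampler=dict(type='DefaultSampler', shuffle=True),
    drop_last=True)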
Config file:
What is the reason for this?