w1oves / Rein

[CVPR 2024] Official implementation of "Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation"
https://zxwei.site/rein
GNU General Public License v3.0

Custom dataset #46

Closed jxhoh closed 2 months ago

jxhoh commented 2 months ago

If I want to run the task on my own dataset, the line `cityscapes_type = "CityscapesDataset"` seems odd — what if I don't want to use the Cityscapes dataset?

jxhoh commented 2 months ago

I'm now trying a binary segmentation task, but at validation time I get nan, which is strange:

```
+------------+-------+-------+
|   Class    |  IoU  |  Acc  |
+------------+-------+-------+
| background | 100.0 | 100.0 |
|   house    |  nan  |  nan  |
+------------+-------+-------+
```
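For context, a per-class IoU of nan typically means that class never occurs in either the predictions or the ground truth, so the metric divides 0 by 0. The sketch below (a simplified stand-in for mmsegmentation's actual `mean_iou`, with made-up inputs) reproduces the symptom: if the `house` pixels never reach the evaluator with label `1`, its IoU becomes nan.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Naive per-class IoU; a class absent from both pred and gt -> 0/0 -> nan."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Ground truth and prediction contain only background (class 0);
# class 1 ("house") never appears, e.g. because its pixels were
# stored as 255 or remapped away before evaluation.
gt = np.zeros((4, 4), dtype=np.int64)
pred = np.zeros((4, 4), dtype=np.int64)
print(per_class_iou(pred, gt, num_classes=2))  # [1.0, nan]
```

This is why the fix usually lies in the dataset/label mapping rather than in the model.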

w1oves commented 2 months ago

To register a new dataset, see the mmsegmentation documentation referenced in the README — it's covered there.

jxhoh commented 2 months ago

I've got that part working.

jxhoh commented 2 months ago

> To register a new dataset, see the mmsegmentation documentation referenced in the README — it's covered there.

But when I use it to test a binary classification problem, I get these nan values. I've also set `reduce_zero_label` and `ignore_index`, yet it's still nan — strange.
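One common pitfall here: if the masks already use 0 for background and 1 for the foreground class, enabling `reduce_zero_label=True` shifts every label down by one and ignores the old 0, which empties one of the two classes. A minimal sketch of the remapping (my own reimplementation of mmsegmentation's convention, with a made-up 2×2 mask):

```python
import numpy as np

# Raw mask where 0 = unlabeled, 1 = background, 2 = house.
raw = np.array([[0, 1], [2, 2]], dtype=np.uint8)

# With reduce_zero_label=True, labels are shifted down by one and the
# old 0 ("unlabeled") is mapped to the ignore index 255.
reduced = raw.astype(np.int64) - 1
reduced[raw == 0] = 255
print(reduced)  # [[255, 0], [1, 1]]
```

If the masks instead use 0 = background and 1 = house directly, `reduce_zero_label` should stay `False`; otherwise background becomes 255 (ignored) and house becomes 0, leaving class index 1 with no pixels — and a nan IoU.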

jxhoh commented 2 months ago

Author, do you think the model needs any changes for binary classification?

```python
crop_size = (512, 512)
num_classes = 2
model = dict(
    type="EncoderDecoder",
    data_preprocessor=dict(
        type="SegDataPreProcessor",
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        size=crop_size,
        bgr_to_rgb=True,
        pad_val=0,
        seg_pad_val=255,
    ),
    backbone=dict(
        type="ReinsDinoVisionTransformer",
        reins_config=dict(
            type="LoRAReins",
            token_length=100,
            embed_dims=1024,
            num_layers=24,
            patch_size=16,
            link_token_to_query=True,
            lora_dim=16,
        ),
        patch_size=16,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        mlp_ratio=4,
        img_size=512,
        ffn_layer="mlp",
        init_values=1e-05,
        block_chunks=0,
        qkv_bias=True,
        proj_bias=True,
        ffn_bias=True,
        init_cfg=dict(
            type="Pretrained",
            checkpoint=r'D:\code\ai\seg\Rein-train\Rein-train\checkpoints\dinov2_vitg14_reg4_pretrain.pth',
        ),
    ),
    decode_head=dict(
        type="ReinMask2FormerHead",
        replace_query_feat=True,
        in_channels=[1024, 1024, 1024, 1024],
        strides=[4, 8, 16, 32],
        feat_channels=256,
        out_channels=256,
        num_classes=num_classes,
        num_queries=100,
        num_transformer_feat_level=3,
        align_corners=False,
        pixel_decoder=dict(
            type="mmdet.MSDeformAttnPixelDecoder",
            num_outs=3,
            norm_cfg=dict(type="GN", num_groups=32),
            act_cfg=dict(type="ReLU"),
            encoder=dict(  # DeformableDetrTransformerEncoder
                num_layers=6,
                layer_cfg=dict(  # DeformableDetrTransformerEncoderLayer
                    self_attn_cfg=dict(  # MultiScaleDeformableAttention
                        embed_dims=256,
                        num_heads=8,
                        num_levels=3,
                        num_points=4,
                        im2col_step=64,
                        dropout=0.0,
                        batch_first=True,
                        norm_cfg=None,
                        init_cfg=None,
                    ),
                    ffn_cfg=dict(
                        embed_dims=256,
                        feedforward_channels=1024,
                        num_fcs=2,
                        ffn_drop=0.0,
                        act_cfg=dict(type="ReLU", inplace=True),
                    ),
                ),
                init_cfg=None,
            ),
            positional_encoding=dict(  # SinePositionalEncoding
                num_feats=128, normalize=True
            ),
            init_cfg=None,
        ),
        enforce_decoder_input_project=False,
        positional_encoding=dict(  # SinePositionalEncoding
            num_feats=128, normalize=True
        ),
        transformer_decoder=dict(  # Mask2FormerTransformerDecoder
            return_intermediate=True,
            num_layers=9,
            layer_cfg=dict(  # Mask2FormerTransformerDecoderLayer
                self_attn_cfg=dict(  # MultiheadAttention
                    embed_dims=256,
                    num_heads=8,
                    attn_drop=0.0,
                    proj_drop=0.0,
                    dropout_layer=None,
                    batch_first=True,
                ),
                cross_attn_cfg=dict(  # MultiheadAttention
                    embed_dims=256,
                    num_heads=8,
                    attn_drop=0.0,
                    proj_drop=0.0,
                    dropout_layer=None,
                    batch_first=True,
                ),
                ffn_cfg=dict(
                    embed_dims=256,
                    feedforward_channels=2048,
                    num_fcs=2,
                    act_cfg=dict(type="ReLU", inplace=True),
                    ffn_drop=0.0,
                    dropout_layer=None,
                    add_identity=True,
                ),
            ),
            init_cfg=None,
        ),
        loss_cls=dict(
            type="mmdet.CrossEntropyLoss",
            use_sigmoid=False,
            loss_weight=2.0,
            reduction="mean",
            class_weight=[1.0] * num_classes + [0.1],
        ),
        loss_mask=dict(
            type="mmdet.CrossEntropyLoss",
            use_sigmoid=True,
            reduction="mean",
            loss_weight=5.0,
        ),
        loss_dice=dict(
            type="mmdet.DiceLoss",
            use_sigmoid=True,
            activate=True,
            reduction="mean",
            naive_dice=True,
            eps=1.0,
            loss_weight=5.0,
        ),
        train_cfg=dict(
            num_points=12544,
            oversample_ratio=3.0,
            importance_sample_ratio=0.75,
            assigner=dict(
                type="mmdet.HungarianAssigner",
                match_costs=[
                    dict(type="mmdet.ClassificationCost", weight=2.0),
                    dict(type="mmdet.CrossEntropyLossCost", weight=5.0, use_sigmoid=True),
                    dict(type="mmdet.DiceCost", weight=5.0, pred_act=True, eps=1.0),
                ],
            ),
            sampler=dict(type="mmdet.MaskPseudoSampler"),
        ),
    ),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        mode="slide",
        crop_size=(512, 512),
        stride=(341, 341),
    ),
)
```

jxhoh commented 2 months ago

I only changed `num_classes`; I'm not very familiar with the rest of the Mask2Former config.

w1oves commented 2 months ago

The model shouldn't need any changes. It's probably your dataset settings — I suggest carefully debugging whether the labels the dataloader reads in are correct.

jxhoh commented 2 months ago

> The model shouldn't need any changes. It's probably your dataset settings — I suggest carefully debugging whether the labels the dataloader reads in are correct.

I found it. In the IoU computation code, classes seem to be binned by `num_classes`: with 20 classes, the label values must lie in 0–19.
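That finding suggests a quick sanity check before training: scan the label masks and flag any value outside `[0, num_classes)` that isn't the ignore index. A minimal sketch (the `check_labels` helper and the sample mask are my own, not part of the repo):

```python
import numpy as np

def check_labels(mask, num_classes, ignore_index=255):
    """Return all unique label values and those outside [0, num_classes)."""
    vals = np.unique(mask)
    bad = [int(v) for v in vals
           if v != ignore_index and not (0 <= v < num_classes)]
    return [int(v) for v in vals], bad

# Example: a mask that accidentally contains the stray value 20.
mask = np.array([[0, 1], [255, 20]], dtype=np.int64)
vals, bad = check_labels(mask, num_classes=2)
print(vals, bad)  # [0, 1, 255, 20] -> bad: [20]
```

Running something like this over every ground-truth mask (e.g. loaded with `PIL.Image.open`) would catch out-of-range labels that silently poison the IoU statistics.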

w1oves commented 2 months ago

That depends on whether your dataset is set up correctly.

jxhoh commented 2 months ago

> That depends on whether your dataset is set up correctly.

`!python tools/generate_full_weights.py --dinov2_segmentor_path checkpoints/dinov2_segmentor.pth --backbone checkpoints/dinov2_vitl14_pretrain.pth --rein_head checkpoints/dinov2_rein_and_head.pth` — after training I only have a single iter checkpoint. I know what to pass for the backbone, but how do I get the `rein_head` file?