huawei-noah / vega

AutoML tools chain
http://www.noahlab.com.hk/opensource/vega/

Can SP-NAS load a custom dataset when searching model architectures for object detection? #277

Open gyr-kdgc opened 1 year ago

gyr-kdgc commented 1 year ago

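From the documentation, it looks like only official datasets such as COCO and ImageNet are supported. Those datasets are too large for me to test with right now, and I have not found a way to load my own dataset.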

gyr-kdgc commented 1 year ago

Converting my own dataset to COCO format is enough to make it usable, so that problem is solved. However, at runtime the reignition stage fails with: model statics failed, ex=conv2d(): argument 'input' (position 1) must be Tensor, not list. The yml file is as follows: `pipeline: [fine_tune, serial, reignition, parallel, fullytrain]

fine_tune:
    pipe_step:
        type: TrainPipeStep

    model:
        pretrained_model_file: /home/nas/pretrain/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
        model_desc:
            type: FasterRCNN
            convert_pretrained: True
            backbone:
                type: SerialBackbone

    trainer:
        type: Trainer
        epochs: 1
        with_train: False
        optimizer:
            type: SGD
            params:
                lr: 0.02
                momentum: 0.9
                weight_decay: !!float 1e-4
        lr_scheduler:
            type: WarmupScheduler
            by_epoch: False
            params:
                warmup_type: linear
                warmup_iters: 1000
                warmup_ratio: 0.001
                after_scheduler_config:
                    type: MultiStepLR
                    by_epoch: True
                    params:
                        milestones: [ 10, 20 ]
                        gamma: 0.1
        loss:
            type: SumLoss
        metric:
            type: coco
            params:
                anno_path: /home/nas/data/COCO2017/annotations/instances_val2017.json

    dataset:
        type: CocoDataset
        common:
            data_root: /home/nas/data/COCO2017
            batch_size: 4
            img_prefix: "2017"
            ann_prefix: instances

serial:
    pipe_step:
        type: SearchPipeStep

    search_algorithm:
        type: SpNasS
        max_sample: 1

    search_space:
        type: SearchSpace
        hyperparameters:
            -   key: network.backbone.code
                type: CATEGORY
                range: ['111-2111-211111-211']

    model:
        pretrained_model_file: "{local_base_path}/output/fine_tune/model_0.pth"
        model_desc:
            type: FasterRCNN
            freeze_swap_keys: True
            backbone:
                type: SerialBackbone

    trainer:
        ref: fine_tune.trainer
        epochs: 1

    dataset:
        ref: fine_tune.dataset

reignition:
    pipe_step:
        type: TrainPipeStep
        models_folder: "{local_base_path}/output/serial/"

    # dataset:
    #     type: Imagenet
    #     common:
    #         data_path: /cache/datasets/ILSVRC/Data/CLS-LOC
    #         batch_size: 128
    dataset:
        type: CocoDataset
        common:
            data_root: /home/nas/data/COCO2017
            batch_size: 4
            img_prefix: "2017"
            ann_prefix: instances

    trainer:
        type: Trainer
        epochs: 1
        callbacks: ReignitionCallback
        mixup: True
        optimizer:
            type: SGD
            params:
                lr: 0.1
                momentum: 0.9
                weight_decay: !!float 1e-4
        lr_scheduler:
            type: CosineAnnealingLR
            by_epoch: True
            params:
                T_max: 20
        loss:
            type: CrossEntropyLoss

parallel:
    pipe_step:
        type: SearchPipeStep
        models_folder: "{local_base_path}/output/reignition/"

    search_algorithm:
        type: SpNasP
        max_sample: 1

    model:
        pretrained_model_file: "{local_base_path}/output/fine_tune/model_0.pth"
        model_desc:
            type: FasterRCNN
            neck:
                type: ParallelFPN

    search_space:
        type: SearchSpace
        hyperparameters:
            -   key: network.neck.code
                type: CATEGORY
                range: [[0, 1, 2, 3]]

    trainer:
        ref: serial.trainer

    dataset:
        ref: serial.dataset

fullytrain:
    pipe_step:
        type: TrainPipeStep
        models_folder: "{local_base_path}/output/parallel/"

    trainer:
        ref: serial.trainer
        epochs: 1

    dataset:
        ref: serial.dataset
`

Here I changed the dataset of the reignition stage to COCO. Can the reignition stage only use ImageNet, or is something wrong with my dataset configuration?
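For anyone else who needs the COCO conversion mentioned at the start of this comment, here is a minimal sketch of building a COCO-style annotation dictionary. The input record layout (`file_name`, `width`, `height`, `boxes` with corner coordinates) and the function name `to_coco` are hypothetical, not part of Vega. Only the output fields (`images`, `annotations`, `categories`, with `bbox` as `[x, y, width, height]`) follow the COCO detection format. The output file name is an assumption as well: the config above points at `annotations/instances_val2017.json`, so the result presumably has to be saved under a matching name.

```python
import json


def to_coco(records, class_names):
    """Build a COCO-style detection dict from simple per-image records.

    `records` is a hypothetical input format: a list of dicts with keys
    file_name, width, height and boxes, where each box is
    (x_min, y_min, x_max, y_max, class_name).
    """
    # COCO category ids are conventionally 1-based.
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(class_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}

    images, annotations = [], []
    for image_id, rec in enumerate(records, start=1):
        images.append({
            "id": image_id,
            "file_name": rec["file_name"],
            "width": rec["width"],
            "height": rec["height"],
        })
        for x_min, y_min, x_max, y_max, cls in rec["boxes"]:
            w, h = x_max - x_min, y_max - y_min
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[cls],
                "bbox": [x_min, y_min, w, h],  # COCO uses top-left corner plus size
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


if __name__ == "__main__":
    demo = [{"file_name": "a.jpg", "width": 100, "height": 80,
             "boxes": [(10, 20, 50, 60, "cat")]}]
    print(json.dumps(to_coco(demo, ["cat", "dog"]), indent=2))
```

The printed JSON can be written to the annotation file with `json.dump`.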

gyr-kdgc commented 1 year ago

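After switching to the ImageNet dataset, the pipeline runs end to end. My remaining question: what do the serial-stage search space ['111-2111-211111-211'] and the parallel-stage search space [[0, 1, 2, 3]] mean? If I want to customize them, how should I set them?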

dawncc commented 1 year ago

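In the serial-stage search space ['111-2111-211111-211'], the code means the backbone has [3, 4, 6, 3] blocks. Each - marks a downsampling position, and a 2 means the channels are multiplied by 2. In the parallel stage, [[0, 1, 2, 3]] is the number of layers included in the fusion of each FPN feature level. The values are chosen at random from [0, 3], so something like [0, 3, 0, 1] is also fine.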
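To make the encoding above concrete, here is a small sketch that decodes a backbone code and checks a neck code, following only the description in this reply. It is not Vega's own parser. The function names, the assumption of four FPN levels (taken from the four entries in the example), and the upper bound of 3 are illustrative.

```python
def decode_backbone_code(code):
    """Split a serial backbone code into per-stage block counts.

    Stages are separated by '-' (the downsampling positions); within a stage,
    each character is one block and '2' marks a block that doubles the channels.
    """
    stages = code.split("-")
    depths = [len(stage) for stage in stages]
    doubling = [[i for i, ch in enumerate(stage) if ch == "2"] for stage in stages]
    return depths, doubling


def is_valid_neck_code(code, num_levels=4, max_layers=3):
    """Check a parallel neck code: one integer in [0, max_layers] per FPN level.

    num_levels=4 and max_layers=3 are assumptions based on the example values.
    """
    return len(code) == num_levels and all(
        isinstance(v, int) and 0 <= v <= max_layers for v in code
    )


if __name__ == "__main__":
    depths, doubling = decode_backbone_code("111-2111-211111-211")
    print(depths)    # block count per stage
    print(doubling)  # indices of channel-doubling blocks in each stage
    print(is_valid_neck_code([0, 3, 0, 1]))
```

Under this reading, a custom backbone code is written one stage at a time, with a - between stages and a 2 wherever the channels should double.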