huawei-noah / vega

AutoML tools chain
http://www.noahlab.com.hk/opensource/vega/
Other
842 stars 175 forks source link

pipeline with nas and fullytrain #195

Open iamsuryan opened 2 years ago

iamsuryan commented 2 years ago

Hi, I edited the nago.yml in example to include the fully_train step. This is the .yml file that I created with the help of other files:

general:
    backend: pytorch
    parallel_search: True
    parallel_fully_train: True
    requires: ["dataclasses", "networkx"]

pipeline: [nas, fully_train]

nas:
    pipe_step:
        type: SearchPipeStep

    dataset:
        type: Cifar10
        common:
            data_path: ../datasets/cifar10/
            batch_size: 128

    search_algorithm:
        type: BohbHpo
        policy:
            total_epochs: -1
            repeat_times: 1
            num_samples: 7  # 4 + 2 + 1, 3, 1
            max_epochs: 120
            min_epochs: 30
            eta: 2

    search_space:
        type: SearchSpace
        hyperparameters:
            -   key: network.custom.G1_nodes
                type: INT
                range: [3, 10]
            -   key: network.custom.G1_K
                type: INT
                range: [2, 5]
            -   key: network.custom.G1_P
                type: FLOAT
                range: [0.1, 1.0]
            -   key: network.custom.G2_nodes
                type: INT
                range: [3, 10]
            -   key: network.custom.G2_P
                type: FLOAT
                range: [0.2, 1.0]
            -   key: network.custom.G3_nodes
                type: INT
                range: [3, 10]
            -   key: network.custom.G3_K
                type: INT
                range: [2, 5]
            -   key: network.custom.G3_P
                type: FLOAT
                range: [0.1, 1.0]

    model:
        model_desc:
            modules: ['custom']
            custom:
                type: NAGO
                stage1_ratio: 1.0
                stage2_ratio: 1.0
                stage3_ratio: 1.0
                ch1_ratio: 1
                ch2_ratio: 2
                ch3_ratio: 4
                n_param_limit: 4.0e6  # number_of_params
                image_size: 32  # 32, cifar10
                num_classes: 10

    trainer:
        type: Trainer
        epochs: 1
        optimizer:
            type: SGD
            params:
                lr: 0.1
                momentum: 0.9
                weight_decay: !!float 1e-4
        lr_scheduler:
            type: StepLR
            params:
                step_size: 20
                gamma: 0.5
        seed: 10

    evaluator:
        type: Evaluator
        host_evaluator:
            type: HostEvaluator
            metric:
                type: accuracy

fully_train:
    pipe_step:
        type: TrainPipeStep
    dataset:
        ref: nas.dataset
    model:
        ref: nas.model
    trainer:
        ref: nas.trainer
        epochs: 200
        hps_file: "{local_base_path}/output/nas"
    evaluator:
        type: Evaluator
        host_evaluator:
            type: HostEvaluator
            metric:
                type: accuracy

I expected it to fully train the model output by the NAS. But when I checked the logs after it was completed I did not find anything related to fullytrain. Can you please tell how to make sure that the fully_train step worked and if not how to add fully_train step to nas?

Thanks,

zhangjiajin commented 2 years ago

Comment out the following lines:

fully_train:
    pipe_step:
        type: TrainPipeStep
    dataset:
        ref: nas.dataset
    # model:
    #     ref: nas.model
    trainer:
        ref: nas.trainer
        epochs: 200
        # hps_file: "{local_base_path}/output/nas"
    evaluator:
        type: Evaluator
        host_evaluator:
            type: HostEvaluator
            metric:
                type: accuracy