issue with fully_train step in the provided modnas DARTS example

aswanthkrishna commented 3 years ago

I am trying to run the examples provided for the modnas DARTS algorithm. The search stage works fine, and the best architecture is exported to "{local_worker_path}/exp/default/outputs/arch_search_best.yaml", but errors appear in the fully_train step. According to the documentation, I am supposed to pass the resulting arch_desc file to the train PipeStep as follows:

search_space:
    type: SearchSpace
    modules: [custom]
    custom:
        type: ModNasArchSpace
        model: ...
        construct: ...  # search space constructor
        desc_construct:
            arch_desc:
            type:  # arch. desc. constructor

However, passing the path as the argument to "arch_desc" does not solve the problem. Can someone tell me how to properly add arch_desc in the config file? Also, thank you for this great library :)

hujx0 commented 3 years ago

The documentation is not quite accurate here. The path should be passed as the argument to “fully_train.model.model_desc_file”. @aswanthkrishna

aswanthkrishna commented 3 years ago

This did not solve the problem. However, I can run the fully_train step separately by loading the searched model from arch_desc, after adding this to the config file:

            modules: [custom]
            custom:
                type: ModNasArchSpace
                fully_train: True
                model:
                    type: CIFAR_MobileNetV2_GPU
                    args:
                        n_classes: 10
                desc_construct:
                    torch:
                        type: MobileNetV2ArchDescConstructor
                        args:
                            arch_desc: "/home/ubuntu/vega_new/tasks/0519.174421.182/workers/nas/0/exp/default/output/arch_search_best.yaml"

CreeperLin commented 3 years ago

Hello! I am the author of the modnas library and sorry for the late reply! This problem is caused by a compatibility issue with the updated vega pipeline function. I've created a patch to fix this:

diff --git a/vega/algorithms/nas/modnas/compat/trainer_callback.py b/vega/algorithms/nas/modnas/compat/trainer_callback.py
index c1b5b92..a9e5e93 100644
--- a/vega/algorithms/nas/modnas/compat/trainer_callback.py
+++ b/vega/algorithms/nas/modnas/compat/trainer_callback.py
@@ -16,6 +16,7 @@ import traceback
 from zeus.common import FileOps
 from zeus.common import ClassFactory, ClassType
 from zeus.trainer.callbacks import Callback
+from zeus.report import ReportClient
 from vega.core.search_space import SearchSpace
 from vega.core.search_algs import SearchAlgorithm
 from modnas.data_provider.predefined.default import DefaultDataProvider
@@ -331,6 +332,9 @@ class ModNasTrainerCallback(Callback):
         desc = self.trainer.model_desc.copy()
         desc['custom']['arch_desc'] = ret.get('best_arch')
         self.trainer.config.codec = desc
+        record = ReportClient.get_record(self.trainer.step_name, self.trainer.worker_id)
+        record.desc = desc
+        ReportClient.broadcast(record)

     def after_valid_step(self, batch_index, logs=None):
         """Be called after a batch validation."""

The patch is also attached as a file: patch.txt

You can apply the patch by running the following commands in the vega root directory:

git apply patch.txt

or

patch -p1 -i patch.txt

Thanks for trying the modnas library!

CreeperLin commented 3 years ago

As for the YAML configuration, the intended place to specify the arch_desc when running fully_train step is "model.model_desc.custom.arch_desc":

modules: [custom]
custom:
    type: ModNasArchSpace
    fully_train: True
    model:
        type: CIFAR_MobileNetV2_GPU
        args:
            n_classes: 10
    desc_construct:
        torch:
            type: MobileNetV2ArchDescConstructor
    arch_desc: ### path to arch_desc yaml file, or arch_desc object itself
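
To make the key location concrete, here is a minimal Python sketch that writes a value at the dotted path "model.model_desc.custom.arch_desc" inside a plain dictionary shaped like the fully_train section. The helper `set_by_path`, the `fully_train` dictionary, and the placeholder path are all hypothetical and only for illustration; none of this is part of vega or modnas, and the real config loader may behave differently.

```python
from copy import deepcopy


def set_by_path(config, dotted_path, value):
    """Return a copy of `config` with `value` stored at `dotted_path`.

    Hypothetical helper (not a vega/modnas API). Missing intermediate
    dictionaries are created along the way.
    """
    result = deepcopy(config)
    node = result
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return result


# Assumed dictionary form of the fully_train step, mirroring the YAML above.
fully_train = {
    "model": {
        "model_desc": {
            "modules": ["custom"],
            "custom": {
                "type": "ModNasArchSpace",
                "fully_train": True,
                "model": {
                    "type": "CIFAR_MobileNetV2_GPU",
                    "args": {"n_classes": 10},
                },
                "desc_construct": {
                    "torch": {"type": "MobileNetV2ArchDescConstructor"},
                },
            },
        },
    },
}

# Placeholder only; substitute the arch_search_best.yaml exported by the search step.
ARCH_DESC_PATH = "path/to/arch_search_best.yaml"

patched = set_by_path(
    fully_train, "model.model_desc.custom.arch_desc", ARCH_DESC_PATH
)
```

Note that in this layout arch_desc ends up as a sibling of desc_construct under custom, not nested inside desc_construct.torch.args.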

I will note this in the documentation, thank you! @aswanthkrishna

aswanthkrishna commented 3 years ago

Thanks a lot, the issue is solved. Could you provide an example config file for creating a DARTS search space within modnas?

huawei-noah / vega

issue with fully_train step in the provided modnas DARTS example #112