microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License
14.07k stars, 1.82k forks

support for mmdetection #5027

Open jaypatravali opened 2 years ago

jaypatravali commented 2 years ago

Describe the issue:

I am running mmdetection and call the nni engine from a script after defining paths, logdirs etc.

It looks like this

def main(args_param):
    cmd = "python -m torch.distributed.launch --nproc_per_node=4 " + args.train_script + " " + args.config_file + " --cfg-options " + params_str + " --no-validate --launcher pytorch --work-dir " + run_output_dir
    try:
        process = subprocess.Popen(cmd.split(" "), stdout=subprocess.PIPE)
        data = process.communicate()
        #get val_acc from the data somehow
    except subprocess.CalledProcessError:
        print("JOB FAILURE")

if __name__ == "__main__":
    params = nni.get_next_parameter()
    main(params)

The mmdet config has workflow = [('train', 1), ('val', 1)].

I wish to use the early stopping feature and add nni.report_intermediate_result(val_acc) while the subprocess initiates training and validation from cmd. Can you help me arrive at a solution?

Environment:

ultmaster commented 2 years ago

You have to call nni.report_final_result (or nni.report_intermediate_result) in the main process. Once you enter the subprocess, the NNI trial context is missing, so the report_xxx_result APIs might not be available there.

QuanluZhang commented 2 years ago

A related issue for your reference: #4869. The solution provided there does not support reporting intermediate results either. The reason has been explained by @ultmaster.

Currently, one hacky workaround is to have the subprocess write its intermediate results to a file, and have the main process poll that file and call nni.report_intermediate_result itself for each new value.
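A minimal sketch of that file-polling workaround, in case it helps. It assumes the training side (for example a custom hook, not shown here) appends one validation metric per line, as a plain float, to a text file. The names read_new_metrics, run_and_report, trial_main and metrics_path are made up for illustration and are not part of NNI or mmdetection; only nni.get_next_parameter, nni.report_intermediate_result and nni.report_final_result are real NNI calls.

```python
import subprocess
import time


def read_new_metrics(path, offset):
    """Return (values, new_offset) for complete lines appended to `path` since `offset`."""
    values = []
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
    except FileNotFoundError:
        return values, offset  # the subprocess has not written anything yet
    # Consume only up to the last newline so a half-written line is re-read later.
    end = chunk.rfind(b"\n")
    if end == -1:
        return values, offset
    for line in chunk[:end].splitlines():
        line = line.strip()
        if line:
            values.append(float(line.decode()))
    return values, offset + end + 1


def run_and_report(cmd, metrics_path, report, poll_interval=5.0):
    """Run `cmd` and pass every metric it appends to `metrics_path` to `report`.

    Returns (return_code, last_metric_or_None).
    """
    process = subprocess.Popen(cmd)
    offset, last = 0, None
    while True:
        # Check for exit *before* reading, so one final read happens after exit.
        finished = process.poll() is not None
        values, offset = read_new_metrics(metrics_path, offset)
        for value in values:
            report(value)
            last = value
        if finished:
            return process.returncode, last
        time.sleep(poll_interval)


def trial_main(cmd, metrics_path):
    """Hypothetical trial entry point: all NNI calls stay in the main process."""
    import nni  # imported here so the helpers above work without NNI installed

    params = nni.get_next_parameter()  # would be turned into --cfg-options for `cmd`
    code, last = run_and_report(cmd, metrics_path, nni.report_intermediate_result)
    if code != 0:
        print("JOB FAILURE")
    elif last is not None:
        nni.report_final_result(last)
```

Since the report callback is passed in, the polling loop can be tried outside an NNI experiment by passing something like a list's append method instead of nni.report_intermediate_result. Checking the return code also replaces the except subprocess.CalledProcessError branch from the original script, which Popen and communicate never raise.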

We will support calling nni.report_intermediate_result from a subprocess in NNI v3.0.