microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

ERROR: Strategy failed to execute. #5774

Open. ktunlab opened this issue 6 months ago

ktunlab commented 6 months ago

Describe the issue: I'm trying to learn how to implement NAS using NNI. However, when I run my script, the strategy fails with the two errors shown in the log below: `ImportError: Cannot use a path to identify something from __main__.` and `TypeError: cannot pickle 'CudnnModule' object`.
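For readers unfamiliar with these two messages, here is a minimal standard-library sketch of how they can arise one after the other. It assumes, based only on the traceback in the log below, that an object is first identified by its import path and, when that fails because it lives in `__main__`, is pickled by value instead. `import_path` and `pickle_error` are hypothetical names for illustration, not NNI functions, and a plain module object stands in for `CudnnModule`.

```python
import json
import pickle
import types


def import_path(obj):
    """Hypothetical helper (not NNI code): name an object by its import path."""
    module = getattr(obj, "__module__", None)
    if module is None or module == "__main__":
        # A worker process cannot re-import the launching script as "__main__".
        raise ImportError("Cannot use a path to identify something from __main__.")
    return f"{module}.{obj.__qualname__}"


def pickle_error(obj):
    """Return the exception class name raised by pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return type(exc).__name__
    return None


# An object defined in an importable module can be referred to by path.
print(import_path(json.dumps))

# A module object (stand-in for CudnnModule) cannot be pickled by value.
print(pickle_error(types.ModuleType("fake_backend")))
```

If this reading of the traceback is right, anything defined directly in `train.py` takes the pickle-by-value route, and that route fails as soon as the object being pickled references a module-like object.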

My code: https://github.com/ktunlab/nas-resnet-demo

Environment:

Configuration:

Log message:

log:

[2024-04-26 15:45:28] Config is not provided. Will try to infer.
[2024-04-26 15:45:28] Using execution engine based on training service. Trial concurrency is set to 1.
[2024-04-26 15:45:28] Using simplified model format.
[2024-04-26 15:45:28] Using local training service.
[2024-04-26 15:45:28] WARNING: GPU found but will not be used. Please set `experiment.config.trial_gpu_number` to the number of GPUs you want to use for each trial.
[2024-04-26 15:45:30] Creating experiment, Experiment ID: lyjc7okv
[2024-04-26 15:45:30] Starting web server...
[2024-04-26 15:45:30] Setting up...
[2024-04-26 15:45:30] Web portal URLs: http://172.22.9.46:8081 http://127.0.0.1:8081
[2024-04-26 15:45:30] Successfully update searchSpace.
[2024-04-26 15:45:30] Checkpoint saved to C:\Users\Lab-d\nni-experiments\lyjc7okv\checkpoint.
[2024-04-26 15:45:30] Experiment initialized successfully. Starting exploration strategy...
[2024-04-26 15:45:30] ERROR: Strategy failed to execute.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 831, in get_hybrid_cls_or_func_name
    name = _get_cls_or_func_name(cls_or_func)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 810, in _get_cls_or_func_name
    raise ImportError('Cannot use a path to identify something from __main__.')
ImportError: Cannot use a path to identify something from __main__.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 103, in <module>
    exp.run(port=8081)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\experiment\experiment.py", line 236, in run
    return self._run_impl(port, wait_completion, debug)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\experiment\experiment.py", line 205, in _run_impl
    self.start(port, debug)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\experiment\experiment.py", line 270, in start
    self._start_engine_and_strategy()
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\experiment\experiment.py", line 230, in _start_engine_and_strategy
    self.strategy.run()
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\strategy\base.py", line 170, in run
    self._run()
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\strategy\bruteforce.py", line 223, in _run
    self.engine.submit_models(model)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\execution\training_service.py", line 172, in submit_models
    self._channel.send_trial(
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\runtime\tuner_command_channel\channel.py", line 144, in send_trial
    send_payload = dump(trial_dict, pickle_size_limit=int(os.getenv('PICKLE_SIZE_LIMIT', 64 * 1024)))
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 372, in dump
    result = _dump(
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 424, in _dump
    return json_tricks.dumps(obj, obj_encoders=encoders, **json_tricks_kwargs)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\json_tricks\nonp.py", line 125, in dumps
    txt = combined_encoder.encode(obj)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\json_tricks\encoders.py", line 76, in default
    obj = encoder(obj, primitives=self.primitives, is_changed=id(obj) != prev_id, properties=self.properties)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\json_tricks\utils.py", line 66, in wrapper
    return encoder(*args, **{k: v for k, v in kwargs.items() if k in names})
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 858, in _json_tricks_func_or_cls_encode
    '__nni_type__': get_hybrid_cls_or_func_name(cls_or_func, pickle_size_limit)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 835, in get_hybrid_cls_or_func_name
    b = cloudpickle.dumps(cls_or_func)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\cloudpickle\cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\cloudpickle\cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle 'CudnnModule' object
[2024-04-26 15:45:30] Stopping experiment, please wait...
[2024-04-26 15:45:30] Checkpoint saved to C:\Users\Lab-d\nni-experiments\lyjc7okv\checkpoint.
[2024-04-26 15:45:30] Experiment stopped

How to reproduce it?: python train.py