A running code would 're-import' the files after epoch ended

twangnh commented 4 years ago

Here is a strange error case. I assumed that a running job would not be affected by later changes to the source files. However, I modified some code (I added an lvis import) while an existing training run was in progress, and when the run reached the end of the epoch it 're-imported' the files and failed with the error below. I'm not sure whether this is caused by the data loader. Can anyone help?

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/WXRC0020/users/user/anaconda3/envs/solo_mmdet_py37/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/mnt/WXRC0020/users/user/anaconda3/envs/solo_mmdet_py37/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/mnt/WXRC0020/users/user/anaconda3/envs/solo_mmdet_py37/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/mnt/WXRC0020/users/user/anaconda3/envs/solo_mmdet_py37/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/mnt/WXRC0020/users/user/anaconda3/envs/solo_mmdet_py37/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/mnt/WXRC0020/users/user/anaconda3/envs/solo_mmdet_py37/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/mnt/WXRC0020/users/user/anaconda3/envs/solo_mmdet_py37/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/tools/train.py", line 13, in <module>
    from mmdet.apis import set_random_seed, train_detector
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/mmdet/apis/__init__.py", line 1, in <module>
    from .inference import (async_inference_detector, inference_detector,
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/mmdet/apis/inference.py", line 11, in <module>
    from mmdet.core import get_classes
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/mmdet/core/__init__.py", line 3, in <module>
    from .evaluation import *  # noqa: F401, F403
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/mmdet/core/evaluation/__init__.py", line 5, in <module>
    from .eval_hooks import (CocoDistEvalmAPHook, CocoDistEvalRecallHook,
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/mmdet/core/evaluation/eval_hooks.py", line 13, in <module>
    from mmdet import datasets
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/mmdet/datasets/__init__.py", line 4, in <module>
    from .lvis import LvisDataset
  File "/mnt/WXRC0020/users/user/prj/prj-ongoing/mmdet/datasets/lvis.py", line 6, in <module>
    from lvis.lvis import LVIS
ModuleNotFoundError: No module named 'lvis'

More importantly, if someone modifies the code during training and no error is raised, the modification would silently affect the training run that is already in progress, even though that run was started before the modification. That is really not wanted.

hellock commented 4 years ago

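Actually, this is normal. The multiprocessing start method is set to spawn, so each newly started worker process re-imports the training script, which is what the traceback shows.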
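To make the spawn behaviour concrete, here is a minimal sketch that uses only the Python standard library. It is not taken from mmdetection, and the file names `helper.py` and `train.py` and the functions `report` and `run_demo` are hypothetical names chosen for illustration. A small script imports a helper module, overwrites that module on disk to simulate an edit during training, and then starts one worker with the spawn start method. The parent still holds the old module in memory, while the spawned worker begins in a fresh interpreter and imports whatever is on disk at that moment.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a training script; written to disk and run below.
SCRIPT = textwrap.dedent('''
    import multiprocessing as mp
    import pathlib
    import helper  # the parent imports this once, before the edit below

    def report(queue):
        # Runs in the worker and sends back the value the worker imported.
        queue.put(helper.VALUE)

    if __name__ == "__main__":
        # Simulate editing a source file while the job is running.
        pathlib.Path(helper.__file__).write_text("VALUE = 'edited'\\n")
        ctx = mp.get_context("spawn")
        queue = ctx.Queue()
        worker = ctx.Process(target=report, args=(queue,))
        worker.start()
        child_value = queue.get(timeout=60)
        worker.join()
        # The parent keeps its in-memory module; the worker re-read the file.
        print(helper.VALUE, child_value)
''')


def run_demo() -> str:
    """Run the stand-in script in a temporary directory and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "helper.py").write_text("VALUE = 'original'\n")
        (root / "train.py").write_text(SCRIPT)
        result = subprocess.run(
            # -B disables bytecode caching so no .pyc files are written.
            [sys.executable, "-B", "train.py"],
            cwd=root, capture_output=True, text=True, check=True, timeout=120,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_demo())
```

The script prints the parent's value first and the worker's value second, so the two differ whenever the worker picked up the edited file. If the data loader starts fresh worker processes for each epoch, the same mechanism would explain why the import error only appears at an epoch boundary, although I have not confirmed that for this particular setup.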

twangnh commented 4 years ago

@hellock Hi Kai, thanks. How can I set it to a mode that does not have the 're-import' step? If I modify the code while an earlier run is still going, the re-import would load code that the earlier run was never meant to use.
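One workaround that does not depend on the start method is to freeze a copy of the source tree before launching and run the job from that copy, so that any later re-import reads the frozen files. The sketch below is only an illustration of that idea, not something mmdetection provides. The function `snapshot_source`, the directory names, and the ignore patterns are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_source(src_root, snapshot_parent, run_name):
    """Copy the source tree to snapshot_parent/run_name and return the new path."""
    dest = Path(snapshot_parent) / run_name
    # copytree raises FileExistsError if dest already exists, so an existing
    # snapshot is never silently overwritten.
    shutil.copytree(
        src_root,
        dest,
        ignore=shutil.ignore_patterns("__pycache__", "*.pyc", ".git"),
    )
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical project layout used only for this demonstration.
        project = Path(tmp) / "project"
        (project / "tools").mkdir(parents=True)
        script = project / "tools" / "train.py"
        script.write_text("print('version 1')\n")

        frozen = snapshot_source(project, Path(tmp) / "snapshots", "run_001")

        # Editing the working copy afterwards leaves the snapshot untouched.
        script.write_text("print('version 2')\n")
        print((frozen / "tools" / "train.py").read_text().strip())
```

This only helps if the run really imports from the snapshot, for example by launching it from inside the snapshot directory so that its packages are found first. Python's `multiprocessing.set_start_method("fork")` is available on Unix and avoids the re-import, but I have not verified whether it is safe for this setup, since CUDA is generally not fork-friendly once it has been initialized.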

open-mmlab / mmdetection

A running code would 're-import' the files after epoch ended #2593