I encountered the following error while attempting to reproduce the steps you provided. Could you please tell me why this error occurred?

JeffreyYANGS commented 7 months ago

Traceback (most recent call last):
  File "F:\pythonProject1\SeD-main\train.py", line 15, in <module>
    from models import model_rrdb, model_swinir, sed
  File "F:\pythonProject1\venv\Lib\site-packages\models\__init__.py", line 37, in <module>
    import project
ModuleNotFoundError: No module named 'project'

lbc12345 commented 7 months ago

Hello! Thanks for your interest in our work! It seems you have a 'models' package installed in your venv, which conflicts with 'from models import model_rrdb...'. Please uninstall this package if possible! Refer to this link
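To confirm which 'models' Python would actually import, one option is to ask the import system where it resolves the name. This is a minimal sketch using only the standard library; `locate_module` is a hypothetical helper name, not part of the SeD repository.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module name resolves to, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is the path of the module file (or the package's __init__.py).
    return spec.origin


if __name__ == "__main__":
    # Run this from the SeD-main directory. A path under site-packages would
    # suggest that an installed package is shadowing the repository's own 'models'.
    print(locate_module("models"))
```

If the printed path points into venv\Lib\site-packages rather than SeD-main\models, uninstalling that package (as suggested above) should let the local directory be found instead.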

JeffreyYANGS commented 7 months ago

Thanks for the answer. I have deleted the 'models' package as requested, but now the following error appears. Could you please take another look? Thank you very much.

Traceback (most recent call last):
  File "F:\pythonProject1\SeD-main\train.py", line 15, in <module>
    from models import model_rrdb, model_swinir, sed
  File "F:\pythonProject1\SeD-main\models\model_swinir.py", line 11, in <module>
    from timm.models.layers import DropPath, to_2tuple, trunc_normal_
  File "F:\pythonProject1\venv\Lib\site-packages\timm\__init__.py", line 2, in <module>
    from .models import create_model, list_models, is_model, list_modules, model_entrypoint,
  File "F:\pythonProject1\venv\Lib\site-packages\timm\models\__init__.py", line 28, in <module>
    from .maxxvit import *
  File "F:\pythonProject1\venv\Lib\site-packages\timm\models\maxxvit.py", line 225, in <module>
    @dataclass
  File "D:\python\Lib\dataclasses.py", line 1230, in dataclass
    return wrap(cls)
  File "D:\python\Lib\dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
  File "D:\python\Lib\dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
  File "D:\python\Lib\dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'timm.models.maxxvit.MaxxVitConvCfg'> for field conv_cfg is not allowed: use default_factory
[2024-04-13 11:35:26,796] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 4924) of binary: F:\pythonProject1\venv\Scripts\python.exe
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "F:\pythonProject1\venv\Lib\site-packages\torch\distributed\launch.py", line 198, in <module>
    main()
  File "F:\pythonProject1\venv\Lib\site-packages\torch\distributed\launch.py", line 194, in main
    launch(args)
  File "F:\pythonProject1\venv\Lib\site-packages\torch\distributed\launch.py", line 179, in launch
    run(args)
  File "F:\pythonProject1\venv\Lib\site-packages\torch\distributed\run.py", line 803, in run
    elastic_launch(
  File "F:\pythonProject1\venv\Lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "F:\pythonProject1\venv\Lib\site-packages\torch\distributed\launcher\api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Failures:
[1]:
  time : 2024-04-13_11:35:26
  host : DESKTOP-AJ9M488
  rank : 1 (local_rank: 1)
  exitcode : 1 (pid: 3048)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time : 2024-04-13_11:35:26
  host : DESKTOP-AJ9M488
  rank : 2 (local_rank: 2)
  exitcode : 1 (pid: 13436)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time : 2024-04-13_11:35:26
  host : DESKTOP-AJ9M488
  rank : 3 (local_rank: 3)
  exitcode : 1 (pid: 2008)
  error_file: <N/A>
Root Cause (first observed failure):
[0]:
  time : 2024-04-13_11:35:26
  host : DESKTOP-AJ9M488
  rank : 0 (local_rank: 0)
  exitcode : 1 (pid: 4924)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

lbc12345 commented 7 months ago

Hi, it looks like a problem with your environment. Would you please try building a new environment following our guidance? We did not encounter such problems during training.
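For context on the ValueError in the traceback above: since Python 3.11, the dataclasses module rejects any unhashable object as a field default, and an ordinary (non-frozen) dataclass instance is unhashable. That would be consistent with an older timm release being imported under a newer Python, though this is an inference from the traceback rather than something confirmed in this thread. The sketch below reproduces the behaviour with a hypothetical `ConvCfg` class standing in for `MaxxVitConvCfg`, and shows the `default_factory` form that the error message recommends.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class ConvCfg:
    # Hypothetical stand-in for a nested config class such as MaxxVitConvCfg.
    kernel_size: int = 3


def define_with_instance_default():
    """Try to use a dataclass instance as a field default; return the error text, if any."""
    try:
        @dataclass
        class BadCfg:
            # Rejected on Python 3.11+ because ConvCfg() is unhashable.
            conv_cfg: ConvCfg = ConvCfg()
    except ValueError as exc:
        return str(exc)
    return None


@dataclass
class GoodCfg:
    # default_factory builds a fresh ConvCfg for every GoodCfg instance.
    conv_cfg: ConvCfg = field(default_factory=ConvCfg)


if __name__ == "__main__":
    print(sys.version_info[:2], define_with_instance_default())
    print(GoodCfg())
```

In practice the usual remedy is not to patch the library but to use a Python version and package versions that match the project's requirements, which is what rebuilding the environment achieves.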

JeffreyYANGS commented 6 months ago

Thanks. I have reconfigured the environment as required, but the code now reports a problem with the clip package: I can't find the ModifiedResNet module it needs. Please let me know how to resolve this, thank you. Also, I'm not clear about the meaning of these two options in your training command: -m torch.distributed.launch --nproc_per_node=4.

lbc12345 commented 6 months ago

For the first problem, did you install the clip package correctly? Would you please run 'pip list' and check the version of the clip package? We have not encountered this problem. Make sure there was no network problem while installing the clip package. For the second problem, please refer to PyTorch DDP training. It means using 4 GPUs in parallel to train our model.
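To make the second point more concrete: the launcher starts --nproc_per_node worker processes (here 4, typically one per GPU) and tells each one who it is through the environment variables RANK, LOCAL_RANK and WORLD_SIZE. The sketch below only reads those variables; `read_dist_env` is a hypothetical helper for illustration and is not taken from the SeD training code.

```python
import os


def read_dist_env(env=None):
    """Read the process identity that the PyTorch launcher exports to each worker."""
    env = os.environ if env is None else env
    return {
        # Global index of this process across all nodes.
        "rank": int(env.get("RANK", 0)),
        # Index of this process on the current machine; usually selects the GPU.
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        # Total number of worker processes taking part in training.
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    # Without a launcher this prints the single-process defaults.
    print(read_dist_env())
```

With --nproc_per_node=4 on a single machine, the four workers would see local_rank values 0 to 3 and a world_size of 4.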

JeffreyYANGS commented 6 months ago

Thanks for the answer, the model now runs successfully. I also want to ask: I see that this model does not have an evaluation index, so how do I judge the super-resolution quality? There is also no log to show the training progress. Would you consider adding these? Thank you.

lbc12345 commented 6 months ago

In our implementation, we evaluate the trained model every 5000 iterations on the Set5 benchmark, as set in the yml file. If you want to evaluate on another dataset, you can change this yourself~ If I haven't misunderstood you, our "evaluation index" is PSNR and SSIM.
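As a rough illustration of what such a periodic PSNR check involves, here is a minimal pure-Python sketch. It assumes 8-bit pixel values given as flat sequences; the repository's actual evaluation may differ (for example in colour space or border cropping), and both `psnr` and `should_evaluate` are hypothetical names used only for this example.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def should_evaluate(iteration, interval=5000):
    """True on every `interval`-th training iteration (5000 as mentioned above)."""
    return iteration > 0 and iteration % interval == 0


if __name__ == "__main__":
    print(psnr([100, 100], [110, 90]))
    print([i for i in (1, 5000, 7500, 10000) if should_evaluate(i)])
```

Higher PSNR means the restored image is numerically closer to the reference; SSIM complements it by comparing local structure rather than raw pixel error.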