prismformore / Multi-Task-Transformer

Code for the ICLR 2023 paper "TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding" and the ECCV 2022 paper "Inverted Pyramid Multi-task Transformer for Dense Scene Understanding"
MIT License

script error #8

Closed cenchaojun closed 1 year ago

cenchaojun commented 1 year ago

Thank you for your nice work. I tried to run this project but got the errors below.

/home/cenchaojun/.conda/envs/invpt/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

  warnings.warn(
local rank: 0
{'version_name': 'InvPT_pascal_vitLp16', 'out_dir': '../', 'train_db_name': 'PASCALContext', 'val_db_name': 'PASCALContext', 'trBatch': 2, 'valBatch': 6, 'nworkers': 2, 'ignore_index': 255, 'intermediate_supervision': True, 'val_interval': 1000, 'epochs': 999999, 'max_iter': 40000, 'optimizer': 'adam', 'optimizer_kwargs': {'lr': 2e-05, 'weight_decay': 1e-06}, 'scheduler': 'poly', 'model': 'TransformerNet', 'backbone': 'vitL', 'head': 'mlp', 'embed_dim': 512, 'mtt_resolution_downsample_rate': 2, 'PRED_OUT_NUM_CONSTANT': 64, 'task_dictionary': {'include_semseg': True, 'include_human_parts': True, 'include_sal': True, 'include_edge': True, 'include_normals': True, 'edge_w': 0.95}, 'loss_kwargs': {'loss_weights': {'semseg': 1.0, 'human_parts': 2.0, 'sal': 5.0, 'edge': 50.0, 'normals': 10.0}}, 'TASKS': {'NAMES': ['semseg', 'human_parts', 'sal', 'normals', 'edge'], 'NUM_OUTPUT': {'semseg': 21, 'human_parts': 7, 'sal': 2, 'normals': 3, 'edge': 1}, 'FLAGVALS': {'image': 2, 'semseg': 0, 'human_parts': 0, 'sal': 0, 'normals': 2, 'edge': 0}, 'INFER_FLAGVALS': {'semseg': 0, 'human_parts': 0, 'sal': 1, 'normals': 1, 'edge': 1}}, 'edge_w': 0.95, 'eval_edge': False, 'TRAIN': {'SCALE': [512, 512]}, 'TEST': {'SCALE': [512, 512]}, 'root_dir': '../InvPT_pascal_vitLp16', 'output_dir': '../InvPT_pascal_vitLp16', 'save_dir': '../InvPT_pascal_vitLp16/results', 'checkpoint': '../InvPT_pascal_vitLp16/checkpoint.pth.tar', 'run_mode': 'train', 'db_paths': {'PASCALContext': './dataset/PASCALContext', 'NYUD_MT': './dataset/NYUDv2'}, 'PROJECT_ROOT_DIR': ''}
Tensorboard dir: ../InvPT_pascal_vitLp16/tb_dir
Optimizer uses a single parameter group - (Default)
Preparing train dataset for db: PASCALContext

Initializing dataloader for PASCAL train set
Traceback (most recent call last):
  File "/home/cenchaojun/phd2code/invpt/main.py", line 169, in <module>
    main()
  File "/home/cenchaojun/phd2code/invpt/main.py", line 104, in main
    train_dataset = get_train_dataset(p, train_transforms)
  File "/home/cenchaojun/phd2code/invpt/utils/common_config.py", line 96, in get_train_dataset
    database = PASCALContext(p.db_paths['PASCALContext'], download=False, split=['train'], transform=transforms, retname=True,
  File "/home/cenchaojun/phd2code/invpt/data/pascal_context.py", line 174, in __init__
    assert os.path.isfile(_human_part)
AssertionError
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 10820) of binary: /home/cenchaojun/.conda/envs/invpt/bin/python3.8
Traceback (most recent call last):
  File "/home/cenchaojun/.conda/envs/invpt/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/cenchaojun/.conda/envs/invpt/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/cenchaojun/.conda/envs/invpt/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/cenchaojun/.conda/envs/invpt/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/cenchaojun/.conda/envs/invpt/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/cenchaojun/.conda/envs/invpt/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/cenchaojun/phd2code/invpt/main.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2022-11-21_16:07:10
  host : cieelab
  rank : 0 (local_rank: 0)
  exitcode : 1 (pid: 10820)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

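The FutureWarning at the top of this log is separate from the AssertionError, but it describes a change worth making: read the local rank from the `LOCAL_RANK` environment variable instead of relying only on a `--local_rank` argument. A minimal sketch of that pattern follows. The helper name `get_local_rank` is hypothetical and is not taken from this repository's `main.py`.

```python
import argparse
import os


def get_local_rank(argv=None):
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    Hypothetical helper for illustration; not part of the repository.
    """
    parser = argparse.ArgumentParser()
    # Still accept --local_rank so older launch commands keep working.
    parser.add_argument("--local_rank", type=int, default=None)
    # parse_known_args ignores any other arguments the script defines.
    args, _ = parser.parse_known_args(argv)

    if "LOCAL_RANK" in os.environ:
        # The warning text above says to read the rank from this variable.
        return int(os.environ["LOCAL_RANK"])
    if args.local_rank is not None:
        return args.local_rank
    # Fall back to rank 0 for a plain single-process run.
    return 0
```

With this in place, the same script should work whether the rank arrives through the environment or through the command line.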
cenchaojun commented 1 year ago

I solved this problem: the dataset path needs to be an absolute path.
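For anyone hitting the same assertion, here is a minimal sketch of the idea, assuming the failure comes from the relative entries under `db_paths` in the config printed above (for example `./dataset/PASCALContext`) being resolved against an unexpected working directory. The helpers `resolve_db_path` and `check_required_file` are hypothetical names for illustration and do not exist in the repository.

```python
import os


def resolve_db_path(path, base_dir=None):
    """Turn a possibly relative dataset path into an absolute, normalized one.

    base_dir defaults to the current working directory; pass the project
    root explicitly if the script may be launched from elsewhere.
    """
    path = os.path.expanduser(path)
    if not os.path.isabs(path):
        base_dir = base_dir or os.getcwd()
        path = os.path.join(base_dir, path)
    return os.path.normpath(path)


def check_required_file(path):
    """Fail with a readable message instead of a bare AssertionError."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Expected dataset file not found: {path}")
    return path


# Relative paths copied from the config dump in this issue.
db_paths = {
    "PASCALContext": "./dataset/PASCALContext",
    "NYUD_MT": "./dataset/NYUDv2",
}
# Resolve every entry once, before any dataset class is constructed.
db_paths = {name: resolve_db_path(p) for name, p in db_paths.items()}
```

Raising `FileNotFoundError` with the full path makes it easier to see which file is missing than the bare `assert os.path.isfile(...)` in the traceback.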