open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

OSError: [Errno 95] Operation not supported #1587

Closed: deepakkupanda closed this issue 2 years ago

deepakkupanda commented 2 years ago

Traceback (most recent call last):
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/deepakpanda9/code/Users/deepakpanda/segmentation/mmsegmentation/tools/train.py", line 240, in <module>
    main()
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/deepakpanda9/code/Users/deepakpanda/segmentation/mmsegmentation/tools/train.py", line 229, in main
    train_segmentor(
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/deepakpanda9/code/Users/deepakpanda/segmentation/mmsegmentation/mmseg/apis/train.py", line 191, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/runner/iter_based_runner.py", line 134, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/runner/iter_based_runner.py", line 67, in train
    self.call_hook('after_train_iter')
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/runner/hooks/checkpoint.py", line 167, in after_train_iter
    self._save_checkpoint(runner)
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/runner/dist_utils.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/runner/hooks/checkpoint.py", line 121, in _save_checkpoint
    runner.save_checkpoint(
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/runner/iter_based_runner.py", line 220, in save_checkpoint
    mmcv.symlink(filename, dst_file)
  File "/anaconda/envs/open-mmlab/lib/python3.10/site-packages/mmcv/utils/path.py", line 36, in symlink
    os.symlink(src, dst, **kwargs)
OSError: [Errno 95] Operation not supported: 'iter_16000.pth' -> '/mnt/batch/tasks/shared/LS_root/mounts/clusters/deepakpanda9/code/Users/deepakpanda/segmentation/mmsegmentation/work_dirs/upernet_beit-base_640x640_160k_ade20k/latest.pth'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 27006) of binary: /anaconda/envs/open-mmlab/bin/python

MengzhangLI commented 2 years ago

Just google it and you will find that you are currently working in a directory where you don't have write permissions.

deepakkupanda commented 2 years ago

Thanks for your quick reply. I do have access to work_dirs. How can I change the location of the path where it is getting saved? Thanks!

MengzhangLI commented 2 years ago

> Thanks for your quick reply. I do have access to work_dirs. How can I change the location of the path where it is getting saved? Thanks!

You can use --work-dir.

deepakkupanda commented 2 years ago

I changed the work_dir, but I still got the same error.

deepakkupanda commented 2 years ago

I followed the advice from this issue: "Some environments do not support os.symlink so that you can add an argument in the checkpoint_cfg field in config files, like checkpoint_cfg=dict(create_symlink=False)."

I added the argument to checkpoint_config in the schedule config:

checkpoint_config = dict(by_epoch=False, interval=16000, create_symlink=False)
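
For anyone hitting the same error: on Linux, Errno 95 is EOPNOTSUPP ("Operation not supported"). When os.symlink raises it, that typically means the filesystem holding the work directory cannot create symbolic links, as some network or cloud-mounted shares cannot, and it is not a write-permission problem. The sketch below is not mmcv's implementation; supports_symlinks and update_latest are hypothetical helper names used only for illustration. It probes a directory for symlink support and falls back to copying the checkpoint when linking fails.

```python
import errno
import os
import shutil
import tempfile


def supports_symlinks(directory):
    """Return True if a symlink can be created inside `directory`."""
    # Work in a throwaway subdirectory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        open(os.path.join(tmp, "target"), "w").close()
        try:
            os.symlink("target", os.path.join(tmp, "link"))
        except (OSError, NotImplementedError):
            return False
        return True


def update_latest(work_dir, filename, latest_name="latest.pth"):
    """Make work_dir/latest_name refer to work_dir/filename.

    Tries a relative symlink first (assumption: both files live in
    work_dir) and copies the file if the filesystem refuses symlinks.
    Returns "symlink" or "copy" to report which path was taken.
    """
    src = os.path.join(work_dir, filename)
    dst = os.path.join(work_dir, latest_name)
    if os.path.lexists(dst):
        os.remove(dst)  # drop a stale link or copy from an earlier save
    try:
        os.symlink(filename, dst)
        return "symlink"
    except OSError as exc:
        # Only swallow "symlinks are not available here" style errors.
        if exc.errno not in (errno.EOPNOTSUPP, errno.EPERM, errno.ENOSYS):
            raise
        shutil.copyfile(src, dst)
        return "copy"
```

Running supports_symlinks on the intended work directory before training shows whether create_symlink=False is needed. Note that the copy fallback doubles the disk space used by the latest checkpoint, which is why simply disabling the symlink may be preferable.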