Closed · gihwan-kim closed this issue 1 year ago
Hey @gihwan-kim, this seems to be a PyAV error, since av.codec.codec.UnknownCodecError: libx264 is raised. What PyAV version are you using?
>>> import av
>>> av.__version__
'8.0.2'
It's version 8.0.2.
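For reference, one way to check whether the FFmpeg bundled with your PyAV build actually exposes the libx264 encoder (a minimal sketch; av.codecs_available and av.codec.Codec are PyAV API as far as I know, the surrounding logic is just illustration):

import av
from av.codec import Codec

print(av.__version__)
print('libx264' in av.codecs_available)  # False would explain the UnknownCodecError

try:
    Codec('libx264', 'w')  # 'w' asks for the encoder; raises UnknownCodecError if absent
    print('libx264 encoder is available')
except av.codec.codec.UnknownCodecError:
    print('libx264 encoder is missing; try a PyAV wheel whose bundled FFmpeg includes libx264')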
@gihwan-kim, can you try installing PyAV==8.0.3? BTW, what is your ffmpeg version?
Thank you for the kind reply! When I changed the PyAV version to 8.0.3, the UnknownCodecError no longer appears. But a new issue occurred (my ffmpeg version is 4.3):
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.75 GiB total capacity; 9.28 GiB already allocated; 18.75 MiB free; 9.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 12491) of binary: /home/gihwan/anaconda3/envs/openmmlab2/bin/python
Traceback (most recent call last):
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
I think that many training iterations need a large amount of memory. Should I change the training configuration?
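As an aside, the allocator hint in that error message can be applied like this (a sketch; max_split_size_mb only mitigates fragmentation and cannot create memory that is not there):

import os
# Must be set before the first CUDA allocation in the process.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'
import torch  # imported after setting the variable, so the setting takes effect

print(torch.cuda.is_available())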
This is the output of the nvidia-smi command:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 35% 29C P8 1W / 260W | 102MiB / 11011MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1116 G /usr/lib/xorg/Xorg 39MiB |
| 0 N/A N/A 1243 G /usr/bin/gnome-shell 60MiB |
@gihwan-kim, training RealBasicVSR with the default config needs at least 17201 MB of GPU memory; you can refer to the memory field in the log. I think you can try changing crop_size in the training pipeline to a smaller value to save memory, as sketched below.
I could solve it by changing the workers_per_gpu, samples_per_gpu, and num_input_frames values in the config file. Thank you! But while training, I ran into a FileNotFoundError:
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gihwan/mmedit/mmedit/datasets/base_sr_dataset.py", line 52, in __getitem__
return self.pipeline(results)
File "/home/gihwan/mmedit/mmedit/datasets/pipelines/compose.py", line 42, in __call__
data = t(data)
File "/home/gihwan/mmedit/mmedit/datasets/pipelines/loading.py", line 176, in __call__
img_bytes = self.file_client.get(filepath)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/mmcv/fileio/file_client.py", line 993, in get
return self.client.get(filepath)
File "/home/gihwan/anaconda3/envs/openmmlab2/lib/python3.7/site-packages/mmcv/fileio/file_client.py", line 518, in get
with open(filepath, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/UDM10/BIx4/archpeople/00000000.png'
My UDM10 data file path is 'data/UDM10/BIx4/archpeople/000.png'. Is there any naming rule or guide for preprocessing the UDM10 files into BIx4 for RealBasicVSR? I found guides for REDS and VideoLQ in the paper and the official documentation, but I couldn't find where to download the UDM10 data or how to preprocess it. I only found a download link here: udm10
@gihwan-kim For the master branch, you need to rename the images of your dataset. You can use a simple script like this:
import os

data_root = 'dataset/data/udm10/'  # downloaded layout: one folder per scene
save_root = 'dataset/data/UDM10/'  # layout expected by the master-branch config

dirs = sorted(os.listdir(data_root), key=str.lower)
for num, _dir in enumerate(dirs):
    sub_root1 = save_root + 'GT/' + str(num).zfill(8)
    sub_root2 = save_root + 'BDx4/' + str(num).zfill(8)
    os.system('cp -r ' + data_root + _dir + '/truth/ ' + sub_root1)  # GT frames
    os.system('cp -r ' + data_root + _dir + '/blur4/ ' + sub_root2)  # LQ frames
For the 1.x or dev-1.x branch, if your UDM data file path is 'data/UDM10/BIx4/archpeople/000.png', you can simply add a parameter like filename_tmpl='{:03d}.png'. You can reference https://github.com/open-mmlab/mmediting/blob/dev-1.x/configs/real_basicvsr/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds.py#L204
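For illustration, the parameter sits in the validation dataset settings roughly like this (dev-1.x style; everything other than filename_tmpl is an approximation of the linked config and should be checked against it):

val_dataloader = dict(
    dataset=dict(
        type='BasicFramesDataset',
        data_root='data/UDM10',
        data_prefix=dict(img='BIx4', gt='GT'),
        filename_tmpl='{:03d}.png',  # matches 000.png, 001.png, ...
    ))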
Thank you for your kind help! As I mentioned, I have a question about the UDM10 validation dataset in RealBasicVSR. I downloaded the udm10 dataset from this link: udm10 download site. The site's udm10 data directory structure is:
./udm10
├── archpeople
│   ├── blur4
│   └── truth
├── archwall
│   ├── blur4
│   └── truth
├── auditorium
│   ├── blur4
│   └── truth
├── band
│   ├── blur4
│   └── truth
├── caffe
│   ├── blur4
│   └── truth
├── camera
│   ├── blur4
│   └── truth
├── clap
│   ├── blur4
│   └── truth
├── lake
│   ├── blur4
│   └── truth
├── photography
│   ├── blur4
│   └── truth
└── polyflow
    ├── blur4
    └── truth
Is the blur4 data BIx4? Or do I have to preprocess it like this link?
And does BIx4 mean bicubic interpolation x4 downsampling?
Blur4 is not BIx4 or BDx4. BIx4 and BDx4 are both pre-processed using MATLAB. For BDx4, you need to use the MATLAB script https://github.com/ckkelvinchan/BasicVSR-IconVSR/blob/main/BD_degradation.m . For BIx4, you can simply use MATLAB's imresize to get the data, or a Python implementation of imresize like this https://github.com/fatheral/matlab_imresize/blob/master/imresize.py . I can provide my data if you need it. And yes, BIx4 means bicubic interpolation x4 downsampling. @gihwan-kim
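A sketch of generating BIx4 frames with that Python imresize (paths are illustrative, and imresize.py from the matlab_imresize repo must be importable):

import os
import cv2
from imresize import imresize  # https://github.com/fatheral/matlab_imresize

src_dir = 'data/UDM10/GT/archpeople'    # ground-truth frames
dst_dir = 'data/UDM10/BIx4/archpeople'  # bicubic x4 downsampled output
os.makedirs(dst_dir, exist_ok=True)

for name in sorted(os.listdir(src_dir)):
    img = cv2.imread(os.path.join(src_dir, name))
    lq = imresize(img, scalar_scale=0.25)  # MATLAB-compatible bicubic, x4 down
    cv2.imwrite(os.path.join(dst_dir, name), lq)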
I will check the imresize code you mentioned, thank you! :)
If it's okay with you, I would be grateful if you could send me your data.
https://drive.google.com/file/d/1G4V4KZZhhfzUlqHiSBBuWyqLyIOvOs0W/view?usp=share_link @gihwan-kim
Thank you!
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmediting
Environment
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.2, V11.2.152
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.2
PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.3
OpenCV: 4.5.4
MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMEditing: 0.16.0+7b3a8bd
Reproduces the problem - code sample
I just ran training again.
Reproduces the problem - command or script
Reproduces the problem - error message
Additional information
I'm trying to train RealBasicVSR to check whether it trains in my environment.
I have a similar issue to this issue, but that issue hasn't been resolved yet.