longweiwei closed this issue 3 years ago
What's the decord version? From which epoch does the overfitting become obvious? Have you properly rescaled the lr? Did you use the full Kinetics-400?
@innerlee
Could you randomly select 100 videos and, for each video, randomly access three frames and compare the frames decoded by decord and OpenCV?
@innerlee I completed the experiment according to your method. The value of every frame from decord and OpenCV is the same. The test code is as follows:
import cv2
import decord
import numpy as np

ann_file_train = '/raid/Research/workspace/lw/mmaction3/data/kinetics400/kinetics400_train_list_videos.txt'
data_root = '/raid/Research/workspace/lw/mmaction3/data/kinetics400/videos_train'

file = open(ann_file_train, 'r')
lines = file.readlines()
arr = np.random.choice(len(lines), size=100, replace=False)
print(arr)

re = []
for index in arr:
    line = lines[index].strip()
    path, _ = line.split()
    path = data_root + '/' + path
    de = decord.VideoReader(path)
    cv = cv2.VideoCapture(path)
    frames = len(de)
    print(path)
    for i in range(frames):
        de_im1 = de[i].asnumpy()
        cv.set(cv2.CAP_PROP_POS_FRAMES, i)  # set the frame position marker
        _, cv_im1 = cv.read()
        cv_im1 = cv2.cvtColor(cv_im1, cv2.COLOR_BGR2RGB)
        re.append(np.all(cv_im1 == de_im1))
print(re)
for b in re:
    if not b:
        print("false")
Actually, when loading the trained R(2+1)D model weights provided by MMAction2, my test result matches the one it reports, so the decord tool should be OK.
Could you please try:
randomly access three frames
@innerlee Sorry, I didn't look carefully. I tried it. The value of every frame from decord and OpenCV is the same. The test code is as follows:
import cv2
import decord
import numpy as np

ann_file_train = '/raid/Research/workspace/lw/mmaction3/data/kinetics400/kinetics400_train_list_videos.txt'
data_root = '/raid/Research/workspace/lw/mmaction3/data/kinetics400/videos_train'

file = open(ann_file_train, 'r')
lines = file.readlines()
arr = np.random.choice(len(lines), size=100, replace=False)
print(arr)

re = []
for index in arr:
    line = lines[index].strip()
    path, _ = line.split()
    path = data_root + '/' + path
    de = decord.VideoReader(path)
    cv = cv2.VideoCapture(path)
    frames = len(de)
    print(path)
    three_frame = np.random.choice(frames, size=(3,), replace=False)
    for i in three_frame:
        de_im1 = de[i].asnumpy()
        cv.set(cv2.CAP_PROP_POS_FRAMES, i)  # set the frame position marker
        _, cv_im1 = cv.read()
        cv_im1 = cv2.cvtColor(cv_im1, cv2.COLOR_BGR2RGB)
        re.append(np.all(cv_im1 == de_im1))
print(re)
for b in re:
    if not b:
        print("fff")  # mismatch found
For OpenCV, use sequential reading instead of cv.set(cv2.CAP_PROP_POS_FRAMES, i), because CAP_PROP_POS_FRAMES is inexact. For decord, keep the current form.
@innerlee I tried once again. The result is the same as before. The test code is as follows:
import cv2
import decord
import numpy as np

ann_file_train = '/raid/Research/workspace/lw/mmaction3/data/kinetics400/kinetics400_train_list_videos.txt'
data_root = '/raid/Research/workspace/lw/mmaction3/data/kinetics400/videos_train'

file = open(ann_file_train, 'r')
lines = file.readlines()
arr = np.random.choice(len(lines), size=100, replace=False)
print(arr)

count = 0
re = []
for index in arr:
    line = lines[index].strip()
    path, _ = line.split()
    path = data_root + '/' + path
    de = decord.VideoReader(path)
    cv = cv2.VideoCapture(path)
    frames = len(de)
    print(path)
    three_frame = np.random.choice(frames, size=(3,), replace=False)
    for i in range(frames):
        _, cv_im1 = cv.read()  # sequential read, no CAP_PROP_POS_FRAMES seek
        cv_im1 = cv2.cvtColor(cv_im1, cv2.COLOR_BGR2RGB)
        if i in three_frame:
            de_im1 = de[i].asnumpy()
            if np.sum(abs(de_im1 - cv_im1)) != 0:
                count += 1
                break
print(count)
Thank you for your kind help.
@dreamerlin any insight?
@innerlee
- The decord version is 0.4.0. I found that newer versions (e.g. 0.4.1) always give wrong results.
- Overfitting becomes obvious after 25 epochs.
- The total batch size is 80. I think the lr should not have such a big impact.
- Yes, I used the full Kinetics-400.
I noticed that in your log videos_per_gpu is set to 36, but the lr is still 0.1. The learning rate does have a great impact on convergence. The standard usage is lr 0.1 for 64 samples (mini-batch of 8 × 8 GPUs).
yeah that's one suspect
Thank you for the reminder. I have a question: based on the linear scaling principle, my batch size to learning rate ratio should match 64 : 0.1, right?
Hi @longweiwei, if your total batch size is 80, then the lr in the config should be set to 80 / 64 * 0.1 = 0.125. Try it out.
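For reference, a minimal sketch of that scaling computation (the helper function and the 20 videos × 4 GPUs split are illustrative assumptions, not values taken from the actual config):

# Linear scaling rule: lr = base_lr * total_batch_size / base_batch_size,
# with base_lr = 0.1 and base_batch_size = 64 as quoted above.
def scaled_lr(videos_per_gpu, num_gpus, base_lr=0.1, base_batch_size=64):
    total_batch_size = videos_per_gpu * num_gpus
    return base_lr * total_batch_size / base_batch_size

# e.g. 20 videos per GPU on 4 GPUs -> total batch size 80 -> lr = 0.125
print(scaled_lr(videos_per_gpu=20, num_gpus=4))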
@SuX97 Roger that. thanks.
I accordingly increased the lr based on your default lr config, i.e. 0.1 for 64 videos per iteration. The performance on the Kinetics val dataset has improved somewhat, but the overfitting still exists. The best result on the validation set is only about 25%. train config train log
Any suggestions? Thanks.
That's weird. Could you please print out the tensors of each stage and the major layers (stem, res-stages, pools, fc)? I suspect that some change in the BaseClass may have caused this issue. Also check your Kinetics dataset to see whether the number of videos matches.
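In case it helps, here is a minimal sketch of how intermediate outputs could be inspected with PyTorch forward hooks; the module names listed are illustrative and may not match the actual model, and the annotation-count check at the end reuses the ann_file_train path from the scripts above:

import torch

def add_debug_hooks(model, names=('backbone.conv1', 'backbone.layer1',
                                  'backbone.layer4', 'cls_head')):
    """Print shape/mean/std of the outputs of selected modules."""
    modules = dict(model.named_modules())
    handles = []
    for name in names:
        if name not in modules:  # skip names that do not exist in this model
            continue

        def hook(module, inputs, output, name=name):
            out = output[0] if isinstance(output, (tuple, list)) else output
            print(f'{name}: shape={tuple(out.shape)}, '
                  f'mean={out.float().mean():.4f}, std={out.float().std():.4f}')

        handles.append(modules[name].register_forward_hook(hook))
    return handles  # call h.remove() on each handle when done

# Sanity check: does the annotation list contain the expected number of videos?
with open(ann_file_train) as f:
    num_videos = sum(1 for line in f if line.strip())
print(num_videos)  # compare against the official Kinetics-400 train split size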
@SuX97 Thanks for your reminder. It turns out that my local videos are inconsistent with the official ones. I finally know the reason after being confused for so long.
yeah data is to blame
Describe the issue
According to the configuration file of the project, severe overfitting occurred when training R(2+1)D from scratch.
Reproduction
The only change is that the data format was changed from raw frames (pictures) to videos.
Dataset: Kinetics-400.
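For reference, a sketch of what the video-based data pipeline might look like (DecordInit, SampleFrames, and DecordDecode are real mmaction2 pipeline steps, but the clip length, sampling, and normalization values below are assumptions, not the config that was actually used):

# Assumed decord-based training pipeline in mmaction2 config style.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]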
sys.platform: linux
Python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3: Tesla V100-DGXS-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GCC: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
PyTorch: 1.5.1+cu101
TorchVision: 0.6.1+cu101
OpenCV: 4.4.0
MMCV: 1.1.5
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMAction2: 0.7.0+94895ec