Hi, I think your environment is correct and the provided implementation should work properly.
I noticed that you have training accuracies around 12.5. Are you training with a single GPU, or have you reduced the batch size?
@yjxiong yup, I use just a single Titan V with a batch size of ~20 (memory restriction).
@yjxiong what are the hardware requirements for proper RGBDiff model training? 4x 1080 Ti?
I don't think a batch size of 20 will work. A batch size of 128 is necessary for convergence. Any GPU setup that gives you that batch size will be OK.
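A minimal sketch of one possible workaround when only a small per-GPU batch fits in memory: gradient accumulation in PyTorch. The names model, criterion, optimizer and train_loader are assumptions standing in for the usual objects in main.py.

accum_steps = 8  # e.g. 8 x 16 = effective batch size of 128

optimizer.zero_grad()
for i, (inputs, target) in enumerate(train_loader):
    output = model(inputs)
    loss = criterion(output, target) / accum_steps  # scale so accumulated gradients average out
    loss.backward()                                 # gradients accumulate across the small batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Note that this only emulates the gradient of a larger batch; batch-norm statistics are still computed over the small per-GPU batch, so it is not exactly equivalent to training with -b 128.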
Second try with 2 Titan V (batch size 120 with a pretrained resnet18):
python main.py ucf101 RGBDiff \
./data/ucf101_rgb_train_split_1.txt ./data/ucf101_rgb_val_split_1.txt \
--arch resnet18 --num_segments 5 --gd 40 --lr 0.001 --lr_steps 80 160 \
--epochs 180 -b 128 -j 6 --dropout 0.8
Interesting. Can you first try training RGB or flow to see whether you can get the model to converge? We haven't met this problem before.
@yjxiong from my experiments, RGB converges to ~75-80% accuracy on split 1.
Training for the first few epochs with
python main.py ucf101 RGB \
./data/ucf101_rgb_train_split_1.txt ./data/ucf101_rgb_val_split_1.txt \
--arch resnet18 --num_segments 3 --gd 20 --lr 0.001 --lr_steps 30 60 \
--epochs 80 -b 128 -j 6 --dropout 0.8
gives reasonable accuracy, so it looks like the RGB model works correctly.
I didn't train the resnet18 model, so I don't know whether your results are correct. If you run the BNInception model, the RGB accuracy should quickly go above 80%.
So, here are the results with BNInception:
python main.py ucf101 RGB <data> \
--arch BNInception --num_segments 3 \
--gd 20 --lr 0.001 --lr_steps 30 60 --epochs 80 \
-b 100 -j 8 --dropout 0.8 \
--snapshot_pref ucf101_bninception_
Max ~78% accuracy, no 80% yet.
Did you use the tool we provided for extracting frames? If so, you should get >80% results for RGB.
Yep, I use https://github.com/yjxiong/temporal-segment-networks for the video -> frames pipeline.
Still, what is the correct way to reproduce RGBDiff results?
The correct way, as I have mentioned, is to follow the instructions in the README. Testing the RGB pipeline is a way to verify whether your data pipeline is correct. Please first try to reproduce the RGB result by strictly following the provided command. Then we can help you with RGBDiff.
So, just to be sure, let me describe my full process of reproducing the TSN results.
Data preparation:
- tools/build_of.py
- tools/build_file_list.py
Model preparation:
Are there any problems with the data preparation?
PS: I will try to reproduce this pipeline on another machine.
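As a quick sanity check over the output of tools/build_of.py, one can verify that every video folder contains RGB frames and matching x/y flow frames. A rough Python sketch, where frames_root is a hypothetical output folder and the img_/flow_x_/flow_y_ filename prefixes are assumptions based on the TSN extraction tool:

import os

frames_root = './data/ucf101_frames'  # hypothetical path to the extracted frames

for video in sorted(os.listdir(frames_root)):
    files = os.listdir(os.path.join(frames_root, video))
    n_rgb = sum(f.startswith('img_') for f in files)      # RGB frames
    n_fx = sum(f.startswith('flow_x_') for f in files)    # horizontal flow frames
    n_fy = sum(f.startswith('flow_y_') for f in files)    # vertical flow frames
    if n_rgb == 0 or n_fx != n_fy:
        print('problem in {}: {} rgb, {} flow_x, {} flow_y'.format(video, n_rgb, n_fx, n_fy))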
Finally, I found an error during data preprocessing: somehow there was a mistake in the frame count per video. My hotfix (for dataset.py):
def _parse_list(self):
    # Rebuild the video list, recomputing the frame count from the files on disk
    # instead of trusting the (incorrect) count stored in the list file.
    # Assumes `os` is imported in dataset.py and each line of the list file is
    # "<frame_folder> <num_frames> <label>".
    self.video_list = []
    for x in open(self.list_file):
        filepath, _, label = x.strip().split(' ')
        n_frames = len(os.listdir(filepath)) - 1  # count frames directly from the folder
        self.video_list.append(VideoRecord([filepath, n_frames, label]))
I see. When you are generating the file list, please make sure the number of frames for each video is correct.
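A small sketch for checking that the frame counts written into the generated list files match what is actually on disk. The three-column line format "<frame_folder> <num_frames> <label>" follows the parsing code above; the img_ prefix for RGB frames is an assumption based on the extraction tool.

import os

def check_list_file(list_file):
    # Report any video whose stored frame count disagrees with the folder contents.
    for line in open(list_file):
        folder, num_frames, label = line.strip().split(' ')
        on_disk = len([f for f in os.listdir(folder) if f.startswith('img_')])
        if on_disk != int(num_frames):
            print('{}: list says {} frames, found {} on disk'.format(folder, num_frames, on_disk))

check_list_file('./data/ucf101_rgb_train_split_1.txt')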
@Scitator Were you able to get the training converging?
@yjxiong Finally, yes. ~87% accuracy with the official code (both RGB and RGBDiff) and ~97% accuracy with my reimplementation, which adds image augmentations and training tricks. Still working on improvements. :)
So, it converges.
Good to know. Just a heads-up: on UCF101 it is not normal to get 97% accuracy with RGB or RGBDiff without any pretraining. Maybe you should check whether you are using the average of the per-class accuracies.
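For illustration, a small sketch of the difference between overall accuracy and the average of per-class accuracies, using hypothetical prediction and label arrays:

import numpy as np

labels = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
preds = np.array([0, 0, 0, 0, 1, 0, 2, 2, 0, 0])

overall_acc = (preds == labels).mean()  # fraction of all clips classified correctly
per_class = [(preds[labels == c] == c).mean() for c in np.unique(labels)]
mean_class_acc = np.mean(per_class)     # average of the per-class accuracies

print(overall_acc, mean_class_acc)      # 0.7 vs ~0.67 here; the two can differ noticeably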
Hi,
I am trying to reproduce your results; nevertheless, I found that the current official RGBDiff implementation does not learn (Python 3.6, PyTorch 0.3.1, CUDA 9.0).
The interesting part is that the training loop looks correct, so... maybe a model architecture bug?