open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

Weird lines in the generated kinetics400 raw frames file list #847

Open richardkxu opened 3 years ago

richardkxu commented 3 years ago

Describe the bug

Hi, thanks for making this great repo! I have encountered some weird lines in both kinetics400_train_list_rawframes.txt and kinetics400_val_list_rawframes.txt that differ from most entries in the raw frame file lists. As shown below, most lines are normal, but 1400+ of the 27000+ lines show the full path starting from data/ instead of the class-relative path, and only one number instead of two after the path:

abseiling/51GJ2uVXjvM_001007_001017 250 0
writing/Uci8kXgdGS0_000171_000181 300 396
data/kinetics400/rawframes_train/skateboarding/KZCcukx7y4I_000148_000158 306         <--- weird line
snowkiting/Jumj2_GIOKg_000130_000140 108 323
bartending/_6EheA5xKXo_000110_000120 300 15
capoeira/TqB-f0uv3jA_000014_000024 73 43
snorkeling/Nc1EV8thOUU_000667_000677 300 321
snorkeling/CK995jol6nI_000004_000014 300 321
smoking/KJRmXFjM3nA_000032_000042 300 316
skateboarding/-MqXp8AGJw4_000072_000082 47 306
trapezing/BnV32wRDlG8_000016_000026 300 364
breakdancing/_XsPsy6U8Ps_000032_000042 200 34
data/kinetics400/rawframes_train/capoeira/H6pT32OzxFU_000029_000039 43                <--- weird line
lunge/NvP3E0zqWIM_000002_000012 300 183
abseiling/LOeu4FXBId0_000002_000012 155 0
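
Entries like these can be located without scrolling through the whole list. The sketch below flags every line that does not have exactly three whitespace-separated fields; the function name `find_malformed_lines` is hypothetical and not part of mmaction2. In the excerpt above, the lone number on each weird line equals the last number on a normal line of the same class (306 for skateboarding, 43 for capoeira), which suggests the field that is missing may be the frame count rather than the label.

```python
from typing import List, Tuple


def find_malformed_lines(lines: List[str], expected_fields: int = 3) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for entries without the expected field count."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if len(line.split()) != expected_fields:
            bad.append((number, line))
    return bad


# Sample entries copied from the excerpt above.
sample = [
    "skateboarding/-MqXp8AGJw4_000072_000082 47 306",
    "data/kinetics400/rawframes_train/skateboarding/KZCcukx7y4I_000148_000158 306",
    "capoeira/TqB-f0uv3jA_000014_000024 73 43",
    "data/kinetics400/rawframes_train/capoeira/H6pT32OzxFU_000029_000039 43",
]

for number, line in find_malformed_lines(sample):
    print(number, line)
```

For a real list, the lines can be read with `open(path).read().splitlines()` and passed to the same function.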

When I test the pretrained irCSN152 models on kinetics400, I get the following error:

Traceback (most recent call last):
  File "/home/richardkxu/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/home/richardkxu/Documents/mmaction2/mmaction/datasets/rawframe_dataset.py", line 115, in __init__
    dynamic_length=dynamic_length)
  File "/home/richardkxu/Documents/mmaction2/mmaction/datasets/base.py", line 88, in __init__
    self.video_infos = self.load_annotations()
  File "/home/richardkxu/Documents/mmaction2/mmaction/datasets/rawframe_dataset.py", line 146, in load_annotations
    assert label, f'missing label in line: {line}'
AssertionError: missing label in line: data/kinetics400/rawframes_val/dining/ze3RetmQixU_000122_000132 91
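
The message says "missing label" even though the line ends in a number. A simplified sketch of why that can happen, assuming the loader reads the second field as the frame count and everything after it as labels (this is an illustration, not the actual mmaction2 code, and `parse_rawframe_line` is a hypothetical name):

```python
def parse_rawframe_line(line: str) -> dict:
    """Split 'frame_dir total_frames label [label ...]' into a dict."""
    parts = line.strip().split()
    frame_dir = parts[0]
    total_frames = int(parts[1])         # second field is read as the frame count
    label = [int(x) for x in parts[2:]]  # anything after it is read as labels
    # With only two fields the label list is empty, so the check fails.
    assert label, f'missing label in line: {line}'
    return {'frame_dir': frame_dir, 'total_frames': total_frames, 'label': label}


good = "skateboarding/-MqXp8AGJw4_000072_000082 47 306"
bad = "data/kinetics400/rawframes_val/dining/ze3RetmQixU_000122_000132 91"

print(parse_rawframe_line(good))
try:
    parse_rawframe_line(bad)
except AssertionError as err:
    print(err)
```

Under this assumption, the 91 on the failing line is consumed as the frame count and nothing is left over for the label.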

Reproduction

I have downloaded all the data to mmaction2/data/kinetics400, which is a symbolic link to my data disk /data/richardkxu/mmaction2/kinetics400. I followed the download and frame extraction steps in the Kinetics preparation guide.
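
Whether extraction left frames in every clip directory can be checked at this stage. A minimal sketch, assuming the layout `<frame_root>/<class>/<clip>/` seen in the paths above and frames named `img_*.jpg` (the filename pattern is an assumption and should be adjusted to the extraction settings used; `count_frames` and `empty_clips` are hypothetical helpers):

```python
from pathlib import Path
from typing import Dict, List


def count_frames(frame_root: str, pattern: str = "img_*.jpg") -> Dict[str, int]:
    """Map each '<class>/<clip>' directory under frame_root to its frame count."""
    root = Path(frame_root)
    counts = {}
    for clip_dir in sorted(root.glob("*/*")):
        if clip_dir.is_dir():
            key = clip_dir.relative_to(root).as_posix()
            counts[key] = len(list(clip_dir.glob(pattern)))
    return counts


def empty_clips(frame_root: str, pattern: str = "img_*.jpg") -> List[str]:
    """Return clip directories that contain no frames matching the pattern."""
    return [clip for clip, n in count_frames(frame_root, pattern).items() if n == 0]


if __name__ == "__main__":
    # Path taken from the file list entries in this report.
    for clip in empty_clips("data/kinetics400/rawframes_train"):
        print(clip)
```

If the clips printed here coincide with the weird lines in the file list, the problem would sit in frame extraction rather than in the annotations.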

I generated the raw frame file list using:

 bash generate_rawframes_filelist.sh kinetics400

Potential causes

  1. Should I place my data at mmaction2/tools/data/kinetics400 instead of mmaction2/data/kinetics400? I used mmaction2/data/ucf101 for the UCF101 data, though, and that works.
  2. Are the labels for those weird lines missing? I know some YouTube videos get deleted, but the annotations should always be a superset of the videos, right? kinetics400_train_list_videos.txt seems to be correct, so I assume the annotations are not missing.

dreamerlin commented 3 years ago

Maybe you can try these annotation files: https://github.com/open-mmlab/mmaction2/blob/master/tools/data/kinetics/download_backup_annotations.sh, which contain a larger data list.

richardkxu commented 3 years ago

Hi, I have tried the annotation files above but got the same error when extracting the frames. I feel like there may be a bug in the generate_rawframes_filelist.sh script. Besides the soft link data -> /data/richardkxu/mmdatasets under the mmaction2 repo directory, are any other soft links required during frame extraction? Thanks!

richardkxu commented 3 years ago

Hi, the missing label error still persists for the kinetics dataset in the latest mmaction2 release, 0.17.0. Whatever model I use, as long as it uses the kinetics400 dataset, I get the following error:

Traceback (most recent call last):
  File "/home/richardkxu/Documents/mmaction2-private/tools/train.py", line 199, in <module>
    main()
  File "/home/richardkxu/Documents/mmaction2-private/tools/train.py", line 167, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/home/richardkxu/Documents/mmaction2-private/mmaction/datasets/builder.py", line 39, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/richardkxu/anaconda3/envs/mmaction2/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
AssertionError: RawframeDataset: missing label in line: data/kinetics400/rawframes_train/drinking/JqAk2SDUbGA_000166_000176 100

Using the backup annotation files does not help either. Any suggestions?