microsoft / SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
https://arxiv.org/abs/2111.13196
MIT License

When will you release the tutorial for frame-based TSV generation? Thanks! #6

Open yaolinli opened 2 years ago

yaolinli commented 2 years ago

Also, how can we prepare data files such as *.label.tsv, *.caption.tsv, and *.caption.linelist.tsv to train SwinBERT on our own dataset? Thank you very much!

kevinlin311tw commented 2 years ago

Noted. Will prepare the suggested one as well.

yaolinli commented 2 years ago

I found the related preprocessing code in ./prepro and was able to generate the correct data files by modifying it. Thanks!

coranholmes commented 2 years ago

Hi, could you please share some insight into how you generated the correct data files?

yaolinli commented 2 years ago

Taking the training set as an example, the following files should be prepared:

I referred to ./prepro/create_image_frame_tsv.py to create train_32frames.img.tsv; the caption-related files can be prepared by imitating the file format used for the MSRVTT dataset. We also need to generate the corresponding *.lineidx file for each of the above TSV files, which can be done with:

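import os

def generate_lineidx_file(filein, idxout):
    """Write the byte offset of each line of `filein` to `idxout`, one per line."""
    idxout_tmp = idxout + '.tmp'
    # Open the TSV in binary mode so that tell() returns plain byte offsets.
    with open(filein, 'rb') as tsvin, open(idxout_tmp, 'w') as tsvout:
        fsize = os.fstat(tsvin.fileno()).st_size
        fpos = 0
        while fpos != fsize:
            tsvout.write(str(fpos) + "\n")
            tsvin.readline()
            fpos = tsvin.tell()
    # Rename only once the index is complete, so idxout is never a partial file.
    os.rename(idxout_tmp, idxout)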
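
To illustrate what the generated index is for, here is a minimal sketch of reading a single row by seeking to its stored byte offset. The names `read_row` and `demo`, the toy file contents, and the `toy.lineidx` file name are assumptions made for this example and are not taken from the SwinBERT code; the short loop in `demo` stands in for `generate_lineidx_file` so that the block runs on its own.

```python
import os
import tempfile


def read_row(tsv_path, lineidx_path, row):
    """Return the columns of row `row` (0-based) by seeking to its byte offset."""
    with open(lineidx_path, "r") as f:
        offsets = [int(line) for line in f]
    with open(tsv_path, "rb") as f:
        f.seek(offsets[row])
        return f.readline().decode("utf-8").rstrip("\n").split("\t")


def demo():
    """Build a two-row toy TSV plus its index, then fetch the second row."""
    with tempfile.TemporaryDirectory() as tmp:
        tsv = os.path.join(tmp, "toy.tsv")
        idx = os.path.join(tmp, "toy.lineidx")  # hypothetical file name
        with open(tsv, "w", newline="\n") as f:
            f.write("video0\ta man is cooking\n")
            f.write("video1\ta dog runs\n")
        # Stand-in for generate_lineidx_file: record where each row starts.
        offset = 0
        with open(tsv, "rb") as fin, open(idx, "w") as fout:
            for line in fin:
                fout.write(f"{offset}\n")
                offset += len(line)
        return read_row(tsv, idx, 1)


if __name__ == "__main__":
    print(demo())
```

Because each offset marks the start of a row, a loader can jump straight to any row without scanning the whole TSV, which matters when the rows hold many encoded frames.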

liyaowei-stu commented 2 years ago

I followed ./prepro/extract_youcook2_frms.sh and ran ./prepro/extract_frames.py, but it doesn't seem to work. This is the result I get:

python ./prepro/extract_frames.py \
    --video_root_dir ./datasets/MSRVTT-v2/videos \
    --save_dir ./datasets/MSRVTT-v2/ \
    --video_info_tsv ./datasets/MSRVTT-v2/val.img.tsv \
    --num_frames 32
0it [00:00, ?it/s]

Am I doing something wrong? Thank you very much!

tiesanguaixia commented 2 years ago

Hi, I see there are several Docker images with different tags at https://hub.docker.com/r/linjieli222/videocap_torch1.7/tags. Could you please tell me which one I should choose? Thanks a lot!

liyaowei-stu commented 2 years ago

The image "fairscale" is my choice.

tiesanguaixia commented 1 year ago

Hi! Excuse me, could you please tell me how you obtained train_32frames.img.tsv? I prepared the annotations with:

bash scripts/download_annotations.sh

But when I run the code, it fails with:

No such file or directory: 'datasets/MSRVTT-v2/frame_tsv/train_32frames.img.tsv'

I don't know why train_32frames.img.tsv is not included in the MSRVTT annotations zip file. Thank you!

tiesanguaixia commented 1 year ago

Hi! Did you manage to run the code successfully? Could you please tell me how you obtained train_32frames.img.tsv? I prepared the annotations with:

bash scripts/download_annotations.sh

But when I run the code, it fails with:

No such file or directory: 'datasets/MSRVTT-v2/frame_tsv/train_32frames.img.tsv'

I don't know why train_32frames.img.tsv is not included in the MSRVTT annotations zip file. Thank you!

SuleBai commented 1 year ago

Hi! Have you solved this problem? I have run into the same one. I now think I may need to download the dataset and run create_image_frame_tsv.py myself. Could you give me some advice? Thanks a lot.

bathlarachit commented 1 year ago

Hi, I am getting a file-not-found error for "datasets/MSRVTT-v2/frame_tsv/val_128frames_img_size256.img.tsv" while running evaluation. Can anyone help me figure out how to generate the required TSV file? I went through the repo instructions, but it seems this file is not generated by them.

Some advice would be really helpful.

Thanks a lot

pixelieee commented 7 months ago

Hi, I ran into the same problem as liyaowei-stu above (extract_frames.py finishing immediately with "0it"). I solved it by commenting out the path-rewriting code at line 124:

raw_video_info = load_tsv_to_mem(video_info_tsv)
videoFiles = []
for _, line_item in enumerate(raw_video_info):
    input_file = line_item[0]
    # Commented out, so the path from the TSV is used unchanged:
    # input_file = input_file.replace('datasets', '_datasets')
    if os.path.isfile(input_file):
        videoFiles.append(input_file)
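
The snippet above only keeps paths that pass `os.path.isfile`, so if no path resolves, the list stays empty, which would explain the "0it" output. Below is a minimal diagnostic sketch for checking this before running extraction. It assumes, as in the snippet, that the first column of the TSV holds the video path; `check_video_paths`, its `rewrite` parameter, and `demo` are hypothetical names introduced here, not part of the repository.

```python
import os
import tempfile


def check_video_paths(video_info_tsv, rewrite=None):
    """Split the first-column paths of a TSV into (found, missing) lists.

    `rewrite` is an optional function applied to each path before the check,
    e.g. to reproduce a path substitution done by a preprocessing script.
    """
    found, missing = [], []
    with open(video_info_tsv, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            path = line.split("\t")[0]
            if rewrite is not None:
                path = rewrite(path)
            (found if os.path.isfile(path) else missing).append(path)
    return found, missing


def demo():
    """Toy setup: one existing video, one missing, then the same with a rewrite."""
    with tempfile.TemporaryDirectory() as tmp:
        video_dir = os.path.join(tmp, "datasets")
        os.makedirs(video_dir)
        present = os.path.join(video_dir, "video0.mp4")
        absent = os.path.join(video_dir, "video1.mp4")
        open(present, "wb").close()
        tsv = os.path.join(tmp, "val.img.tsv")
        with open(tsv, "w", encoding="utf-8") as f:
            f.write(present + "\n" + absent + "\n")
        found, missing = check_video_paths(tsv)
        # Rewriting 'datasets' to '_datasets' points at a directory that does not exist.
        found_rw, _ = check_video_paths(
            tsv, rewrite=lambda p: p.replace("datasets", "_datasets"))
        return len(found), len(missing), len(found_rw)


if __name__ == "__main__":
    print(demo())
```

If `found` comes back empty for your own TSV, the extraction loop has nothing to iterate over; printing a few entries of `missing` should show whether the paths are relative to a different working directory or have been rewritten.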