tsujuifu / pytorch_violet

A PyTorch implementation of VIOLET

Performance check #7

Closed Flowerfan closed 1 year ago

Flowerfan commented 2 years ago

Hi, thank you for sharing the code and models.

I evaluated ckpt_violet_pretrain.pt and ckpt_violet_msrvtt-retrieval on MSRVTT text-to-video (t2v) retrieval using our own data processing (5 frames per video, sampled at an interval of num_frames // 5). I got rank@1 of 22.6/32.9, which is lower than the numbers in the paper (25.9/34.7). I also tested the CLIP model and got a similar result. Do the released models achieve the reported results? If so, could you provide the processing pipeline or describe how to reproduce the reported performance? Thank you!
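To make the sampling scheme above concrete, here is a minimal sketch of two common ways to pick 5 frame indices from a video. Both function names are hypothetical and neither is taken from this repo or from extract_video-frame; the point is only that "5 equally spaced frames" can resolve to different indices depending on the convention, which is one thing worth comparing when numbers do not match.

```python
def stride_indices(num_frames: int, n: int = 5) -> list[int]:
    """Start at frame 0 and step by num_frames // n (the scheme described above)."""
    step = max(num_frames // n, 1)  # guard against a zero step on very short videos
    return [min(i * step, num_frames - 1) for i in range(n)]


def midpoint_indices(num_frames: int, n: int = 5) -> list[int]:
    """Split the video into n equal segments and take the middle frame of each.

    This is an alternative convention shown for comparison only (an assumption,
    not necessarily what any particular extraction script does).
    """
    return [min(int((i + 0.5) * num_frames / n), num_frames - 1) for i in range(n)]


if __name__ == "__main__":
    print(stride_indices(100))    # indices from the stride-based scheme
    print(midpoint_indices(100))  # indices from the segment-midpoint scheme
```

For a 100-frame video the two conventions select different frames, so features extracted under one scheme are not identical to those extracted under the other.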

tsujuifu commented 2 years ago

Yes. We sample 5 frames at equal intervals from each video using extract_video-frame.

I have re-tested and got 25.9/49.8 from ckpt_violet_pretrain.pt and 34.3/62.9 from ckpt_violet_msrvtt-retrieval.pt.

I am using PyTorch 1.7.0 and transformers 4.18.0 with CUDA 11.0. Also, do not forget to call model.eval() before running the evaluation.

Flowerfan commented 2 years ago

Thank you for re-testing. Could you provide me with the txt_msrvtt.json file that contains the 1k test videos? There are only 50 videos in https://github.com/tsujuifu/pytorch_violet/blob/main/_data/txt_msrvtt-retrieval.json

Flowerfan commented 2 years ago

I just tested with my own txt file and got 'r@1': 0.233, 'r@5': 0.533 with ckpt_violet_pretrain.pt. This is my generated txt file.
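For reference when comparing numbers like these, here is a minimal sketch of how text-to-video recall@K is commonly computed from a similarity matrix. The function name recall_at_k is hypothetical and this is not the repo's evaluation code; it assumes one caption per video, so the correct video for text query i is video i.

```python
def recall_at_k(sim: list[list[float]], ks: tuple[int, ...] = (1, 5, 10)) -> dict[str, float]:
    """Text-to-video recall@K.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the true video = number of videos scored strictly higher.
        # Ties are resolved in favor of the true video (optimistic).
        ranks.append(sum(1 for score in row if score > row[i]))
    n = len(ranks)
    # A query counts as a hit at K if its true video is ranked within the top K.
    return {f"r@{k}": sum(r < k for r in ranks) / n for k in ks}


if __name__ == "__main__":
    toy_sim = [
        [0.9, 0.1, 0.2],  # query 0: true video ranked first
        [0.8, 0.5, 0.3],  # query 1: true video ranked second
        [0.1, 0.2, 0.3],  # query 2: true video ranked first
    ]
    print(recall_at_k(toy_sim, ks=(1, 2)))
```

Tie handling and the caption-to-video mapping are the usual places where two evaluation scripts disagree, so they are worth checking when the same checkpoint yields different scores.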

tsujuifu commented 2 years ago

The files in this repo are partial examples to help formulate the input data.

Here is my txt_msrvtt-retrieval.json. I have checked it, and it seems to be the same 😊.

Flowerfan commented 2 years ago

I just tested your file with ckpt_violet_pretrain.pt using your repo, but still got 'r@1': 0.233, 'r@5': 0.533 😂. I have no idea what's wrong.

Flowerfan commented 2 years ago

Hi, I am wondering how you process the YouCook2 dataset for evaluation, since one video contains multiple clip-text pairs. I extracted 3,400 clip-text pairs for evaluation and got very disappointing performance.

siyangssy commented 2 years ago

I just tested your file with ckpt_violet_pretrain.pt using your repo, but still got 'r@1': 0.233, 'r@5': 0.533 😂. I have no idea what's wrong.

I get the same result as you, i.e. 'r@1': 0.233, 'r@5': 0.533. Have you solved the problem?