showlab / EgoVLP

[NeurIPS2022] Egocentric Video-Language Pretraining

https://arxiv.org/pdf/2206.01670.pdf

About NLQ results. #3

Closed takfate closed 1 year ago

takfate commented 2 years ago

Hello, and thanks for such nice work! We have some questions and would appreciate your help. We used your EgoVLP_PT_BEST checkpoint to extract the video features, then trained VSLNet with those features and the BERT checkpoint from EgoVLP_PT_BEST. We can't seem to reach the precision reported in the paper; we only get about 7~8 R1@0.3.
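For readers comparing numbers in this thread: R1@0.3 is usually read as Recall@1 at a temporal IoU threshold of 0.3, i.e. the percentage of queries whose top-ranked predicted window overlaps the ground-truth window with IoU of at least 0.3. The sketch below illustrates that definition only; the function names are hypothetical and it is not the evaluation code used by EgoVLP or VSLNet.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) windows given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold=0.3):
    """Percentage of queries whose top-1 window reaches the IoU threshold."""
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Hypothetical toy data: three queries, one top-1 prediction each.
preds = [(0.0, 10.0), (5.0, 6.0), (20.0, 30.0)]
gts = [(0.0, 10.0), (0.0, 10.0), (25.0, 45.0)]
score = recall_at_1(preds, gts, iou_threshold=0.3)
```

Under this definition, a gap of 0.4 between two runs corresponds to 0.4 percentage points of queries.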

QinghongLin commented 2 years ago

Thanks,

We also get similar results (~8 R1@0.3) with the default settings. We boosted the performance further through parameter tuning (e.g., learning rate, batch size).

I have attached the config.json and the log of our best results here: model.zip. I hope it helps you reproduce the results.

Please reach out if you have new updates.

takfate commented 2 years ago

Thanks for your response. For feature extraction, does the model contain the video projection (video_dim->256) and the text projection (text_dim->256)? Do the video and text features both have 256 channels?

QinghongLin commented 2 years ago

Yes, during the feature extraction, the model contains video_proj and text_proj, and the channels of video and text features are 256.

takfate commented 2 years ago

Is args.token set to True when extracting text features? We find that the text feature extracted by default is 1x256.

takfate commented 2 years ago

In our experiments, using Lx256 and using 1x256 text features give similar performance, but both are weaker than using Lx768. With Lx768 we get performance close to your results, but there is still a gap of about 0.4.
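To make the three shapes concrete, here is a minimal sketch of how 1x256, Lx256 and Lx768 text features could relate to one another. It rests on assumptions not confirmed in this thread: that Lx768 means the raw per-token encoder states, that Lx256 means every token passed through text_proj, and that 1x256 is the projection of a single summary token. The helper names and random weights are hypothetical, and this is not the EgoVLP extraction code.

```python
import random

HIDDEN_DIM = 768  # per-token width before projection (the "Lx768" case)
PROJ_DIM = 256    # width after text_proj (the "x256" cases)


def linear(x, weight):
    """Bias-free linear map: x has in_dim entries, weight is out_dim x in_dim."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def text_feature_variants(token_states, proj_weight):
    # Lx768: raw per-token hidden states, no projection applied.
    raw_tokens = token_states
    # Lx256: each token projected independently.
    projected_tokens = [linear(t, proj_weight) for t in token_states]
    # 1x256: one sentence-level vector; assumes the first token summarizes the query.
    sentence = [projected_tokens[0]]
    return raw_tokens, projected_tokens, sentence


def shape(matrix):
    return (len(matrix), len(matrix[0]))


rng = random.Random(0)
num_tokens = 5  # L, the number of tokens in a hypothetical query
tokens = [[rng.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(num_tokens)]
weight = [[rng.gauss(0, 0.02) for _ in range(HIDDEN_DIM)] for _ in range(PROJ_DIM)]

tokens_768, tokens_256, sentence_256 = text_feature_variants(tokens, weight)
print(shape(tokens_768), shape(tokens_256), shape(sentence_256))
```

Whichever variant is used, the text input width expected by the downstream VSLNet configuration has to match it (768 or 256).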

QinghongLin commented 2 years ago

@takfate

Hi, the NLQ experiments were implemented by my collaborator Mattia, so I may have gotten some details wrong. I have attached our VSLNet code implementation here so that you can check the relevant details.

NLQ.zip