westlake-repl / MicroLens

A Large Short-video Recommendation Dataset with Raw Text/Audio/Image/Videos (Talk Invited by DeepMind).

[Issue] Problems Encountered Running `run_video.py` #2

Closed (yeahjack closed this issue 5 months ago)

yeahjack commented 6 months ago

Description

I have successfully downloaded the MicroLens dataset and am currently running the code provided by the authors. However, I've encountered several issues during the execution.

Issues

  1. Clarification Needed on Dataset Selection

     In the file MicroLens/Code/VideoRec/SASRec/run_video.py, there is a line:

         max_video_no = 91717 # 34321 for 10wu, 91717 for 100wu

     I am unsure what "10wu" and "100wu" refer to. Which setting should I use for the MicroLens-100k dataset?

  2. Error Using x3d-s Video Model

     When running the code with the x3d-s video model, strictly following the parameters specified in the paper, the following traceback error occurs:

         RuntimeError: input image (T: 5 H: 7 W: 7) smaller than kernel size (kT: 13 kH: 5 kW: 5)

     The error points to an issue with the pooling operation in the video model. It seems the input image dimensions are too small for the kernel size. How should I adjust the kernel size or the input dimensions?

  3. Instability in Metrics When Using video-mae-base Model

     Using the video-mae-base as the video model, I observed that the learning metrics Hit10 and nDCG10 fluctuate significantly during training, with a trend of occasionally approaching zero. What might be causing this instability, and how can it be resolved?

Request for Assistance

I would appreciate any insights or recommendations on addressing these issues, especially with the right dataset settings for MicroLens-100k, the handling of input dimensions for the x3d-s model, and strategies for stabilizing training metrics with the video-mae-base model.

Thank you very much!

microlens2023 commented 5 months ago

Hi @yeahjack, sorry for the delayed response.

For Q1, max_video_no is the number of videos in the dataset you are using: 19738 for MicroLens-100K and 19220 for MicroLens-50K.
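One way to avoid editing the hard-coded value by hand is to look it up by dataset name. This is only a minimal sketch: the dictionary and helper names are hypothetical and are not part of run_video.py; only the two counts come from this reply.

```python
# Video counts quoted in this thread. The names MAX_VIDEO_NO and
# get_max_video_no are hypothetical, not identifiers from the MicroLens code.
MAX_VIDEO_NO = {
    "MicroLens-100K": 19738,
    "MicroLens-50K": 19220,
}


def get_max_video_no(dataset: str) -> int:
    """Return the video count for a known dataset, or fail loudly."""
    try:
        return MAX_VIDEO_NO[dataset]
    except KeyError:
        known = ", ".join(sorted(MAX_VIDEO_NO))
        raise ValueError(f"unknown dataset {dataset!r}; known: {known}") from None


max_video_no = get_max_video_no("MicroLens-100K")
```

Failing on an unknown name, rather than falling back to a default, makes a mismatched dataset setting visible before training starts.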

For Q2, the error occurs because you are running the code with the default pytorchvideo library. In recommendation tasks, the heads of some video encoders are modified to accept an input of only 5 frames, so a few lines of the pytorchvideo library need to be changed; the changes are recorded in show_parameters.py. For the error you encountered, change lines 512, 541, and 715-719 of pytorchvideo/models/x3d.py (the hint is also recorded on line 96 of show_parameters.py).
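The reason the default library fails on 5 frames can be checked with plain arithmetic: a pooling kernel with no padding must fit inside the input in every dimension. The sketch below uses the shapes from the traceback; the helper names are hypothetical, and the smaller kernel (5, 5, 5) is only an illustrative value, not necessarily the one used in the modified x3d.py.

```python
from typing import Tuple

Shape3D = Tuple[int, int, int]  # (T, H, W)


def pool_fits(input_thw: Shape3D, kernel_thw: Shape3D) -> bool:
    """True if an unpadded pooling kernel fits inside the input in every dimension."""
    return all(i >= k for i, k in zip(input_thw, kernel_thw))


def pool_output_shape(input_thw: Shape3D, kernel_thw: Shape3D,
                      stride: Shape3D = (1, 1, 1)) -> Shape3D:
    """Output size of unpadded pooling: floor((in - kernel) / stride) + 1 per dimension."""
    if not pool_fits(input_thw, kernel_thw):
        raise ValueError(
            f"input {input_thw} smaller than kernel size {kernel_thw}"
        )
    t, h, w = (
        (i - k) // s + 1 for i, k, s in zip(input_thw, kernel_thw, stride)
    )
    return (t, h, w)


# Shapes reported in the traceback: 5 frames reach a head that pools over 13.
default_ok = pool_fits((5, 7, 7), (13, 5, 5))

# Illustrative only: a temporal kernel no larger than the 5 input frames fits.
adjusted_shape = pool_output_shape((5, 7, 7), (5, 5, 5))
```

With the traceback shapes the check fails on the temporal dimension alone (5 < 13), which is consistent with the explanation that the head must be adapted to 5-frame input.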

For Q3, the batch size and the GPUs you assigned may be causing this instability.

Please contact me by email if necessary; I will do my best to help.

Gargantua43 commented 4 months ago

Hi author, could you give the exact steps for the Q2 modification? I don't understand the comments in pytorchvideo/models/x3d.py that you mentioned, in particular the change to line 512.

Gargantua43 commented 4 months ago

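The image below shows my modification to line 541: [image] The image below shows my modification to lines 715-719: [image] I don't know how to modify line 512, though; I hope you can give the specific change. I would be very grateful for your reply! [image]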

yxni98 commented 4 months ago

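Hi @Gargantua43, I am looking into the issue you raised. If you get an error at run time, it is very likely that the pytorchvideo library being loaded is the default one rather than my modified version (because recommender-system and CV tasks differ, I modified some headers or inputs to handle the case where only 5 frames are input). Once you have located the default pytorchvideo library, you can take the corresponding file uploaded to GitHub, for example x3d.py, and use it to replace the file at the same location in the default library. This is the most direct approach. Alternatively, locate the default library and replace the whole library with my modified folder.

If you have any questions, please @ me so that I get an email notification. If that still does not solve it, please email me your contact details and I will do my best to help.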
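The file replacement described here can be sketched as follows. This is a rough sketch under assumptions: the helper names are hypothetical, the path to the repository copy of x3d.py is a placeholder you must fill in, and a backup of the original file is kept so the change can be undone.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def find_module_file(module_name: str) -> Optional[Path]:
    """Return the source file of an installed module, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. pytorchvideo) is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def replace_with_backup(target: Path, replacement: Path) -> Path:
    """Copy `replacement` over `target`, keeping the original as `<name>.bak`."""
    backup = target.with_name(target.name + ".bak")
    if not backup.exists():  # never overwrite an earlier backup
        shutil.copy2(target, backup)
    shutil.copy2(replacement, target)
    return backup


# Example usage (the repository path is a placeholder, not a real location):
# target = find_module_file("pytorchvideo.models.x3d")
# if target is not None:
#     replace_with_backup(target, Path("path/to/modified/x3d.py"))
```

Printing the path returned by `find_module_file` first is a quick way to confirm which pytorchvideo installation the environment actually loads.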

Gargantua43 commented 4 months ago

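Thank you very much for your reply. Do you mean that I should replace x3d.py in the default pytorchvideo library with the x3d.py you just pointed me to? Is my understanding correct? @yxni98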

yxni98 commented 4 months ago

@Gargantua43, yes, that's right. Give it a try, and let me know if it doesn't work.

Gargantua43 commented 4 months ago

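@yxni98 Thank you very much for your reply. I got the Video part of the code running successfully. This dataset is a great help to our work; thank you very much for your work!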