ttengwang / dense-video-captioning-pytorch

Second-place solution to dense video captioning task in ActivityNet Challenge (CVPR 2020 workshop)

How to preprocess data for your network? #7

Closed · SalocinB closed this issue 2 years ago

SalocinB commented 3 years ago

Hey, I would like to adapt your code to my own dataset, which already has custom captions. Each clip shows one person performing one of nine actions. The captions describe the person's gender, the action, and the clothing the person is wearing. There are 100 different captions per video, stored as strings in a separate JSON file. The videos currently exist as frame-wise PNGs, with JSON files describing the visible clothes and the gender of the person in each frame.

Can you help me with how I should prepare my dataset and adapt your code so I can use it as input for your video-captioning network?

Thank you in advance.

ttengwang commented 3 years ago

@SalocinB Hi, a similar issue is discussed at https://github.com/ttengwang/dense-video-captioning-pytorch/issues/2. Also, I notice that the average number of events per video in your dataset is about 30 times that of ActivityNet Captions (100 vs. 3.6), so you may need to retrain the model to fit your custom data.
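For anyone preparing their own data: below is a minimal sketch of converting a custom caption file into the standard ActivityNet Captions annotation layout (video id → duration, timestamps, sentences), which is the format code trained on that benchmark typically consumes. The input filename and keys (`my_captions.json`, `events`, `start`, `end`, `caption`) are hypothetical placeholders for however your annotations are actually stored:

```python
import json

# Hypothetical input: one entry per video with caption strings and the
# (start, end) seconds each caption covers. Filename and keys are
# placeholders, adapt them to your own annotation files.
with open("my_captions.json") as f:
    my_data = json.load(f)

anet_style = {}
for vid, info in my_data.items():
    anet_style[vid] = {
        "duration": info["duration"],  # full video length in seconds
        "timestamps": [[e["start"], e["end"]] for e in info["events"]],
        "sentences": [e["caption"] for e in info["events"]],
    }

with open("train_custom.json", "w") as f:
    json.dump(anet_style, f, indent=2)
```

The resulting file mirrors the ActivityNet Captions release format: each video id maps to its duration in seconds plus parallel lists of [start, end] timestamps and caption sentences.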

Wangdanchunbufuz commented 1 year ago

> Hey, I would like to adapt your code to my own dataset, which already has custom captions. […]

Hi, I also want to organize my own dataset. Can you share the specific steps you took to organize yours? Or is there annotation software available for this?
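One possible first step: networks like this one typically consume pre-extracted per-video feature arrays rather than raw PNG frames. Below is a minimal sketch that stacks frame-wise PNGs into one [T, D] feature file per video; the `frames/<video_id>/*.png` layout and the ResNet-50 stand-in backbone are assumptions, not this repo's actual feature pipeline, so swap in whatever extractor the pretrained captioning model was trained on:

```python
import glob
import os

import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

# Hypothetical layout: frames/<video_id>/*.png, one directory per video.
# ResNet-50 is only a stand-in backbone for illustration.
device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep 2048-d features
backbone.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

os.makedirs("features", exist_ok=True)
for video_dir in sorted(glob.glob("frames/*")):
    feats = []
    with torch.no_grad():
        for path in sorted(glob.glob(os.path.join(video_dir, "*.png"))):
            img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
            feats.append(backbone(img).squeeze(0).cpu().numpy())
    vid = os.path.basename(video_dir)
    np.save(os.path.join("features", f"{vid}.npy"), np.stack(feats))  # shape [T, 2048]
```

The annotation side can then follow the conversion sketch above; pairing each `<video_id>.npy` with its entry in the annotation JSON gives the two inputs a dense captioning dataloader generally needs.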