@SalocinB Hi, a similar issue is discussed at https://github.com/ttengwang/dense-video-captioning-pytorch/issues/2. Also, I notice that the average number of events per video in your dataset is about 30 times higher than in ActivityNet Captions (100 vs. 3.6), so you may need to retrain the model to fit the custom data.
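For reference, here is a minimal sketch of the per-video annotation layout used by ActivityNet Captions, assuming the repo's dataloader expects the standard `{video_id: {duration, timestamps, sentences}}` structure; the video id, duration, and sentence below are hypothetical placeholders, not values from this repo:

```python
import json

# Hypothetical annotation entries in the ActivityNet Captions layout:
# one entry per video, with event timestamps (in seconds) and one sentence per event.
annotations = {
    "v_custom_0001": {
        "duration": 12.5,                      # clip length in seconds
        "timestamps": [[0.0, 12.5]],           # one event spanning the whole clip
        "sentences": ["A woman wearing a red jacket waves her hand."],
    },
}

# Write the annotation file that a custom train/val split could point to.
with open("custom_train.json", "w") as f:
    json.dump(annotations, f, indent=2)
```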
Hey, I would like to modify your code for my own dataset, which already has custom captions. Each clip shows one person performing one of nine actions. The captions describe the gender, the action, and the clothing the person is wearing; there are 100 different captions per video, stored as strings in a separate JSON file. The videos are currently stored as per-frame PNGs, with a JSON per frame describing the visible clothing and the gender of the person.
Can you help me figure out how to prepare my dataset and adapt your code so that I can use it as input to your video-captioning network?
Thank you in advance.
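As a rough illustration, per-frame JSONs and a per-video caption file could be collapsed into one ActivityNet-Captions-style entry per clip along these lines; the directory layout, the frame rate, and the assumption that the caption file holds a plain list of strings are all guesses about the custom data, not part of this repo's API:

```python
import json
from pathlib import Path

FPS = 25.0  # assumed frame rate of the extracted PNG frames


def build_entry(frame_dir: Path, captions_file: Path) -> dict:
    """Collapse the per-frame JSONs and the caption list for one clip into a
    single annotation entry with one event covering the whole clip."""
    frame_jsons = sorted(frame_dir.glob("*.json"))
    duration = len(frame_jsons) / FPS  # clip length estimated from the frame count

    with open(captions_file) as f:
        captions = json.load(f)  # assumed: a list of caption strings for this clip

    return {
        "duration": duration,
        "timestamps": [[0.0, duration]],
        "sentences": [captions[0]],  # pick one caption; the rest could serve as augmentation
    }


# Hypothetical paths for a single clip.
entry = build_entry(Path("frames/clip_0001"), Path("captions/clip_0001.json"))
print(entry)
```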
Hi, I also want to prepare my own dataset. Can you share the specific steps you took to organize yours? Or is there annotation software available for this?