Running the code for videos

I am working on violence detection using video captioning. If I give your model a number of videos containing some type of violence, will it be able to describe that in captions? For example, if a tree is on fire or a robbery is taking place in a video, will your model be able to produce captions such as 'A tree is on fire' or 'A robbery/armed robbery is taking place'? I don't have captions for my videos; I only have uncaptioned videos and images, so I was hoping to generate training and testing data with the help of your model and then build my own video captioning model.

simon-ging / coot-videotext