piergiaj / super-events-cvpr18

Code for our CVPR 2018 paper "Learning Latent Super-Events to Detect Multiple Activities in Videos"
MIT License

process for training models (RCTA) #8

Open rhgmn opened 5 years ago

rhgmn commented 5 years ago

What was the training process for the current model in the RCE repo? Which dataset was used, and for how many epochs was it trained? How much training was needed to reach the current results?

How were the thresholds determined? Why are some defined as < vs. >?

piergiaj commented 5 years ago

I'm not sure what the RCE repo is, though I am happy to help with running the code available in this repo.

Regarding the training details:

The learning rate was set to 0.1 and reduced by a factor of 10 every 1000 iterations. We trained the network using a batch size of 32 videos for 5000 iterations using the Adam optimizer with default parameters. We applied dropout with a probability of 0.5 to the input features.
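For concreteness, here is a minimal PyTorch sketch of that schedule. Only the hyperparameters above (lr 0.1, x0.1 decay every 1000 iterations, batch size 32, 5000 iterations, Adam with defaults, dropout 0.5 on the input features) come from the actual setup; the classifier head, feature dimension (1024, as for I3D), class count (65, as in MultiTHUMOS), and the `next_batch` loader are hypothetical stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def next_batch(batch_size):
    # Hypothetical stand-in for a real loader of per-video pooled features.
    feats = torch.randn(batch_size, 1024)                   # e.g., I3D features
    labels = torch.randint(0, 2, (batch_size, 65)).float()  # multi-label targets
    return feats, labels

model = nn.Sequential(
    nn.Dropout(p=0.5),    # dropout 0.5 applied to the input features
    nn.Linear(1024, 65),  # features -> per-class logits
)

# Adam at lr 0.1 (other parameters at defaults), decayed x0.1 every 1000 iterations.
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.1)

for it in range(5000):    # 5000 iterations with a batch size of 32 videos
    feats, labels = next_batch(batch_size=32)
    loss = F.binary_cross_entropy_with_logits(model(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```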

This is about 20-30 epochs of training, which is sufficient assuming the pretrained features are strong (e.g., I3D features).

Regarding thresholds: we usually don't apply them when evaluating on standard datasets (e.g., MultiTHUMOS and Charades), since those use per-frame metrics. Thresholds are needed when extracting intervals with specific start/end times for each activity. The exact threshold values are usually tuned on a held-out portion of the training set, choosing values that give good performance.
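To illustrate the interval-extraction case, here is a small sketch (not from this repo) of how a threshold turns per-frame scores for one class into (start, end) frame intervals; the function name, threshold value, and score array are illustrative:

```python
import numpy as np

def extract_intervals(scores, threshold=0.5):
    """Return (start, end) frame-index pairs where scores exceed threshold."""
    above = scores > threshold
    intervals, start = [], None
    for t, flag in enumerate(above):
        if flag and start is None:
            start = t                       # interval opens
        elif not flag and start is not None:
            intervals.append((start, t - 1))  # interval closes
            start = None
    if start is not None:                   # interval still open at the end
        intervals.append((start, len(scores) - 1))
    return intervals

scores = np.array([0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.6, 0.7, 0.1])
print(extract_intervals(scores, threshold=0.5))  # [(2, 4), (6, 7)]
```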

pengxiaoxiao commented 5 years ago

Hello, I'd like to know whether you can provide the data input format for MultiTHUMOS, or the features you use in your super_event work. I look forward to your timely reply.