Open olivertai opened 6 years ago
Due to the nature of transcodes on YouTube, the timestamp for a given frame may vary from one run to the next (in our experience by at most a few milliseconds). I would suggest something like assigning frames to annotations as long as the timestamp difference is less than say 8 milliseconds.
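The suggested tolerance-based matching could be sketched like this (a minimal illustration; the function and variable names are mine, and timestamps are assumed to be in microseconds):

```python
TOLERANCE_US = 8000  # 8 milliseconds, as suggested above

def match_annotations_to_frames(annotation_ts_us, frame_ts_us):
    """Map each annotation timestamp to the closest extracted frame
    timestamp, as long as the difference stays under the tolerance."""
    matches = {}
    for ann in annotation_ts_us:
        closest = min(frame_ts_us, key=lambda f: abs(f - ann))
        if abs(closest - ann) < TOLERANCE_US:
            matches[ann] = closest
    return matches
```

Annotations with no frame within the tolerance simply get no entry, so they can be filtered out afterwards.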
Thank you very much for the response, that seems to work. Just one more question: it seems to me that one video is mapped to multiple txt files, is that right? Then how many videos in total are needed to match the dataset? Can I have a rough number? Thank you.
Yes, there are in general multiple clips from each video, and each clip has a separate txt file. There are about 7200 videos that have at least one clip.
Thank you for the quick response.
Hi, it's me again. Sorry to bother you. I tried to extract the frames following your suggestion (frame/file timestamp difference less than 8 ms), and it turns out that there are still 2000+ txt files in the training set for which no corresponding frames can be found. Is it possible to know your way of extracting frames?
Then I removed those 2k txt files and tried to continue training from the pre-trained model with the remaining files to validate the dataset, using parameters: batch_size=4, min_stride=2, max_stride=5, vgg loss. I found through TensorBoard that the total_loss fluctuates a lot, between 10.0 and 40.0. I suspect something wasn't done the right way.
P.S. One more question about timestamps. Suppose I have a 25 fps video; should the timestamp of the first frame be 0 (µs) or 40000 (µs)? Looking forward to your response, thank you.
Hi, and sorry for the slow response.
I'm afraid I can't give more details about our method for extracting frames, beyond the fact that we used an internal system which is not generally available. However, perhaps I can help with the timestamp issues.
Firstly, I assume that you already discarded any cases where you were not able to get any frames for the video at all. (This could happen if, for example, a user removed their video from YouTube or a video was blocked.)
After that, perhaps we can figure out what's going on by comparing the list of timestamps you are getting and the list of timestamps in our data? For example, we could see if the frame-rate is different, or if the start time is wrong. Can you show an example of a cameras file for which you don't get the same timestamps, and also show what timestamps you are getting for that video?
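One rough way to do that comparison (the function name is mine; inputs are assumed to be sorted lists of timestamps in microseconds):

```python
def summarize_timestamps(ts_us):
    """Return the start time and median frame interval of a sorted
    timestamp list, to spot frame-rate or start-time mismatches."""
    deltas = sorted(b - a for a, b in zip(ts_us, ts_us[1:]))
    return {"start": ts_us[0], "median_interval": deltas[len(deltas) // 2]}
```

Comparing the summary of the cameras-file timestamps against the summary of the extracted timestamps should show immediately whether the frame rate or the start time differs.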
I believe in our data the first frame of a video is considered to have timestamp 0. I haven't used cv2.VideoCapture myself, but in your example code above it looks like you are reading the timestamp after calling read(), which perhaps means that you are getting the timestamp of the next frame instead of the one you just read? (I think you could try instead getting the timestamp before calling read.)
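A sketch of that ordering fix, assuming opencv-python: query the position property *before* `read()`, so the timestamp describes the frame that `read()` is about to return. The helper below is generic so it can be driven by any `cv2.VideoCapture`-like object; with a real capture you would pass `cv2.CAP_PROP_POS_MSEC` as the property id.

```python
def read_frames_with_timestamps(video, pos_msec_prop=0):
    """Yield (timestamp_us, frame) pairs from a cv2.VideoCapture-like
    object. pos_msec_prop defaults to 0, the value of
    cv2.CAP_PROP_POS_MSEC in OpenCV."""
    while True:
        # Position of the frame read() will return next, in milliseconds;
        # converted to microseconds to match the dataset's timestamps.
        ts_us = int(round(video.get(pos_msec_prop) * 1000))
        ok, frame = video.read()
        if not ok:
            return
        yield ts_us, frame
```

With this ordering, a 25 fps video yields timestamps 0, 40000, 80000, ... µs, with the first frame at 0 as described above.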
@olivertai Hello, would you mind sharing the script you used to download the datasets for your training?
Hi, I'm trying to retrieve frames from the YouTube links on the first line of the txt files. I downloaded the videos and used opencv-python to read and match frame timestamps:
```python
import cv2

video = cv2.VideoCapture(path)
video.set(cv2.CAP_PROP_POS_MSEC, 0)
while video.isOpened():
    frame_exists, curr_frame = video.read()
    if not frame_exists:
        break
    # Timestamp in microseconds (CAP_PROP_POS_MSEC is in milliseconds).
    video_timestamp = int(round(video.get(cv2.CAP_PROP_POS_MSEC) * 1000))
```
Then I found that not all of the timestamps in the txt files match the video timestamps. They sometimes differ by something on the order of 100 microseconds or 1 millisecond. Am I doing something wrong? Looking forward to your help. Thank you!