stanford-futuredata / noscope

Accelerating network inference over video
http://dawn.cs.stanford.edu/2017/06/22/noscope/

How to prepare CSV with ground truth labels for new video #38

Closed: janhuang6 closed this issue 6 years ago

janhuang6 commented 6 years ago

I noticed that NoScope requires, along with the video, a CSV file with ground-truth labels as input. Two questions:

  1. Does the ground truth cover the entire video or just the beginning of it? If it covers only part of the video, which part: for example, the first half, the first 20% (in time), or some other percentage? Knowing this will help me prepare my CSV.

  2. If the ground truth covers the entire video, does NoScope actually use all of it? If so, what is the purpose of the processing when all the ground truth is already provided as input?

Thanks! Jan

sxhexe commented 6 years ago

Judging from section 6.1 in their paper, I think they only need ground truth for part of the video.

janhuang6 commented 6 years ago

NoScope needs ground truth for the entire video to work best. It needs ground truth for the training part, and it still needs ground truth for the test part to calculate the actual error rate (or accuracy).
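To illustrate why the test part still needs labels: the error rate is the fraction of frames where the prediction disagrees with the ground truth, so it cannot be computed without them. This is a minimal sketch, not NoScope's own evaluation code. It assumes binary per-frame labels (1 = object present, 0 = absent) already loaded into plain Python lists, and the function name `error_and_accuracy` is hypothetical.

```python
from typing import Sequence, Tuple


def error_and_accuracy(predicted: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Compare per-frame predictions with ground-truth labels.

    Returns (error_rate, accuracy), where error_rate is the fraction of
    frames whose predicted label differs from the ground-truth label.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same frames")
    if not truth:
        raise ValueError("need at least one labeled frame")
    # Count frames where the prediction disagrees with the ground truth.
    mismatches = sum(1 for p, t in zip(predicted, truth) if p != t)
    error_rate = mismatches / len(truth)
    return error_rate, 1.0 - error_rate


if __name__ == "__main__":
    # Hypothetical labels: 1 = object present in the frame, 0 = absent.
    predicted = [1, 0, 1, 1]
    truth = [1, 0, 0, 1]
    print(error_and_accuracy(predicted, truth))  # (0.25, 0.75)
```

With one disagreement out of four frames, the error rate is 0.25 and the accuracy 0.75. Without the `truth` list, neither number can be computed, which is the point made in the comment.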

sxhexe commented 6 years ago

@janhuang6 I dug into the code, and here's what I think (taking the coral video as an example):

  1. Frames 648000 to 648000+1188000 are used to train the individual CNNs, which needs ground truth.

  2. Frames 648000+1188000 to 648000+2×1188000 are used to find the best configuration of CNN/DD combinations, which also needs ground truth to calculate error rates.

  3. Frames 648000+2×1188000 to 648000+3×1188000 are used for the actual inference. Technically this does not need ground truth, but for benchmarking purposes they still need it to evaluate the method.

@ddkang Am I right?
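The split described in this comment amounts to three consecutive, equal-length frame ranges after an initial offset. This sketch assumes each range is half-open ([start, end)), and the segment names and the `split_frames` helper are hypothetical. The offset 648000 and the segment length 1188000 are the coral-video numbers quoted in the comment, not values read from NoScope's configuration.

```python
from typing import Dict, Tuple


def split_frames(offset: int, segment_len: int) -> Dict[str, Tuple[int, int]]:
    """Return half-open [start, end) frame ranges for three consecutive segments."""
    names = ("train", "configure", "test")  # hypothetical segment names
    return {
        name: (offset + i * segment_len, offset + (i + 1) * segment_len)
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    # Coral-video numbers from the comment above.
    for name, (start, end) in split_frames(648000, 1188000).items():
        print(f"{name}: frames {start} to {end} (end exclusive)")
```

With these numbers, training covers frames 648000 to 1836000, configuration search covers 1836000 to 3024000, and the test segment covers 3024000 to 4212000. Under this reading, a ground-truth CSV for benchmarking would need labels for all three ranges, while the first two would suffice for training and configuration.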

janhuang6 commented 6 years ago

Thank you for the input. This issue can be closed.