Very good article! My question concerns this sentence in Section 3.5 of the paper: "Subsequently, the trained video retriever is employed to sample negative videos as hard negatives for Shared-Norm to train the moment localizer." Is this negative sampling based on the results of the first stage (the video retriever)? If so, do you sample randomly, as CONQUER does?