Open Rick-Xu315 opened 2 years ago
Thank you for your interest in our work! Regarding your questions:
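import numpy as np

def freq_masking(self, img, freq_factor=1.0, mask_len=15):
    # Apply the mask with probability freq_factor.
    factor = np.random.RandomState().rand()
    freq_len = img.shape[0]  # axis 0 is the frequency axis
    if factor <= freq_factor:
        # Zero a band of fewer than mask_len frequency bins (modifies img in place).
        start = np.random.randint(0, freq_len - mask_len)
        interval = np.random.randint(0, mask_len)
        img[start : start + interval, :] = 0
    return img

def time_masking(self, img, time_factor=1.0, mask_len=15):
    # Apply the mask with probability time_factor.
    factor = np.random.RandomState().rand()
    time_len = img.shape[1]  # axis 1 is the time axis
    if factor <= time_factor:
        # Zero a band of fewer than mask_len time frames (modifies img in place).
        start = np.random.randint(0, time_len - mask_len)
        interval = np.random.randint(0, mask_len)
        img[:, start : start + interval] = 0
    return img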
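
For anyone who wants to try the masking outside the original class, here is a minimal standalone sketch of the same idea. It is not the authors' code: `mask_spectrogram`, its parameters, and the (frequency bins, time frames) input layout are assumptions made for illustration. Unlike the snippet above, it works on a copy, takes a seedable generator, and returns the input unchanged when the axis is too short to mask.

```python
import numpy as np


def mask_spectrogram(spec, axis, max_len=15, prob=1.0, rng=None):
    """Zero a random band along `axis` (0 = frequency, 1 = time).

    Hypothetical helper written for illustration; not part of the repository.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's array untouched
    size = out.shape[axis]
    # Skip when the axis is too short, max_len is invalid, or the coin flip fails.
    if max_len <= 0 or size <= max_len or rng.random() > prob:
        return out
    start = rng.integers(0, size - max_len)  # band start (upper bound exclusive)
    width = rng.integers(0, max_len)         # band width in [0, max_len)
    index = [slice(None)] * out.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = 0
    return out


# Assumed layout: (frequency bins, time frames).
spec = np.ones((128, 400), dtype=np.float32)
rng = np.random.default_rng(0)
masked = mask_spectrogram(spec, axis=0, rng=rng)    # frequency mask
masked = mask_spectrogram(masked, axis=1, rng=rng)  # time mask
```

Passing a seeded `rng` makes the augmentation reproducible, which can help when comparing runs.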
Thanks for your reply! I would also like to ask another question:
In Table 3 of your paper, you report that the audio ResNet-18 performs worse after fine-tuning on video-audio. I see a similar result: after fine-tuning a concat-based AV model composed of pretrained unimodal models, linear probing the audio backbone gives lower performance. Could you share your thoughts on why the audio backbone gets worse after fine-tuning? Many thanks!
Hi, thanks for your work on the AV FGC task. I'd like to ask about some experimental details in your paper: