kyuyeonpooh / objects-that-sound

An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).
BSD 3-Clause "New" or "Revised" License

Detailed dataset preprocessing and format for the AVE-Dataset #7

Open JackHenry1992 opened 3 years ago

JackHenry1992 commented 3 years ago

Thanks for sharing your great work. Can you provide the detailed process for preprocessing the AVE-Dataset?

JackHenry1992 commented 3 years ago

I have processed the AVE-Dataset using preprocess.py and generated the trainset, but the loss did not decrease during the training phase: epoch: 32, step: 79, train_loss: 0.8976, train_acc: 0.4969, lr: 0.000010

kyuyeonpooh commented 3 years ago

Hi,

Thank you for your interest in my code and project.

Data preprocessing

In my case, I first downloaded the videos directly from YouTube using youtube_dl and saved each one as [YouTube ID of video].mp4
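For illustration, a minimal sketch of this download step might look like the following. The output template and video URL here are placeholders, not necessarily the exact settings used in this repository.

import youtube_dl  # pip install youtube_dl

# Save each video as "<YouTube ID>.mp4", matching the naming convention above.
ydl_opts = {
    "format": "mp4",
    "outtmpl": "%(id)s.%(ext)s",  # e.g. dQw4w9WgXcQ.mp4
}

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    # Hypothetical video ID; replace with IDs taken from the dataset annotations.
    ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])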

With the above naming convention, once you configure the path settings in the config.ini file, you can run preprocess.py to generate the training data.

For more details, refer to utils/extractor.py. You can also adjust the behavior by changing the parameters of the Extractor class methods.
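As a rough illustration of what such an extraction step typically does (this is a generic sketch, not the actual code in utils/extractor.py; the file names, frame rate, and sample rate are assumptions), one could pull frames and audio from each clip with ffmpeg:

import subprocess

def extract_frames_and_audio(video_path, frame_dir, wav_path):
    # Sample one frame per second from the video (assumed rate).
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", "fps=1", f"{frame_dir}/%03d.jpg"],
        check=True,
    )
    # Extract mono audio at 48 kHz (assumed rate) for spectrogram computation.
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vn", "-ac", "1", "-ar", "48000", wav_path],
        check=True,
    )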


Loss not decreasing

I also faced this issue. It seems to occur because the last fully connected layer is so tiny, and therefore so vulnerable to noisy data, compared to the other layers. Once that layer is misguided, it may never recover to the expected state.

Here are several tips that might help you. However, please keep in mind that training does not always succeed even if you apply all of the solutions below.

1. Learning rate. I found that a learning rate less than or equal to 5e-5 was helpful for successful training, while a learning rate greater than 1e-4 tends to fail (see the sketch after this list).

2. Use a larger batch. A larger batch size usually seems to help training, as the data in AudioSet is quite noisy. In my case, I used a batch size of 64, as shown in the sketch below.

3. In case of training AVE-Net: tweak the parameters of the last fully connected layer. As you can see in models/avenet.py, self.fc3 in AVE-Net has only 4 parameters (a 2x1 weight and a 2-element bias). Because this tiny layer is very vulnerable to gradient noise, I initialized it with fixed values to make it more robust to the noisy data.

Please change this part as shown below. https://github.com/kyuyeonpooh/objects-that-sound/blob/d19f971021a9219aa0987dadeaf7942ec7e4f31a/model/avenet.py#L24-L25

# fc3 maps the scalar distance between the image and audio embeddings
# to two class logits, so it has only 4 parameters: a 2x1 weight and a
# 2-element bias.
self.fc3 = nn.Linear(1, 2)
# Fixed initialization instead of a random one: one logit decreases with
# the distance while the other increases, so the layer starts in a
# sensible state and is more robust to noisy data.
self.fc3.weight.data[0] = -0.7
self.fc3.weight.data[1] = 0.7
self.fc3.bias.data[0] = 1.2
self.fc3.bias.data[1] = -1.2

4. One more tip. In my case, once the loss dropped below 0.69 (about ln 2, i.e., the loss of random guessing on the two-way correspondence task), training had gone successfully.
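Here is a minimal sketch of tips 1 and 2 put together. The optimizer choice, function name, and the assumption that the model takes an (image, audio) pair are illustrative, not necessarily the repository's exact setup.

import torch
from torch.utils.data import DataLoader, Dataset

def train_one_epoch(model: torch.nn.Module, train_dataset: Dataset, device="cpu"):
    # Tip 2: a larger batch (64) helps smooth out the noisy data.
    loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    # Tip 1: keep the learning rate at or below 5e-5.
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for image, audio, label in loader:
        image, audio, label = image.to(device), audio.to(device), label.to(device)
        optimizer.zero_grad()
        loss = criterion(model(image, audio), label)  # chance level is ln 2 ~= 0.693 (tip 4)
        loss.backward()
        optimizer.step()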


Comment: Pretrained models are available! Please use them if you need them.

If you have any questions or run into any more issues, feel free to contact me. You can also open issues in the repository, which I can check right away.

Sincerely, Kyuyeon.