Open JackHenry1992 opened 3 years ago
I have processed the AVE-Dataset using preprocess.py and generated the training set, but the loss did not decrease during the training phase.
```
Loss epoch: 32, step: 79, train_loss: 0.8976, train_acc: 0.4969, lr: 0.000010
```
Hi,
Thank you for your interest in my code and project.
In my case, I first downloaded the videos directly from YouTube using youtube_dl and saved each video as `[YouTube ID of video].mp4`.

With the above naming convention, once you configure the path settings in `config.ini` and run `preprocess.py`, it generates `[YouTube ID of video].npz` files of two kinds:

- image files, each containing 10 frames from a single video (extracted at 1 fps and resized to 256x256 in the default settings), and
- audio files, each containing 10 one-second spectrograms from the same video.

The reason I extract 10 samples per video is that clips in AudioSet are 10 seconds long. For more details, you can refer to `utils/extractor.py`. You can also adjust these settings through the parameters of the `Extractor` class methods.
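The audio half of this pipeline can be sketched as follows. This is an illustrative stand-in for `utils/extractor.py`, not the repository's actual code; the sample rate and STFT parameters here are assumptions:

```python
import numpy as np
from scipy.signal import spectrogram

# Hypothetical sketch of the per-video audio step: split a 10-second
# waveform into ten 1-second segments and compute a log-spectrogram for
# each. Parameter values are illustrative, not the repo's settings.
SAMPLE_RATE = 48000
CLIP_SECONDS = 10

def audio_to_spectrograms(waveform, sr=SAMPLE_RATE):
    """Return one spectrogram per 1-second segment of a 10-second clip."""
    specs = []
    for i in range(CLIP_SECONDS):
        segment = waveform[i * sr : (i + 1) * sr]
        _, _, s = spectrogram(segment, fs=sr, nperseg=512, noverlap=256)
        specs.append(np.log(s + 1e-7))  # log-compress for numerical stability
    return np.stack(specs)

# A synthetic 10-second mono waveform stands in for the decoded audio track.
wave = np.random.randn(SAMPLE_RATE * CLIP_SECONDS).astype(np.float32)
specs = audio_to_spectrograms(wave)
print(specs.shape[0])  # 10 segments, matching the 10 entries per .npz file
```

The same slicing idea applies to the video stream: one frame per second, ten frames per clip.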
I also faced this issue. It seems to occur because the last fully connected layer is so tiny that it is vulnerable to noisy data compared to the other layers. Once the last fully connected layer is misguided, it may never recover to the expected state.
Here are several tips that might help you. However, please keep in mind that the network may not train successfully even if you apply all of the solutions below.
1. Learning Rate
I found that a learning rate less than or equal to 5e-5 was helpful for successful training. Using a learning rate larger than 1e-4 tends to make training fail.
2. Use a Larger Batch
Using a larger batch size is usually helpful for training, since the data in AudioSet is quite noisy. In my case, I use a batch size of 64.
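Tips 1 and 2 together can be sketched as below. The model and data are stand-ins (the real AVE-Net and dataset live in the repository); only the learning rate and batch size reflect the advice above:

```python
import torch
from torch import nn, optim

# Illustrative training step with the suggested hyperparameters:
# learning rate <= 5e-5 and a batch size of 64.
model = nn.Linear(1, 2)  # stand-in for the real network
optimizer = optim.Adam(model.parameters(), lr=5e-5)

x = torch.randn(64, 1)                      # one batch of 64 samples
targets = torch.randint(0, 2, (64,))        # correspond / mismatch labels
logits = model(x)
loss = nn.functional.cross_entropy(logits, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```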
3. In case of training AVE-Net: Tweak the parameters of the last fully connected layer
As you can see in models/avenet.py, there are only 4 parameters in self.fc3 in AVE-Net. As this tiny layer is very vulnerable to gradient noise, I initialized it with fixed values to make it more robust to noisy data. Please change this part as given below. https://github.com/kyuyeonpooh/objects-that-sound/blob/d19f971021a9219aa0987dadeaf7942ec7e4f31a/model/avenet.py#L24-L25
```python
self.fc3 = nn.Linear(1, 2)
self.fc3.weight.data[0] = -0.7
self.fc3.weight.data[1] = 0.7
self.fc3.bias.data[0] = 1.2
self.fc3.bias.data[1] = -1.2
```
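To see what this fixed initialization does, here is a small standalone check. The class semantics in the comments (index 0 = correspond, index 1 = mismatch) are my assumption, not something stated in the repository:

```python
import torch
from torch import nn

# fc3 maps the scalar audio-visual distance to two logits, so it has
# only 4 parameters (2 weights + 2 biases).
fc3 = nn.Linear(1, 2)
with torch.no_grad():
    fc3.weight[0] = -0.7   # assumed: "correspond" logit falls with distance
    fc3.weight[1] = 0.7    # assumed: "mismatch" logit rises with distance
    fc3.bias[0] = 1.2
    fc3.bias[1] = -1.2

d = torch.tensor([[0.5]])      # a small audio-visual distance
out = fc3(d)
print(out)                     # approximately [[0.85, -0.85]]
```

With this start, a small distance already favors one class and a large distance the other, so the layer begins in a sensible region instead of being pushed around by noisy early gradients.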
4. One more tip
In my case, when I saw the loss decrease below 0.69, the training had gone successfully.
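The 0.69 threshold is not arbitrary: for a balanced two-class cross-entropy loss, a classifier that predicts 50/50 scores exactly ln 2 ≈ 0.693, so dropping below it means the network is doing better than chance. A one-line check:

```python
import math

# Chance-level cross-entropy for a balanced binary task: -log(0.5) = ln 2.
chance_loss = -math.log(0.5)
print(round(chance_loss, 4))  # 0.6931
```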
Comment: A pretrained model is available! Please use it if you need.
If you have any questions or run into any more issues, feel free to contact me. You can also open issues in the repository, and I will check them right away.
Sincerely, Kyuyeon.
Thanks for sharing your great work. Could you provide the detailed preprocessing steps for the AVE-dataset?