Closed Oktai15 closed 5 years ago
Hi @Oktai15 ,
Unfortunately, Google has not yet released the audio feature extraction part. I'm guessing that one could use a vector of zeros for the audio features. Note that we have not tested this and thus cannot say anything about the impact on performance.
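A minimal sketch of that zero-vector workaround, assuming the standard YouTube-8M frame-level format (1024-dim visual embedding plus 128-dim audio embedding per frame — adjust the shapes if your setup differs). The visual features here are random placeholders:

```python
import numpy as np

# Hypothetical shapes: YouTube-8M frame-level features are typically
# 1024-dim visual and 128-dim audio embeddings per frame.
num_frames = 300  # e.g. one embedding per second, capped at 300
visual_features = np.random.rand(num_frames, 1024).astype(np.float32)  # placeholder

# Zero-filled audio features, as suggested above (untested effect on accuracy).
audio_features = np.zeros((num_frames, 128), dtype=np.float32)

# Concatenate into the 1152-dim per-frame input the models expect.
model_input = np.concatenate([visual_features, audio_features], axis=1)
print(model_input.shape)  # (300, 1152)
```

Whether zeros are a reasonable stand-in depends on how the model was trained; as noted above, the impact on performance is untested.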
Thank you, @miha-skalic!
About the audio feature extraction, I found this code: https://github.com/tensorflow/models/tree/master/research/audioset#output-embeddings There is also a person who has used it and got good results: https://github.com/antoine77340/Youtube-8M-WILLOW/issues/28
The released AudioSet embeddings were postprocessed before release by applying a PCA transformation (which performs both PCA and whitening) as well as quantization to 8 bits per embedding element. This was done to be compatible with the YouTube-8M project which has released visual and audio embeddings for millions of YouTube videos in the same PCA/whitened/quantized format. We provide a Python implementation of the postprocessing which can be applied to batches of embeddings produced by VGGish. vggish_inference_demo.py shows how the postprocessor can be run after inference. If you don't need to use the released embeddings or YouTube-8M, then you could skip postprocessing and use raw embeddings.
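For reference, the postprocessing described above (a whitening PCA transform followed by clipping and 8-bit quantization) can be sketched roughly as follows. The PCA matrix, means, and clip range below are stand-ins; the real parameters ship with VGGish, and `vggish_postprocess.py` in the same repo is the authoritative implementation:

```python
import numpy as np

def postprocess(embeddings, pca_matrix, pca_means, min_val=-2.0, max_val=2.0):
    """Sketch of VGGish-style postprocessing: apply a (whitening) PCA
    projection, clip to a fixed range, then quantize to 8 bits.
    pca_matrix and pca_means would come from the released VGGish PCA
    parameters; here they are placeholders."""
    # PCA + whitening: project the centered embeddings with the PCA matrix.
    transformed = np.dot(pca_matrix, (embeddings - pca_means).T).T
    # Clip, then linearly map [min_val, max_val] onto [0, 255].
    clipped = np.clip(transformed, min_val, max_val)
    quantized = (clipped - min_val) * (255.0 / (max_val - min_val))
    return quantized.astype(np.uint8)

# Toy example with random 128-dim embeddings and stand-in PCA parameters.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 128))
pca_matrix = np.eye(128)        # identity stands in for the real PCA matrix
pca_means = np.zeros((1, 128))
out = postprocess(emb, pca_matrix, pca_means)
print(out.shape, out.dtype)  # (10, 128) uint8
```

As the README quoted above says, if you don't need compatibility with the released YouTube-8M embeddings, you can skip this step and use the raw embeddings directly.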
Hello, @miha-skalic, great work!
Can I use your model without audio features? For example, I want to test your model on my own video, but I don't have a feature extractor for audio (because it was not published). Is there still a way to try your model? If yes, how can I do it?