miha-skalic / youtube8mchallenge

1st place solution to Kaggle's 2018 YouTube-8M Video Understanding Challenge
Apache License 2.0

Inference: only images without audio #6

Closed Oktai15 closed 5 years ago

Oktai15 commented 5 years ago

Hello, @miha-skalic, great work!

Can I use your model without audio features? For example, I want to test your model on my own video, but I don't have a feature extractor for audio (it was not published). Is it still possible to run your model, and if so, how?

miha-skalic commented 5 years ago

Hi @Oktai15 ,

Unfortunately, Google has not (yet) released the audio feature extraction part. I'm guessing that one could use a vector of zeros for the audio features. Note that we have not tested this, so we cannot say anything about the impact on performance.
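
For illustration, here is a minimal sketch of what that zero-filling could look like when assembling frame-level features in the YouTube-8M layout (1024-dim RGB plus 128-dim audio per frame). The function and array names are hypothetical, not part of this repo:

```python
import numpy as np

RGB_DIM = 1024    # per-frame visual embedding size in the YouTube-8M features
AUDIO_DIM = 128   # per-frame audio embedding size in the YouTube-8M features

def add_zero_audio(rgb_frames):
    """Append an all-zero audio block to per-frame RGB features.

    rgb_frames: array of shape (num_frames, RGB_DIM) from the visual extractor.
    Returns an array of shape (num_frames, RGB_DIM + AUDIO_DIM) that mimics
    the concatenated rgb+audio layout the model was trained on.
    """
    num_frames = rgb_frames.shape[0]
    zero_audio = np.zeros((num_frames, AUDIO_DIM), dtype=rgb_frames.dtype)
    return np.concatenate([rgb_frames, zero_audio], axis=1)
```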

Oktai15 commented 5 years ago

Thank you, @miha-skalic!

ideaRunner commented 5 years ago

About the audio feature extraction, I found this code: https://github.com/tensorflow/models/tree/master/research/audioset#output-embeddings There is also someone who used it and got good results: https://github.com/antoine77340/Youtube-8M-WILLOW/issues/28

> The released AudioSet embeddings were postprocessed before release by applying a PCA transformation (which performs both PCA and whitening) as well as quantization to 8 bits per embedding element. This was done to be compatible with the YouTube-8M project which has released visual and audio embeddings for millions of YouTube videos in the same PCA/whitened/quantized format. We provide a Python implementation of the postprocessing which can be applied to batches of embeddings produced by VGGish. `vggish_inference_demo.py` shows how the postprocessor can be run after inference. If you don't need to use the released embeddings or YouTube-8M, then you could skip postprocessing and use raw embeddings.
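
For reference, roughly how the extraction and postprocessing fit together (a sketch based on the `vggish_inference_demo.py` flow in `tensorflow/models/research/audioset`; the file paths below are placeholders):

```python
# Sketch adapted from vggish_inference_demo.py; the vggish_* modules from
# tensorflow/models/research/audioset must be on the Python path.
import tensorflow as tf

import vggish_input
import vggish_params
import vggish_postprocess
import vggish_slim

wav_file = 'my_video_audio.wav'        # audio track extracted from the video
checkpoint = 'vggish_model.ckpt'       # released VGGish checkpoint
pca_params = 'vggish_pca_params.npz'   # released PCA/quantization parameters

# Convert the waveform into the log-mel examples expected by VGGish.
examples = vggish_input.wavfile_to_examples(wav_file)

with tf.Graph().as_default(), tf.Session() as sess:
    vggish_slim.define_vggish_slim(training=False)
    vggish_slim.load_vggish_slim_checkpoint(sess, checkpoint)
    features_tensor = sess.graph.get_tensor_by_name(vggish_params.INPUT_TENSOR_NAME)
    embedding_tensor = sess.graph.get_tensor_by_name(vggish_params.OUTPUT_TENSOR_NAME)

    # Raw 128-D embeddings, one per ~0.96 s frame of audio.
    [raw_embeddings] = sess.run([embedding_tensor],
                                feed_dict={features_tensor: examples})

# Apply PCA + whitening + 8-bit quantization so the embeddings match the
# format of the released YouTube-8M audio features.
postprocessor = vggish_postprocess.Postprocessor(pca_params)
yt8m_compatible_embeddings = postprocessor.postprocess(raw_embeddings)
```

The postprocessing step matters here: models trained on the released YouTube-8M audio features expect the PCA/whitened/quantized representation, not the raw VGGish output.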