technosaby / RedHenAudioTagger


Create a set of sample test cases based on Red Hen's data for baselining #3

Closed technosaby closed 2 years ago

technosaby commented 2 years ago
  1. Take a set of 10 videos.
  2. Extract the audio and run YAMNet to get the tags.
  3. Process the tags and create output in Red Hen's format (a minimal sketch of these steps follows right after this list).
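A minimal sketch of steps 2 and 3, assuming `ffmpeg` is on the PATH and using the public YAMNet model from TensorFlow Hub; the file names are placeholders rather than actual Red Hen paths, and the final Red Hen formatting is left out here:

```python
import csv
import subprocess

import numpy as np
import tensorflow_hub as hub
from scipy.io import wavfile

VIDEO = "input_video.mp4"   # hypothetical local copy of one of the sample videos
WAV = "audio_16k.wav"

# Step 2a: extract mono 16 kHz audio, the input format YAMNet expects.
subprocess.run(["ffmpeg", "-y", "-i", VIDEO, "-ac", "1", "-ar", "16000", WAV],
               check=True)

# Step 2b: load YAMNet from TF Hub and run it over the whole waveform.
model = hub.load("https://tfhub.dev/google/yamnet/1")
_, data = wavfile.read(WAV)
waveform = data.astype(np.float32) / 32768.0   # int16 PCM -> [-1.0, 1.0]
scores, embeddings, spectrogram = model(waveform)

# Step 3 (first cut): print the top AudioSet class for each ~0.48 s frame.
with open(model.class_map_path().numpy().decode()) as f:
    class_names = [row["display_name"] for row in csv.DictReader(f)]
scores = scores.numpy()
for i, cls in enumerate(np.argmax(scores, axis=1)):
    print(f"{i * 0.48:7.2f}s  {class_names[cls]}  ({scores[i, cls]:.2f})")
```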
technosaby commented 2 years ago

@brucearctor @turnermarkb As discussed, I selected some Red Hen videos to run through the model. The first video did not show much variation, while the second video has a good number of tags across different classes.

  1. 2010-01-01_2335_US_CSPAN2_World_War_II
  2. 2021-01-01_0300_US_WUAB_19_News_at_10PM
  3. 2021-01-01_2200_US_FOX-News_The_Five

For now I am generating the output in the form shown below (not Red Hen's format); I am working on generating it in Red Hen's format.

[Screenshot of the current tagging output, 2022-06-29]
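As an illustration of how per-frame predictions could be turned into the kind of per-class tag rows shown above (this is a sketch, not the exact code behind the screenshot), consecutive frames sharing a top class can be merged into timed segments; the 0.48 s hop and 0.96 s window come from YAMNet's framing, while the score threshold is an arbitrary placeholder:

```python
import numpy as np

HOP, WIN = 0.48, 0.96  # seconds: YAMNet frame hop and window length

def frames_to_segments(scores, class_names, threshold=0.3):
    """Merge consecutive frames with the same top class into
    (start_s, end_s, label, mean_score) rows, dropping weak segments."""
    top = np.argmax(scores, axis=1)
    segments, start = [], 0
    for i in range(1, len(top) + 1):
        if i == len(top) or top[i] != top[start]:
            seg_scores = scores[start:i, top[start]]
            if seg_scores.mean() >= threshold:
                segments.append((start * HOP, (i - 1) * HOP + WIN,
                                 class_names[top[start]],
                                 float(seg_scores.mean())))
            start = i
    return segments
```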

What next steps should I prioritise?

brucearctor commented 2 years ago

I generally like to ensure the minimum requirements are met first ... So, while potentially easy, you might want to start converging on the file output format -- proposing a filetype/format early will leave time for others to review, comment, and request revisions. It seems worth doing with plenty of lead time, since it is something that must eventually get done.

technosaby commented 2 years ago

For now, I have generated an .sfx file with output in the format below.

[Screenshot of the .sfx output, 2022-06-30]
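For reference, a hypothetical sketch of writing such segments to a pipe-delimited .sfx file, loosely modeled on other Red Hen annotation files; the header fields, the `SFX_01` tag name, and the exact column layout are placeholders pending the format review mentioned below:

```python
def write_sfx(segments, base, path):
    """Write (start_s, end_s, label, score) rows for video `base` to `path`."""
    with open(path, "w") as f:
        f.write(f"TOP|{base}\n")
        f.write("SFX_01|Source_Program=YAMNet audio tagger\n")  # hypothetical credit header
        for start, end, label, score in segments:
            f.write(f"{start:.2f}|{end:.2f}|SFX_01|Tag={label}|Score={score:.2f}\n")
        f.write("END\n")
```

The `segments` argument here is assumed to be the list produced by the `frames_to_segments` sketch above.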

I will review this with @turnermarkb, Prof Stenn, and Prof Urhig in today's meeting. It would be great if you could join, @brucearctor.