technosaby / RedHenAudioTagger


Changes for the parser #26

Closed technosaby closed 2 years ago

peteruhrig commented 2 years ago

Can you provide an example of what the new output looks like in here?

technosaby commented 2 years ago

The file contains the following:
1) Header - red block, copied from the .seg files of the video
2) Legend - blue block
3) Body - green block (for one frame)

[Screenshot: example of the new output file, 2022-07-02]
turnermarkb commented 2 years ago

Hi, Saby, Just fyi: it’s better if you use mark.turner@case.edu.

You are making progress. This is a better top block. In the body, you now have the first three fields in good form. I have questions about the Body. Can you provide a codebook explanation for its structure? How would a user use the metadata? How would it be searched? Let's start with standard unix calls. Would it be useful to help design the body metadata format by thinking about pairing it with a bash script that a user could run at the command line to conduct searches? As an example, suppose a user wanted to find examples of a drum roll, or laughter, or a helicopter, or a drum roll and laughter within the same second, or a drum roll and a helicopter within the same 2 seconds. What would be the input command to the bash script, and what would be the output? The form of the input needs to be straightforward. Let's call this bash script ssfx, for "search for sound effects." Suppose it has these flags:

--files: files to search; it would need to be able to accept all files in a directory, and recursively
--effects: the effects to search for
--within: interval in seconds within which the range is satisfied

A user would issue a command something like: ssfx -r --files /tv/2022/2022-03/20220305 --effects {{drum|percussion}&{laughter}} --within 5

Then ssfx would egrep through the directory 20220305 recursively, looking for cases where, within a 5-second interval, there is metadata for laughter AND (either drum or percussion), and it would output its hits, including the name of the file, the effects found, the time interval in which they are found, and some way to play that interval (with perhaps a variable buffer on either side) in some kind of player. The output should be in a csv file for easy handling, and for import to R, so the user could run statistics.

Don’t take any of this as a specific recommendation. Rather, think of providing an explanation of the body and a bash script (using input variables) that would permit easy searches and interpretable csv output that could then be used.

A next step would be to think about importing this metadata into ELAN. That way, users could take a file, run the parser on it, import the file and the ssfx output into ELAN, and, voilà, would have in ELAN a metadata field for the file that would include all the sound effects. ELAN would also allow the user to hand-tag sound effects, and the .eaf file could then be imported into the Red Hen metadata for the file.
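As a rough sketch of that ELAN step: assuming the ssfx hits land in a CSV with columns file,start,end,labels (a hypothetical layout), the third-party pympi-ling package is one possible way to turn them into an .eaf tier.

```python
# Sketch only: builds an ELAN .eaf with one tier of sound-effect annotations
# from a hypothetical ssfx CSV with columns file,start,end,labels.
import csv
import pympi  # pip install pympi-ling

eaf = pympi.Elan.Eaf()
eaf.add_tier("SoundEffects")

with open("hits.csv", newline="") as f:
    for row in csv.DictReader(f):
        start_ms = int(float(row["start"]) * 1000)
        end_ms = int(float(row["end"]) * 1000)
        eaf.add_annotation("SoundEffects", start_ms, end_ms, row["labels"])

eaf.to_file("sound_effects.eaf")
```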

This is all just a suggestion that you think about how to design the Body and the Search bash script such that users would get easy, great value. For the moment, concentrate on the command line, but keep in the back of your mind that most people interested in using this thing work in ELAN. Study the Red Hen page on ELAN at https://sites.google.com/case.edu/techne-public-site/elan

@Austin, Francis, Peter, Ahmed: more and better guidance?

m

technosaby commented 2 years ago

@turnermarkb Sure, I will start working on this. Is a Python script okay, or does it have to be in bash?

Also, about the codebook: I was thinking of directly referencing the link to the classes, or taking them from the YAMNet model. Or would you prefer a custom codebook? It would be great if you could share a format for it. @brucearctor
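For reference, one possible way to pull the class list straight out of the published YAMNet model rather than hand-writing a codebook; this is only a sketch that assumes the TF Hub release of YAMNet, and the output filename codebook.csv is arbitrary.

```python
# Sketch: dump YAMNet's class map (index, mid, display_name) to a codebook CSV.
import csv
import tensorflow as tf
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/yamnet/1")
class_map_path = model.class_map_path().numpy().decode("utf-8")

with tf.io.gfile.GFile(class_map_path) as f:
    rows = list(csv.DictReader(f))  # columns: index, mid, display_name

with open("codebook.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=["index", "mid", "display_name"])
    writer.writeheader()
    writer.writerows(rows)
```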

turnermarkb commented 2 years ago

Sure.


brucearctor commented 2 years ago

Bash/Python: I don't think that matters. Do what's most comfortable. Bash is nice, if bash can reasonably handle it, since there is less need to consider versioning/packaging/compatibility. But I assume Python is plenty fine (in general, and especially at this early stage).

I also don't know that writing a robust search is important in either language. I would imagine there is alternate indexing and search that would happen in the production system, but I haven't looked at the indexing system in years. It is worth thinking about the types of searches that users might want to do, and ensuring that the data will be curated in a way that can support those use cases.

brucearctor commented 2 years ago

Codebook/classes: mostly, it is nice if it condenses the data that will be stored. Just about any structure should be fine (what you linked seems OK). Also, that is something I imagine will be easy to modify, so I wouldn't get too hung up on the format.

brucearctor commented 2 years ago

For now, you are largely focused on the existing YAMNet model, but your interests also seem to involve extending it [e.g., via transfer learning]. So do think about how what you are doing now may need to change to accommodate that. For example: additional classes in the codebook, and how you might search/explore things [in either Python or bash].
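A minimal sketch of that transfer-learning direction, assuming the TF Hub YAMNet release; the classifier head, the number of new classes, and the training data are all hypothetical and not part of the current tagger.

```python
# Sketch: use YAMNet's per-frame embeddings as features for a small
# classifier head that adds new classes (hypothetical transfer-learning step).
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def frame_embeddings(waveform):
    # waveform: mono float32 tensor, 16 kHz, values in [-1.0, 1.0]
    scores, embeddings, spectrogram = yamnet(waveform)
    return embeddings  # shape (num_frames, 1024)

num_new_classes = 4  # hypothetical count of extra classes
head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,), dtype=tf.float32),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_new_classes),
])
head.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# head.fit(...) on a dataset of (embedding, label) pairs would then train
# only the new classes, leaving YAMNet itself frozen.
```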

brucearctor commented 2 years ago

LGTM. Related, as mentioned on the recent call: is this really 'sound effects', or more general 'audio analysis'? If it is not focused on sound effects, you might want to revisit that naming.