Closed paul-bokelman closed 2 days ago
Additionally, the query should return better results. Currently urban/road is being paired with ["yellow", "bricks", "uneven", "walking", "gravel"], which is unacceptable.
Maybe use the category to check whether the sound even makes sense in context, then use the specific keywords to narrow down the search further?
hypothesis: gemini will do better at pairing sounds if we describe them with phrases instead of bare keywords
possible approach: get closest category to phrase -> get options based on category -> compare phrases and choose sound
including the overall context of the story/chapter could really help too...
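A rough sketch of the category-first approach proposed above, just to make the three steps concrete. Everything here is an assumption for illustration: the `Sound` shape, the `pick_sound` name, and especially `token_overlap`, which is a plain word-overlap score standing in for whatever similarity step (gemini, embeddings, etc.) would actually be used.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sound:
    # Hypothetical shape of a sound entry; the real db schema may differ.
    name: str
    category: str
    keywords: list[str]


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase words (placeholder for a real model)."""
    words_a = set(a.lower().replace("/", " ").split())
    words_b = set(b.lower().replace("/", " ").split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def pick_sound(
    phrase: str,
    sounds: list[Sound],
    similarity: Callable[[str, str], float] = token_overlap,
) -> Optional[Sound]:
    if not sounds:
        return None
    # Step 1: get the closest category to the phrase.
    categories = sorted({s.category for s in sounds})
    best_category = max(categories, key=lambda c: similarity(phrase, c))
    # Step 2: get the options that belong to that category.
    candidates = [s for s in sounds if s.category == best_category]
    # Step 3: compare the phrase against each option and choose one.
    return max(candidates, key=lambda s: similarity(phrase, " ".join(s.keywords)))


sounds = [
    Sound("gravel_steps", "urban/road", ["walking", "gravel", "uneven"]),
    Sound("car_pass", "urban/road", ["car", "passing", "road"]),
    Sound("birds", "nature/forest", ["birds", "chirping"]),
]
choice = pick_sound("walking on a gravel road", sounds)
```

Only the sounds in the winning category are ever compared in detail, so the expensive comparison step sees a much smaller list than the full db.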
possible approach: get closest category to phrase -> get options based on category -> compare phrases and choose sound
I think this is a pretty good idea
The context window issue should be pretty easy to get around as well if we just use a simple divide-and-conquer approach. If there are too many sounds, split them into chunks and feed the chunks to gemini one at a time, then add a combining step that asks gemini which of its previous choices best fits the original keywords. This kinda assumes gemini will be consistent with its answers across queries, though...
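Something like this is what the divide-and-conquer idea could look like. It's a sketch, not a real integration: `choose` is a hypothetical callable that would wrap the gemini request, and `overlap_chooser` is a deterministic stand-in so the example runs without any API. `chunk_size` is also an assumed parameter that would be tuned to the context window.

```python
from typing import Callable, Iterator, List, Sequence

# (keywords, options) -> the single option judged to fit best.
Chooser = Callable[[Sequence[str], Sequence[str]], str]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def choose_best(
    keywords: Sequence[str],
    sounds: Sequence[str],
    choose: Chooser,
    chunk_size: int,
) -> str:
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each round shrinks the list")
    if not sounds:
        raise ValueError("no sounds to choose from")
    candidates: List[str] = list(sounds)
    # Divide: pick one winner per chunk, repeating until the winners fit in one query.
    while len(candidates) > chunk_size:
        candidates = [choose(keywords, chunk) for chunk in chunked(candidates, chunk_size)]
    # Combine: ask which of the previous choices fits the original keywords best.
    return choose(keywords, candidates)


def overlap_chooser(keywords: Sequence[str], options: Sequence[str]) -> str:
    """Stand-in for the model: picks the option containing the most keywords."""
    return max(options, key=lambda option: sum(k in option for k in keywords))


all_sounds = [
    "rain on window",
    "gravel footsteps",
    "car horn",
    "door slam",
    "walking on gravel road",
]
best = choose_best(["walking", "gravel"], all_sounds, overlap_chooser, chunk_size=2)
```

With a real model behind `choose`, the result depends on it answering consistently across queries, which is exactly the open question above.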
Description
Currently, the process of mapping timestamps to sound effects is: get all sounds from the db -> compare each timestamp's keywords to all sounds -> get back the mapping
This process works for a small number of sound entries but will quickly fail (due to the context window limit) once the database is fully populated.
Proposed solutions:
Parent: #8