paul-bokelman / atmosphere

Immersive audio book generation

task: change mapping approach #18

Closed paul-bokelman closed 2 days ago

paul-bokelman commented 1 week ago

Description

Currently, timestamps are mapped to sound effects as follows: get all sounds from the db -> compare each timestamp's keywords against every sound -> get back the mapping

This works for a small number of sound entries but will quickly fail once the database is fully populated, because the full sound list will no longer fit in the context window.

Proposed solutions:

  1. use the timestamp keywords to find the best matching set of keywords present in the db -> get all sounds associated with those keywords -> find the best match
    • this approach requires calling gemini multiple times for each timestamp...
  2. refactor the mapping input data to include only keywords and their related sound ids
  3. force gemini to choose only from existing keywords in the timestamp generation step -> get all sounds associated with those keywords -> find the best match
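
A minimal sketch of the keyword lookup that solutions 2 and 3 rely on, assuming each sound row carries an id and a list of keywords. The record shape, the sample data, and the names `build_keyword_index` and `candidate_sounds` are all hypothetical; the real db schema is not shown in this issue.

```python
from collections import defaultdict

# Hypothetical sound records; the real db schema is not shown in this issue.
SOUNDS = [
    {"id": 1, "category": "urban/road", "keywords": ["traffic", "cars", "asphalt"]},
    {"id": 2, "category": "nature/path", "keywords": ["gravel", "walking", "footsteps"]},
    {"id": 3, "category": "nature/forest", "keywords": ["birds", "leaves", "walking"]},
]


def build_keyword_index(sounds):
    """Map each keyword to the sorted ids of the sounds tagged with it."""
    index = defaultdict(set)
    for sound in sounds:
        for keyword in sound["keywords"]:
            index[keyword.lower()].add(sound["id"])
    return {keyword: sorted(ids) for keyword, ids in index.items()}


def candidate_sounds(timestamp_keywords, index):
    """Return sound ids ranked by how many timestamp keywords they share."""
    hits = defaultdict(int)
    for keyword in timestamp_keywords:
        for sound_id in index.get(keyword.lower(), []):
            hits[sound_id] += 1
    # Most shared keywords first; ties broken by id for a stable order.
    return sorted(hits, key=lambda sound_id: (-hits[sound_id], sound_id))


index = build_keyword_index(SOUNDS)
print(candidate_sounds(["yellow", "bricks", "uneven", "walking", "gravel"], index))
```

With an index like this, only the keyword list (not every sound) would need to go into the prompt, and the final "find best match" step would see just the handful of candidate ids.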

Parent: #8

paul-bokelman commented 6 days ago

Additionally, the query should give better results. Currently urban/road is being paired with ["yellow", "bricks", "uneven", "walking", "gravel"], which is unacceptable.

Maybe use the category to check whether the sound even makes sense in the context, then use the specific keywords to narrow down the search further?
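
A rough sketch of that two-stage idea, assuming the set of plausible categories for a timestamp is already known (for example from an earlier gemini call). The sample sounds, the `nature/path` category, the Jaccard scoring, and the `pick_sound` helper are illustrative assumptions, not the project's code.

```python
# Hypothetical sound records; only "urban/road" comes from this issue.
SOUNDS = [
    {"id": 1, "category": "urban/road", "keywords": ["walking", "gravel", "uneven"]},
    {"id": 2, "category": "nature/path", "keywords": ["gravel", "footsteps", "walking"]},
]


def jaccard(a, b):
    """Share of keywords the two lists have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_sound(timestamp_keywords, plausible_categories, sounds):
    # Stage 1: drop sounds whose category does not fit the scene at all.
    in_context = [s for s in sounds if s["category"] in plausible_categories]
    # Stage 2: rank the survivors by keyword overlap, ignoring zero overlap.
    scored = [(jaccard(timestamp_keywords, s["keywords"]), s) for s in in_context]
    scored = [(score, s) for score, s in scored if score > 0]
    if not scored:
        return None  # better to play nothing than an out-of-context sound
    return max(scored, key=lambda pair: pair[0])[1]


keywords = ["yellow", "bricks", "uneven", "walking", "gravel"]
print(pick_sound(keywords, {"nature/path"}, SOUNDS))
```

In this toy data the urban/road sound has the higher keyword overlap, so keywords alone would pick it; the category gate is what keeps it out.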

paul-bokelman commented 6 days ago

hypothesis: gemini will do better at pairing sounds if we use phrases to describe them

paul-bokelman commented 6 days ago

possible approach: get closest category to phrase -> get options based on category -> compare phrases and choose sound

paul-bokelman commented 6 days ago

including the overall context of the story/chapter could really help too...

sawyerrice commented 5 days ago

possible approach: get closest category to phrase -> get options based on category -> compare phrases and choose sound

I think this is a pretty good idea

sawyerrice commented 5 days ago

The context window stuff should be pretty easy to get around as well if we just use a simple divide and conquer approach. If there are too many sounds, split them into chunks and feed the chunks to gemini one at a time, then add a combining step that asks gemini which of its previous choices best fits the original keywords. This kinda assumes gemini will be consistent with its answers across queries tho...
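
A minimal sketch of that divide and conquer loop. The `choose` callback stands in for a gemini call; `overlap_chooser` below is a deterministic placeholder so the example runs offline, and all names and sample data here are hypothetical.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def choose_divide_and_conquer(keywords, sounds, choose, chunk_size):
    """Pick one sound by running `choose` on chunks, then on the chunk winners."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each round shrinks the pool")
    if not sounds:
        return None
    candidates = list(sounds)
    # Each round keeps one winner per chunk until the pool fits in a single call.
    while len(candidates) > chunk_size:
        candidates = [choose(keywords, chunk) for chunk in chunked(candidates, chunk_size)]
    # Combining step: one final choice among the remaining winners.
    return choose(keywords, candidates)


def overlap_chooser(keywords, options):
    """Placeholder for a gemini call: pick the option sharing the most keywords."""
    wanted = set(keywords)
    return max(options, key=lambda s: len(wanted & set(s["keywords"])))


# Hypothetical sound records for illustration only.
SOUNDS = [
    {"id": 1, "keywords": ["traffic", "cars"]},
    {"id": 2, "keywords": ["gravel", "walking", "footsteps"]},
    {"id": 3, "keywords": ["birds", "walking"]},
    {"id": 4, "keywords": ["rain", "thunder"]},
    {"id": 5, "keywords": ["door", "creak"]},
]

keywords = ["yellow", "bricks", "uneven", "walking", "gravel"]
best = choose_divide_and_conquer(keywords, SOUNDS, overlap_chooser, chunk_size=2)
print(best["id"])
```

The placeholder chooser is perfectly consistent, so the chunked result matches a single pass over all sounds. With a real model that guarantee goes away, which is the consistency caveat above.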