paul-bokelman / atmosphere

Immersive audio book generation powered by Gemini built for Google's Gemini API Developer Competition
https://atmosphere.pab.dev/

task: individual timestamp segment generation #23

Closed paul-bokelman closed 1 month ago

paul-bokelman commented 3 months ago

Description

Instead of generating all segments at once, follow this procedure: get the timestamps (audio segments) -> have Gemini generate a rich description for each segment individually -> compile the results back into one list.

Parent: #8
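
The procedure above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Segment`, `generate_segments`, and `fake_describe` are hypothetical names, and the `describe` callable stands in for whatever per-segment Gemini request would be used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    description: str = ""


def generate_segments(
    timestamps: List[Tuple[float, float]],
    describe: Callable[[float, float], str],
) -> List[Segment]:
    """Describe each timestamp range on its own, then compile one ordered list."""
    segments = []
    for start, end in sorted(timestamps):
        # One call per segment instead of a single call for the whole audio.
        segments.append(Segment(start, end, describe(start, end)))
    return segments


def fake_describe(start: float, end: float) -> str:
    # Placeholder for the model call (assumption: the real version would
    # send this time range to Gemini and return its description).
    return f"ambience for {start:.0f}s-{end:.0f}s"


result = generate_segments([(30.0, 75.0), (0.0, 30.0)], fake_describe)
```

Passing the describer in as a function keeps the compile step testable without calling the model, which may help when comparing accuracy against the all-at-once approach.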

paul-bokelman commented 2 months ago

Still not sure whether we should implement this approach; some testing is needed to see if accuracy actually increases. It is very promising, though, since Gemini may be more capable when handling a smaller quantity of diverse information at a time. The downside is that generation time grows linearly with the number of segments, because each segment needs its own request.

paul-bokelman commented 2 months ago

I will do additional testing on this front, because timestamp generation is one of the largest problems we are currently facing.

paul-bokelman commented 2 months ago

I don't believe this is necessary for shorter audio files, but it may be for longer ones...

paul-bokelman commented 1 month ago

don't have time... closing.

This may be reopened if the project is revisited, because it does seem like a necessary step in the right direction.