Closed ks0m1c closed 9 months ago
notes:
IR:
The indexer is more of an intro to Ecto.
Did up a script for .srt creation. Input file for the script: chalisa.json. Output file for the script: (see commit).
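For reference, a minimal sketch of what such a conversion could look like. This is not the committed script: the input shape (a list of objects with `text`, `start` and `end` in seconds) is an assumption for illustration, and the real chalisa.json may be structured differently. The only fixed part is the .srt layout itself: a 1-based event index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, then a blank line.

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(verses: list) -> str:
    """Render verses as SRT events: index, time range, text, blank line."""
    blocks = []
    for index, verse in enumerate(verses, start=1):
        time_range = f"{srt_timestamp(verse['start'])} --> {srt_timestamp(verse['end'])}"
        blocks.append(f"{index}\n{time_range}\n{verse['text']}\n")
    return "\n".join(blocks)


# Hypothetical input; the real chalisa.json may be shaped differently.
sample = json.loads(
    '[{"text": "verse one", "start": 0.0, "end": 4.5},'
    ' {"text": "verse two", "start": 4.5, "end": 9.25}]'
)
srt_text = to_srt(sample)
```

Because the event index is just the position in the list, the verse number to event index mapping falls out for free when the timestamps are already known.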
Further thoughts on specific observations:
1) Instrumentals are great for knowing when to space out, buffer, or span out the timestamps.
2) Lack of chronological unity can be compensated for with error correction: look for the first candidate from a verse and span that verse's timestamps until the next deviation in time.
3) Error correction is probably the name of the game for this art form.
Dangers:
1) When repeating verses are detected in audio snippets, resolve them to the same verse number, so that verse to timestamp is a 1:many relationship.
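A rough sketch of how the span-until-deviation idea and the 1:many relationship could fit together. Everything here is an assumption for illustration: events are taken to be `(start, end, verse)` tuples where `verse` is the matched verse number, or `None` when nothing matched confidently (for example an instrumental), and unmatched events are simply absorbed into the running verse.

```python
from collections import defaultdict


def verse_spans(events):
    """Map each verse number to a list of (start, end) spans.

    A span opens at the first event matched to a verse and is stretched
    until an event matched to a different verse shows up (the "deviation").
    A verse that comes back later gets a second span: verse -> spans is 1:many.
    """
    spans = defaultdict(list)
    current = None  # [verse, start, end] of the span being built
    for start, end, verse in events:
        if verse is None or (current and verse == current[0]):
            # Unmatched or same verse: stretch the running span, if any.
            if current:
                current[2] = end
            continue
        if current:
            spans[current[0]].append((current[1], current[2]))
        current = [verse, start, end]
    if current:
        spans[current[0]].append((current[1], current[2]))
    return dict(spans)


# Hypothetical detections: verse 1 is sung, then verse 2, then verse 1 repeats.
events = [
    (0.0, 5.0, None),    # leading instrumental, no verse yet
    (5.0, 9.0, 1),
    (9.0, 12.0, None),   # unmatched, absorbed into verse 1
    (12.0, 16.0, 2),
    (16.0, 20.0, 1),     # repeat of verse 1
    (20.0, 24.0, 1),
]
result = verse_spans(events)
```

Whether an instrumental should extend the previous verse or get its own buffer is a design choice; this sketch takes the simplest option.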
The goal here is to map the fixed-structure, perfect data to the correct part of the .srt file. Since .srt files naturally already have indexed events, a first pass is to just map each verse to the correct event index.
In order to do this, we need to do some sentence similarity matching. The underlying idea would be to use an edit distance (Levenshtein), a set-overlap measure (Jaccard), or another distance metric like cosine distance.
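To make the idea concrete, here is a small standard-library sketch of Levenshtein and Jaccard scoring, plus picking the best event for a verse. The function names and the sample strings are made up for illustration. One caveat: Python strings iterate over code points, not grapheme clusters, so on Indic scripts a single visible character can count as several edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance normalised to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(a: str, b: str) -> float:
    """Overlap of whitespace-separated tokens: |A & B| / |A | B|."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def best_event(verse: str, events: list):
    """Return (1-based event index, score) of the most similar event text.

    Assumes events is non-empty.
    """
    scores = [similarity(verse, text) for text in events]
    best = max(range(len(events)), key=scores.__getitem__)
    return best + 1, scores[best]


# Hypothetical noisy transcript events versus one clean verse line.
transcript = ["shri guru charan", "jay hanuman gyaan", "ram doot"]
index, score = best_event("jai hanuman gyan", transcript)
```

A score threshold below which a verse is left unmatched would be the natural next step, since transcript events that belong to no verse should not be forced onto one.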
After a quick check, it's better to use a recent library that supports Indic texts for this first pass.
Here are some candidates:
unrelated but interesting stuff found:
ref this thread: https://github.com/ve1ld/vyasa/pull/22#discussion_r1456698940
In terms of supporting keyboard shortcuts, Livebook serves as a great example,
demonstrated through a declarative shortcuts schema and component.
Present Context
Helping words...
Where we collectively investigate and interrogate the problem space and iteratively scope our approach. Breakdown to landmarks that communicate shared context we are working towards through 2-tiered task list, CRUD list elements as development unfolds. Strike the scope of code that reveals the most about the problem/solution FIRST, not necessarily the easiest or hardest parts.
[ ] Intermediate Representation
[ ] Text Indexing
[ ] Media associations with the source text (dependent on IR and text indexing)
Groundwork
Helping words...
Introduce us to the problem space. Write out what you already know about the terrain; you are the recce commander enriching us with details beyond the fog of war. Where have you tried applying and encountered difficulties? How have others attempted to scale or explore these challenges? (Embed internal & external links to related or possible paths of exploration, stackoverflow, documentation, github etc) Who should be notified? Emphasis on previous or current practice to discover what is ugly, missing, or unnecessary.
Downloading auto-generated YouTube transcriptions, which we build on.
Reflection
Helping words...
Where the eternal wheel returns back to practice and what we finally implemented is to be outlined. You are the historian or archivist bringing clarity to future-yous and us about your foray. Emphasis on approaching timeless solutions for well-defined problem space through distillation by decanting that which is un-needed and abstracting that which is essential to approaching the problem space. Add any reflections and internal links to future potential and blindsides. ``` INPUT UR ANSWER HERE ```