nateweinstein / ecco_poc

a POC for a semantic search engine for audio transcripts

Transcript creation time-stamps are messed up for long strings #2

Open nateweinstein opened 1 year ago

nateweinstein commented 1 year ago

Issue: If a speaker has a monologue longer than 150 tokens, we split the text into smaller chunks for readability. But those chunks currently all get the same timestamp.

This is because the max_token logic is handled in the CSV import, content_gen/content_helpers/import_csv_from_transcript.py; it should be handled in content_gen/content_helpers/deepgram_builds.py, where we can use the JSON file generated by Deepgram to set each chunk's start time.
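
A rough sketch of what the fix could look like, not the actual code in deepgram_builds.py. It assumes the speaker's words have already been pulled out of the Deepgram JSON as a list of dicts with `word`, `start`, and `end` keys, and it treats one word as one token. The function name `chunk_words` and the output keys are hypothetical.

```python
from typing import Dict, List

MAX_TOKENS = 150  # threshold from this issue; one word is treated as one token (assumption)


def chunk_words(words: List[Dict], max_tokens: int = MAX_TOKENS) -> List[Dict]:
    """Split one speaker's words into chunks, each carrying its own start time.

    `words` is assumed to be a list of dicts with "word", "start", and "end"
    keys, in spoken order (hypothetical shape for this sketch).
    """
    chunks = []
    for i in range(0, len(words), max_tokens):
        group = words[i:i + max_tokens]
        chunks.append({
            # Take the timestamp from the first word of this chunk,
            # instead of reusing the start of the whole monologue.
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["word"] for w in group),
        })
    return chunks
```

With this approach a 320-word monologue would come out as three chunks, and the second and third chunks would start at the timestamps of word 151 and word 301 instead of all sharing the first word's timestamp.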