rmusser01 / tldw

Too Long, Didn't Watch(TL/DW): Your Personal Research Multi-Tool - Open Source NotebookLM
Apache License 2.0

Improvement: Add user-defined timing-based/token-count summarization #24

Closed rmusser01 closed 1 month ago

rmusser01 commented 1 month ago

As a user, I would like to be able to specify timeblocks or token-count sizes that are cut out of the transcription and then summarized piece by piece. These summaries are then either strung together or re-summarized together as one.
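A minimal sketch of the two ways of combining the per-chunk summaries described above. The function name `summarize_in_chunks`, the `resummarize` flag, and the `summarize` callable are all hypothetical; `summarize` stands in for whatever summarization call the project actually uses.

```python
from typing import Callable, List


def summarize_in_chunks(
    chunks: List[str],
    summarize: Callable[[str], str],
    resummarize: bool = False,
) -> str:
    """Summarize each chunk, then combine the partial summaries.

    `summarize` is a placeholder for the real summarization call
    (hypothetical; not taken from the project).
    """
    # Summarize every chunk independently.
    partial_summaries = [summarize(chunk) for chunk in chunks]

    # Option 1: string the partial summaries together.
    combined = "\n\n".join(partial_summaries)

    # Option 2: run the combined text through the summarizer once more.
    return summarize(combined) if resummarize else combined
```

Passing `resummarize=True` trades one extra summarization call for a single, shorter result.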

When using the CLI, I should be able to pass an argument so that summarization is performed on timeblocks of the transcription rather than on the entire original transcription.

- CLI arg: '--chunk-summary' / '-cs'

The resulting 'chunks' should be user-definable and determined through 1 of 3 means:

  1. Time - '--time-count'
  2. Token-count - '--token-count'
  3. Some fancy text analysis to identify lulls in the convo/when the topic changes (TBD)
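The first two options could be sketched roughly as follows. This assumes the transcription is available as a list of segments with a `start` time in seconds and a `text` field (an assumption about the data shape), and it approximates tokens by whitespace-separated words rather than using a real tokenizer. The function names are hypothetical. The third option is left out since it is still TBD.

```python
from typing import Dict, List


def chunk_by_time(segments: List[Dict], seconds: float) -> List[str]:
    """Group segments into consecutive windows of `seconds` length.

    Assumes each segment looks like {"start": 12.5, "text": "..."}.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    buckets: Dict[int, List[str]] = {}
    for seg in segments:
        # Index of the time window this segment starts in.
        window = int(seg["start"] // seconds)
        buckets.setdefault(window, []).append(seg["text"].strip())
    # Windows that contain no speech are simply skipped.
    return [" ".join(buckets[w]) for w in sorted(buckets)]


def chunk_by_tokens(text: str, max_tokens: int) -> List[str]:
    """Split text into chunks of at most `max_tokens` tokens.

    Tokens are approximated by whitespace-separated words.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

A model-specific tokenizer would give more accurate counts than word splitting; the word-based version is only meant to show the shape of the logic.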

If '--chunk-summary' / '-cs' is passed but neither '--time-count' nor '--token-count' is, then a default of X time is used instead.
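One way the flags and the default fallback might be wired up with `argparse`. The flag names come from this issue; everything else is an assumption: the helper names, treating '--time-count' as seconds, making the two size options mutually exclusive, and above all `DEFAULT_TIME_COUNT`, which is only a placeholder because the default "X" has not been decided.

```python
import argparse

# Placeholder only: the issue leaves the default "X" undecided.
DEFAULT_TIME_COUNT = 300


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Chunked summarization (sketch)")
    parser.add_argument(
        "-cs", "--chunk-summary",
        action="store_true",
        help="summarize the transcription in chunks instead of as a whole",
    )
    # Assumption: only one chunk-size option may be given at a time.
    size = parser.add_mutually_exclusive_group()
    size.add_argument("--time-count", type=int, default=None,
                      help="chunk length in time units (assumed seconds)")
    size.add_argument("--token-count", type=int, default=None,
                      help="chunk length in tokens")
    return parser


def resolve_chunking(args: argparse.Namespace):
    """Return (mode, size) for chunking, or None if chunking is off."""
    if not args.chunk_summary:
        return None
    if args.token_count is not None:
        return ("tokens", args.token_count)
    if args.time_count is not None:
        return ("time", args.time_count)
    # Neither size option was passed: fall back to the default time block.
    return ("time", DEFAULT_TIME_COUNT)
```

For example, `resolve_chunking(build_parser().parse_args(["-cs"]))` falls through to the time-based default, while adding `--token-count 500` selects token-based chunking.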