Open frrad opened 1 month ago
I wonder if another (potentially simpler) approach is to allow users to provide a list of known advertisers, in the hope of giving the LLM a greater chance of identifying ad segments. I've tweaked my system_prompt.txt to add the text below. It's a bit early to tell whether ad detection has improved, but I'll report back here with my findings once I've listened to some newly processed episodes 👍
Known advertisers are listed below, though this list is not exhaustive and you should expect to encounter adverts from other companies.
If a known advertiser is mentioned in a section of transcript, you can be more confident in classifying that section as an advertisement.
Known advertisers:
- Better Health
- Shopify
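For anyone who would rather generate that block than edit system_prompt.txt by hand, here is a minimal sketch in Python. The function names `render_known_advertisers` and `extend_system_prompt` are hypothetical and not part of the project; the wording of the preamble is copied from the prompt text above.

```python
from typing import List

# Preamble copied from the prompt text quoted in this thread.
PREAMBLE = (
    "Known advertisers are listed below, though this list is not exhaustive "
    "and you should expect to encounter adverts from other companies.\n"
    "If a known advertiser is mentioned in a section of transcript, you can be "
    "more confident in classifying that section as an advertisement.\n"
    "Known advertisers:"
)


def render_known_advertisers(advertisers: List[str]) -> str:
    """Return the prompt addendum, or an empty string if no names are given.

    Hypothetical helper: blank entries are dropped and names are trimmed.
    """
    names = [name.strip() for name in advertisers if name.strip()]
    if not names:
        return ""
    bullets = "\n".join(f"- {name}" for name in names)
    return f"{PREAMBLE}\n{bullets}"


def extend_system_prompt(system_prompt: str, advertisers: List[str]) -> str:
    """Append the addendum to an existing prompt, leaving it unchanged if empty.

    Hypothetical helper: how the prompt is actually loaded is not shown here.
    """
    addendum = render_known_advertisers(advertisers)
    if not addendum:
        return system_prompt
    return f"{system_prompt.rstrip()}\n\n{addendum}\n"


if __name__ == "__main__":
    print(extend_system_prompt("Base prompt.", ["Better Health", "Shopify"]))
```

Returning the prompt untouched for an empty list means the behaviour is unchanged for anyone who has not listed any advertisers.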
That definitely seems like it would help. If it does, maybe we could consider adding first-class support for a known_advertisers: Optional[List[str]] per podcast 🤔
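A rough sketch of what that per-podcast field might look like, assuming a dataclass-style config. `PodcastConfig`, its `title` field, and `effective_advertisers` are hypothetical names for illustration; only `known_advertisers: Optional[List[str]]` comes from the suggestion above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PodcastConfig:
    """Hypothetical per-podcast settings; not the project's actual schema."""

    title: str
    # None means the user has not configured any advertisers for this podcast.
    known_advertisers: Optional[List[str]] = None


def effective_advertisers(config: PodcastConfig) -> List[str]:
    """Return a cleaned list: trimmed, non-empty, de-duplicated case-insensitively.

    The first spelling of each name is kept and the original order is preserved.
    """
    seen = set()
    result: List[str] = []
    for name in config.known_advertisers or []:
        cleaned = name.strip()
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result
```

Keeping the field optional lets existing podcast entries load without changes, and treating `None` as an empty list keeps the calling code free of special cases.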
I still think the web UI transcript thing is going to be helpful though. Even if the ad list is perfectly effective, some flow where you
is going to be much nicer in some form of UI
As promised, just a quick follow-up: Updating my prompt to include a list of known advertisers has improved the detection of adverts that were previously being overlooked 🙌 Being able to update this list through a web interface would be a lovely feature. Thanks again for all your work! 👏
Add the ability to use a web UI to manually annotate ads.
Probably follows https://github.com/jdrbc/podly_pure_podcasts/issues/11
This may end up being the same PR as https://github.com/jdrbc/podly_pure_podcasts/issues/9
Basically:
Segment