jdrbc / podly_pure_podcasts

Ad-block for podcasts
MIT License
215 stars, 8 forks

Web UI manual ad annotation functionality #12

Open frrad opened 1 month ago

frrad commented 1 month ago

Add the ability to use a web UI to manually annotate ads.

Probably follows https://github.com/jdrbc/podly_pure_podcasts/issues/11

This may end up being the same PR as https://github.com/jdrbc/podly_pure_podcasts/issues/9

Basically:

dfjones89 commented 1 week ago

I wonder if another (potentially simpler) approach would be to let users provide a list of known advertisers, to give the LLM a better chance of identifying ad segments. I've tweaked my system_prompt.txt to add the text below. It's a bit early to tell whether ad detection has improved, but I'll report back here with my findings once I've listened to some newly processed episodes 👍

Known advertisers are listed below, though this list is not exhaustive and you should expect to encounter adverts from other companies.
If a known advertiser is mentioned in a section of transcript, you can be more confident in classifying that section as an advertisement.

Known advertisers:
 - Better Health
 - Shopify

frrad commented 1 week ago

That definitely seems like it would help. If it does, maybe we could consider adding first-class support for basically a known_advertisers: Optional[List[str]] per podcast 🤔
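A rough sketch of what that could look like, not how podly is actually structured: `PodcastConfig` and `build_system_prompt` are made-up names for illustration, and the appended wording is just the prompt text quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Wording taken from the system_prompt.txt tweak quoted earlier in the thread.
KNOWN_ADVERTISERS_PREAMBLE = (
    "Known advertisers are listed below, though this list is not exhaustive "
    "and you should expect to encounter adverts from other companies.\n"
    "If a known advertiser is mentioned in a section of transcript, you can be "
    "more confident in classifying that section as an advertisement.\n"
)


@dataclass
class PodcastConfig:
    # Hypothetical per-podcast settings; not podly's real config schema.
    title: str
    known_advertisers: Optional[List[str]] = None


def build_system_prompt(base_prompt: str, config: PodcastConfig) -> str:
    """Append the known-advertisers section only when a list is configured."""
    if not config.known_advertisers:
        # None or an empty list: leave the base prompt untouched.
        return base_prompt
    bullets = "\n".join(f" - {name}" for name in config.known_advertisers)
    return (
        f"{base_prompt.rstrip()}\n\n"
        f"{KNOWN_ADVERTISERS_PREAMBLE}\n"
        f"Known advertisers:\n{bullets}\n"
    )
```

Leaving the field optional means podcasts without a list keep exactly the prompt they have today.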

I still think the web UI transcript thing is going to be helpful though. Even if the ad list is perfectly effective, some flow where you

  1. read transcript
  2. add to advertiser list
  3. re-run detection

is going to be much nicer in some form of UI
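To make that flow concrete, here is a minimal sketch of steps 2 and 3 as a backend for such a UI might run them. Everything in it is an assumption: `add_advertiser`, `annotate_and_rerun` and the `Segment` shape are invented for illustration, and `keyword_detector` is only a stand-in for the real LLM-based detection so the example runs on its own.

```python
from typing import Callable, List, Tuple

# Hypothetical transcript segment: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]
Detector = Callable[[List[Segment], List[str]], List[Segment]]


def add_advertiser(advertisers: List[str], name: str) -> List[str]:
    """Return a new list with name added, skipping blanks and case-insensitive duplicates."""
    cleaned = name.strip()
    if not cleaned or cleaned.lower() in (a.lower() for a in advertisers):
        return list(advertisers)
    return [*advertisers, cleaned]


def keyword_detector(transcript: List[Segment], advertisers: List[str]) -> List[Segment]:
    """Stand-in for the LLM call: flag segments that mention a known advertiser."""
    lowered = [a.lower() for a in advertisers]
    return [seg for seg in transcript if any(a in seg[2].lower() for a in lowered)]


def annotate_and_rerun(
    transcript: List[Segment],
    advertisers: List[str],
    new_advertiser: str,
    detect: Detector,
) -> Tuple[List[str], List[Segment]]:
    """Step 2 (add to advertiser list) followed by step 3 (re-run detection)."""
    updated = add_advertiser(advertisers, new_advertiser)
    return updated, detect(transcript, updated)
```

Passing the detector in as a callable keeps the sketch independent of however detection is really invoked; a UI handler would call `annotate_and_rerun` once the user has read the transcript and picked an advertiser name.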

dfjones89 commented 3 days ago

As promised, just a quick follow-up: Updating my prompt to include a list of known advertisers has improved the detection of adverts that were previously being overlooked 🙌 Being able to update this list through a web interface would be a lovely feature. Thanks again for all your work! 👏