Closed rahulbot closed 4 years ago
Another option is to add a 'download and transcribe podcasts' boolean and just use the existing 'syndicated' feed type.
On Wed, Jan 8, 2020 at 10:21 AM rahulbot notifications@github.com wrote:
As per the plan developed in #515 (https://github.com/berkmancenter/mediacloud/issues/515), we need to support podcast transcription (via Google). I'm creating this task to track implementation, since the other issue tracks planning.
As mentioned on the recent call, this needs to allow us to mark specific feeds / sources as a "podcast" source to queue it up for transcription. I was thinking this could be a new "feed.type" option, because that would make it easy to set in the UI: [image: Update_Feed_The_BostonGlobeSource_Manager_Media_Cloud] <https://user-images.githubusercontent.com/673178/71995638-e6b57400-3208-11ea-88be-024dd41f1afe.jpg>
Deployed podcast transcription services, tried it with an initial podcast, seemed to work, so I've added the rest.
Here's the list:
https://docs.google.com/spreadsheets/d/1nUnhKGaazTgrUDhswCjz8KJ_KTlmSqvrF8VEkb2vQIs/edit#gid=0
All added podcast media sources are in the collection tag set (tag_sets_id = 5) and are tagged with a Podcasts tag (tags_id = 196654054).
I'm not sure they got added correctly. For instance, the Glenn Beck podcast feed shows up as "syndicated" when I think it should be "podcast". Am I mistaken?
Oh, and I was wondering why they weren't being fetched!
Updated to podcast, let's see if it works now.
The new 'podcast' type feed on that source still hasn't fetched any stories. It looks like that is the Apple Podcasts URL, not the raw feed URL. Are you parsing out feed URLs from the Podcasts link automatically?
To check this out I hacked somebody's script and made a Ruby script that fetches the feed URL from the Apple Podcasts URL. For that Glenn Beck feed it spits out https://feeds.megaphone.fm/BMDC3567910388 as the actual feed URL. Do we need to update the ones hosted at Apple Podcasts to be the raw URLs or not?
Update: I switched it and am waiting for a fetch to happen to see if it works or not.
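For reference, here is a minimal Python sketch of the kind of lookup described above. It is not the Ruby script mentioned in the thread, and it is not necessarily how the crawler does it. It assumes the numeric ID in the Apple Podcasts URL can be passed to Apple's public iTunes lookup endpoint, whose JSON response carries a `feedUrl` field for podcasts. The function names are made up for illustration.

```python
import json
import re
from urllib.request import urlopen


def apple_podcast_id(page_url):
    """Pull the numeric podcast ID out of an Apple Podcasts page URL."""
    match = re.search(r"/id(\d+)", page_url)
    return match.group(1) if match else None


def feed_url_from_lookup(payload):
    """Return the first feedUrl found in an iTunes lookup response, if any."""
    for result in payload.get("results", []):
        if "feedUrl" in result:
            return result["feedUrl"]
    return None


def resolve_apple_feed_url(page_url):
    """Resolve an Apple Podcasts page URL to its raw feed URL (needs network)."""
    podcast_id = apple_podcast_id(page_url)
    if podcast_id is None:
        return None
    # Assumption: the public lookup endpoint accepts the ID as-is.
    lookup_url = f"https://itunes.apple.com/lookup?id={podcast_id}&entity=podcast"
    with urlopen(lookup_url) as response:
        return feed_url_from_lookup(json.load(response))
```

Calling `resolve_apple_feed_url()` on a podcast page URL should return the raw feed URL if the lookup response includes one, and `None` otherwise.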
I can see 1248 stories from media source 1363086, so maybe the crawler didn't get around to fetching that specific podcast six days ago.
The crawler supports both iTunes Podcasts URLs (e.g. https://podcasts.apple.com/us/podcast/lovett-or-leave-it/id1216346463) and Google Podcasts URLs (e.g. https://podcasts.google.com/?feed=aHR0cHM6Ly93d3cubnByLm9yZy9yc3MvcG9kY2FzdC5waHA_aWQ9NTEwMjg5), meaning that you can add them as feeds.url directly, and the crawler should be able to work out the actual feed URL hiding behind that page on every fetch.
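As an illustration of what "the actual feed URL hiding behind that page" can look like for the Google Podcasts case: the `feed` query parameter in the example URL above appears to be the feed URL encoded as URL-safe base64. The sketch below decodes it under that assumption. It is not taken from the crawler's code, and the function name is hypothetical.

```python
import base64
from urllib.parse import parse_qs, urlparse


def feed_url_from_google_podcasts(page_url):
    """Decode the raw feed URL from a Google Podcasts page URL.

    Assumes the 'feed' query parameter is URL-safe base64 of the feed URL.
    Returns None if the parameter is missing.
    """
    query = parse_qs(urlparse(page_url).query)
    values = query.get("feed")
    if not values:
        return None
    encoded = values[0]
    # Restore any '=' padding that was stripped from the parameter.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


example = "https://podcasts.google.com/?feed=aHR0cHM6Ly93d3cubnByLm9yZy9yc3MvcG9kY2FzdC5waHA_aWQ9NTEwMjg5"
print(feed_url_from_google_podcasts(example))
# → https://www.npr.org/rss/podcast.php?id=510289
```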