Then I'll take this one on next for the new crawlers. The new GUI will probably be finished today, so I can start on it right away. :)
The new BR Mediathek works with GraphQL. I was able to extract the following queries and will use them for the new BR crawler:
Missed broadcasts:
{"query":"query ProgrammeCalendarPageQuery(\n $broadcasterId: ID!\n $livestreamFilter: LivestreamFilter!\n $programmeFilter: ProgrammeFilter!\n $programmeStageFilter: ProgrammeFilter!\n) {\n viewer {\n ...ProgrammeCalendarPage_viewer_5uC0z\n id\n }\n}\n\nfragment ProgrammeCalendarPage_viewer_5uC0z on Viewer {\n broadcastService(id: $broadcasterId) {\n __typename\n ...ProgrammeStage_broadcastService_4juArI\n ...ProgrammeContainer_broadcastService_3zH8HL\n id\n }\n allLivestreams(filter: $livestreamFilter) {\n edges {\n node {\n __typename\n id\n }\n }\n }\n}\n\nfragment ProgrammeStage_broadcastService_4juArI on BroadcastServiceInterface {\n today: programmes(last: 1, orderBy: BROADCASTS_START_ASC, filter: $programmeStageFilter) {\n edges {\n node {\n __typename\n ...ProgrammeInfo_programme\n id\n }\n }\n }\n}\n\nfragment ProgrammeContainer_broadcastService_3zH8HL on BroadcastServiceInterface {\n id\n containerToday: programmes(first: 96, orderBy: BROADCASTS_START_ASC, filter: $programmeFilter) {\n ...ProgrammeTable_programmes\n }\n}\n\nfragment ProgrammeTable_programmes on ProgrammeConnection {\n edges {\n node {\n __typename\n id\n ...ProgrammeTableRow_programme\n }\n }\n}\n\nfragment ProgrammeTableRow_programme on ProgrammeInterface {\n ...ProgrammeTeaserBox_programme\n title\n kicker\n broadcasts(first: 1) {\n edges {\n node {\n __typename\n start\n end\n id\n }\n }\n }\n id\n}\n\nfragment ProgrammeTeaserBox_programme on ProgrammeInterface {\n title\n broadcasts(first: 1) {\n edges {\n node {\n __typename\n start\n end\n id\n }\n }\n }\n ... on CreativeWorkInterface {\n ...TeaserImage_creativeWorkInterface\n }\n ... 
on ClipInterface {\n title\n kicker\n essences(first: 1) {\n count\n }\n ...Bookmark_clip\n ...Duration_clip\n }\n}\n\nfragment TeaserImage_creativeWorkInterface on CreativeWorkInterface {\n id\n kicker\n title\n teaserImages(first: 1) {\n edges {\n node {\n __typename\n shortDescription\n id\n }\n }\n }\n defaultTeaserImage {\n __typename\n imageFiles(first: 1) {\n edges {\n node {\n __typename\n id\n publicLocation\n crops(first: 10) {\n count\n edges {\n node {\n __typename\n publicLocation\n width\n height\n id\n }\n }\n }\n }\n }\n }\n id\n }\n}\n\nfragment Bookmark_clip on ClipInterface {\n id\n bookmarked\n title\n}\n\nfragment Duration_clip on ClipInterface {\n duration\n}\n\nfragment ProgrammeInfo_programme on ProgrammeInterface {\n id\n title\n kicker\n description\n broadcasts(first: 1) {\n edges {\n node {\n __typename\n start\n end\n id\n }\n }\n }\n ... on ClipInterface {\n ...Duration_clip\n }\n}\n","variables":{"broadcasterId":"BroadcastService:http://ard.de/ontologies/ard#BR_Fernsehen","livestreamFilter":{"broadcastedBy":{"id":{"eq":"BroadcastService:http://ard.de/ontologies/ard#BR_Fernsehen"}}},"programmeFilter":{"status":{"id":{"eq":"Status:http://ard.de/ontologies/lifeCycle#published"}},"broadcasts":{"start":{"gte":"2017-09-22T04:00:00.000Z","lte":"2017-09-27T04:00:00.000Z"}}},"programmeStageFilter":{"status":{"id":{"eq":"Status:http://ard.de/ontologies/lifeCycle#published"}},"broadcasts":{"start":{"gte":"2017-09-22T04:00:00.000Z","lte":"2017-09-19T16:52:25.559Z"}}}}}
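The date range is the only part of the variables above that changes from run to run, so it can be generated. The following is a minimal Python sketch, not the crawler's actual code: the helper names (`iso_utc`, `programme_filter`, `missed_broadcasts_variables`) are made up for illustration, and using the current time as the upper bound of `programmeStageFilter` is an assumption derived from the captured payload.

```python
from datetime import datetime, timezone

# Identifiers copied from the captured request above.
BR_FERNSEHEN = "BroadcastService:http://ard.de/ontologies/ard#BR_Fernsehen"
PUBLISHED = "Status:http://ard.de/ontologies/lifeCycle#published"


def iso_utc(moment: datetime) -> str:
    """Format an aware datetime like 2017-09-22T04:00:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def programme_filter(start: datetime, end: datetime) -> dict:
    """Published programmes whose broadcast start lies in [start, end]."""
    return {
        "status": {"id": {"eq": PUBLISHED}},
        "broadcasts": {"start": {"gte": iso_utc(start), "lte": iso_utc(end)}},
    }


def missed_broadcasts_variables(start: datetime, end: datetime, now: datetime) -> dict:
    """Build the variables object for ProgrammeCalendarPageQuery."""
    return {
        "broadcasterId": BR_FERNSEHEN,
        "livestreamFilter": {"broadcastedBy": {"id": {"eq": BR_FERNSEHEN}}},
        "programmeFilter": programme_filter(start, end),
        # Assumption: the stage filter ends at the time of the request.
        "programmeStageFilter": programme_filter(start, now),
    }
```

Passing timezone-aware datetimes keeps the output in UTC regardless of the machine's local time zone.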
All series:
{"query":"query SeriesIndexRefetchQuery(\n $seriesFilter: SeriesFilter\n) {\n viewer {\n ...SeriesIndex_viewer_19SNIy\n id\n }\n}\n\nfragment SeriesIndex_viewer_19SNIy on Viewer {\n seriesIndexAllSeries: allSeries(first: 1000, orderBy: TITLE_ASC, filter: $seriesFilter) {\n edges {\n node {\n __typename\n id\n title\n ...SeriesTeaserBox_node\n ...TeaserListItem_node\n }\n }\n }\n}\n\nfragment SeriesTeaserBox_node on Node {\n __typename\n id\n ... on CreativeWorkInterface {\n ...TeaserImage_creativeWorkInterface\n }\n ... on SeriesInterface {\n ...SubscribeAction_series\n subscribed\n title\n }\n}\n\nfragment TeaserListItem_node on Node {\n __typename\n id\n ... on CreativeWorkInterface {\n ...TeaserImage_creativeWorkInterface\n }\n ... on ClipInterface {\n title\n }\n}\n\nfragment TeaserImage_creativeWorkInterface on CreativeWorkInterface {\n id\n kicker\n title\n }\n\nfragment SubscribeAction_series on SeriesInterface {\n id\n subscribed\n}\n","variables":{"seriesFilter":{"title":{"startsWith":"*"},"audioOnly":{"eq":false},"status":{"id":{"eq":"Status:http://ard.de/ontologies/lifeCycle#published"}}}}}
Episodes of a series:
{"query":"query SeriesPageRendererQuery( $id: ID! $itemCount: Int $clipCount: Int $previousEpisodesFilter: ProgrammeFilter $clipsOnlyFilter: ProgrammeFilter) { viewer { ...SeriesPage_viewer_2PDDaq id }}fragment SeriesPage_viewer_2PDDaq on Viewer { series(id: $id) { __typename ...TeaserImage_creativeWorkInterface ...SeriesBrandBanner_series clipsOnly: episodes(orderBy: VERSIONFROM_DESC, first: $clipCount, filter: $clipsOnlyFilter) { ...ProgrammeSlider_programmes } previousEpisodes: episodes(first: $itemCount, orderBy: BROADCASTS_START_DESC, filter: $previousEpisodesFilter) { ...ProgrammeSlider_programmes edges { node { __typename ...SmallTeaserBox_node id } } } id }}fragment TeaserImage_creativeWorkInterface on CreativeWorkInterface { id kicker title }fragment SeriesBrandBanner_series on SeriesInterface { ...SubscribeAction_series title shortDescription externalURLS(first: 1) { edges { node { __typename id url label } } } }fragment ProgrammeSlider_programmes on ProgrammeConnection { edges { node { __typename ...SmallTeaserBox_node id } }}fragment SmallTeaserBox_node on Node { id ... on CreativeWorkInterface { ...TeaserImage_creativeWorkInterface } ... on ClipInterface { id title kicker ...Bookmark_clip ...Duration_clip ...Progress_clip } ... on ProgrammeInterface { broadcasts(first: 1, orderBy: START_DESC) { edges { node { __typename start id } } } }}fragment Bookmark_clip on ClipInterface { id bookmarked title}fragment Duration_clip on ClipInterface { duration}fragment Progress_clip on ClipInterface { myInteractions { __typename progress completed id }}fragment SubscribeAction_series on SeriesInterface { id subscribed}","variables":{"id":"Series:584f4c523b467900117c0f47","itemCount":31,"clipCount":6,"previousEpisodesFilter":{"essences":{"empty":{"eq":false}},"broadcasts":{"empty":{"eq":false},"start":{"lte":"2017-09-22T17:21:12.179Z"}}},"clipsOnlyFilter":{"broadcasts":{"empty":{"eq":true}},"essences":{"empty":{"eq":false}}}}}
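The IDs returned by this query are what the detail query needs as `clipId`. A minimal Python sketch of collecting them follows. It assumes the response mirrors the query shape (`data.viewer.series` with the aliases `previousEpisodes` and `clipsOnly`); the function names are hypothetical and not taken from the crawler.

```python
def connection_ids(connection) -> list:
    """Collect node ids from a Relay-style connection ({"edges": [{"node": ...}]})."""
    edges = (connection or {}).get("edges") or []
    ids = []
    for edge in edges:
        node_id = (edge.get("node") or {}).get("id")
        if node_id:
            ids.append(node_id)
    return ids


def episode_ids(response: dict) -> list:
    """Return the unique episode and clip ids of a SeriesPageRendererQuery response."""
    viewer = (response.get("data") or {}).get("viewer") or {}
    series = viewer.get("series") or {}
    ids = []
    # Both aliases come from the query above; order is preserved, duplicates dropped.
    for alias in ("previousEpisodes", "clipsOnly"):
        for node_id in connection_ids(series.get(alias)):
            if node_id not in ids:
                ids.append(node_id)
    return ids
```

Each returned ID can then be used as the `clipId` variable of the detail query.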
Episode details:
{"query":"query DetailPageRendererQuery( $clipId: ID! $isClip: Boolean! $isLivestream: Boolean! $livestream: ID!) { viewer { ...DetailPage_viewer_22r5xP id }}fragment DetailPage_viewer_22r5xP on Viewer { ...VideoPlayer_viewer_22r5xP ...ClipActions_viewer detailClip: clip(id: $clipId) { __typename id title ...ClipActions_clip ...ClipInfo_clip ...ChildContentRedirect_creativeWork }}fragment VideoPlayer_viewer_22r5xP on Viewer { id clip(id: $clipId) @include(if: $isClip) { __typename id ageRestriction videoFiles(first: 100) { edges { node { __typename id mimetype publicLocation videoProfile { __typename id width } } } } ...Error_clip title } livestream(id: $livestream) @include(if: $isLivestream) { __typename id streamingUrls(first: 10, filter: {accessibleIn: {contains: \"GeoZone:http://ard.de/ontologies/coreConcepts#GeoZone_World\"}, hasEmbeddedSubtitles: {eq: false}}) { edges { node { __typename id publicLocation } } } }}fragment ClipActions_viewer on Viewer { me { __typename bookmarks(first: 12) { ...BookmarkAction_bookmarks } id }}fragment ClipActions_clip on ClipInterface { id bookmarked downloadable ...BookmarkAction_clip ...Rate_clip ...Share_clip ...Download_clip}fragment ClipInfo_clip on ClipInterface { __typename id title kicker shortDescription description availableUntil ...Duration_clip ... on ProgrammeInterface { publications(first: 1) { edges { node { __typename publishedBy { __typename name id } id } } } broadcasts(first: 1) { edges { node { __typename start id } } } episodeOf { __typename id title scheduleInfo subscribed ...SubscribeAction_series ... on CreativeWorkInterface { ...TeaserImage_creativeWorkInterface } } } ... on ItemInterface { itemOf(first: 1) { edges { node { __typename publications(first: 1) { edges { node { __typename publishedBy { __typename name id } id } } } broadcasts(first: 1) { edges { node { __typename start id } } } episodeOf { __typename id title scheduleInfo subscribed ...SubscribeAction_series ... 
on CreativeWorkInterface { ...TeaserImage_creativeWorkInterface } } id } } } }}fragment ChildContentRedirect_creativeWork on CreativeWorkInterface { categories(first: 100) { edges { node { __typename id } } }}fragment Duration_clip on ClipInterface { duration}fragment SubscribeAction_series on SeriesInterface { id subscribed}fragment TeaserImage_creativeWorkInterface on CreativeWorkInterface { id kicker title teaserImages(first: 1) { edges { node { __typename shortDescription id } } } defaultTeaserImage { __typename imageFiles(first: 1) { edges { node { __typename id publicLocation crops(first: 10) { count edges { node { __typename publicLocation width height id } } } } } } id }}fragment BookmarkAction_clip on ClipInterface { id}fragment Rate_clip on ClipInterface { id reactions { likes dislikes } myInteractions { __typename reaction { __typename id } id }}fragment Share_clip on ClipInterface { title id}fragment Download_clip on ClipInterface { videoFiles(first: 100) { edges { node { __typename publicLocation videoProfile { __typename height id } id } } }}fragment BookmarkAction_bookmarks on ClipRemoteConnection { count ...TeaserSlider_clipRemoteConnection}fragment TeaserSlider_clipRemoteConnection on ClipRemoteConnection { edges { node { __typename ...SmallTeaserBox_node id } }}fragment SmallTeaserBox_node on Node { id ... on CreativeWorkInterface { ...TeaserImage_creativeWorkInterface } ... on ClipInterface { id title kicker ...Duration_clip } ... on ProgrammeInterface { broadcasts(first: 1, orderBy: START_DESC) { edges { node { __typename start id } } } }}fragment Error_clip on ClipInterface { ageRestriction}","variables":{"clipId":"Programme:598ae1644057ba0012bd2d57","isClip":true,"isLivestream":false,"livestream":"Livestream:"}}
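The download URLs are in the `videoFiles` connection requested by the `VideoPlayer_viewer_22r5xP` fragment. Below is a minimal Python sketch for reading them. It assumes the response mirrors the query (`data.viewer.clip.videoFiles.edges[].node`); `video_files` is a hypothetical helper and the sample URLs in any test data would be placeholders, not real BR locations.

```python
def video_files(response: dict) -> list:
    """Return (width, url) pairs from a DetailPageRendererQuery response, widest first."""
    viewer = (response.get("data") or {}).get("viewer") or {}
    clip = viewer.get("clip") or {}
    edges = (clip.get("videoFiles") or {}).get("edges") or []
    files = []
    for edge in edges:
        node = edge.get("node") or {}
        url = node.get("publicLocation")
        width = (node.get("videoProfile") or {}).get("width")
        if url:  # skip entries without a usable location
            files.append((width, url))
    # Entries without a width sort last.
    return sorted(files, key=lambda item: item[0] or 0, reverse=True)
```

Sorting by `videoProfile.width` is one possible way to map the files to quality levels; the query also exposes `height` in the `Download_clip` fragment.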
All of this then has to be sent via POST to: https://proxy-base.master.mango.express/graphql
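A minimal Python sketch of such a POST request, using only the standard library. The `Content-Type: application/json` header is an assumption (it is the usual choice for GraphQL over HTTP, but I have not verified what this endpoint requires), and `build_graphql_request` is a hypothetical helper name.

```python
import json
import urllib.request

GRAPHQL_ENDPOINT = "https://proxy-base.master.mango.express/graphql"


def build_graphql_request(query: str, variables: dict) -> urllib.request.Request:
    """Wrap a query string and its variables in a JSON POST request."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=payload,
        # Assumption: the endpoint accepts a plain JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(build_graphql_request(query, variables)) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect before it goes out.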
Hm, I have already been able to gather most of the information and have put it into code so far. What I am still missing are the geo information and the subtitles; I cannot find those at the moment.
It may be that subtitling has not been activated yet. This show has subtitles in the 'normal' Mediathek; in the beta Mediathek id3 does not work.
Open items/problems of the BR crawler in branch hotfix/241:
I'm working on the resolutions...
Geo tagging works for branch hotfix/241 thanks to the fix from #281.
The topic is now read differently, so that it matches the topics of the previous crawler and individual shows are not listed under different topics. This also means the old subscriptions keep working.
Status of date+time: there are many entries for which no value can be determined. The JSON data contains no broadcast information for them. These entries appear to be individual excerpts from other shows.
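One way to handle entries without broadcast information is to treat the date as optional. The sketch below is in Python and is not the crawler's code. It assumes a programme node shaped like the `broadcasts(first: 1)` selection in the queries above, and that `start` uses the same timestamp format as the request variables; `first_broadcast_start` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional


def first_broadcast_start(programme: dict) -> Optional[datetime]:
    """Return the start of the first broadcast, or None if the entry has none."""
    edges = (programme.get("broadcasts") or {}).get("edges") or []
    for edge in edges:
        start = (edge.get("node") or {}).get("start")
        if start:
            # Assumed format: 2017-09-22T04:00:00.000Z
            return datetime.fromisoformat(start.replace("Z", "+00:00"))
    return None  # e.g. an excerpt without its own broadcast
```

A caller can then decide whether to skip such entries or to store them without date and time.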
I will run a few more tests (above all runtime together with the other crawlers) and would then release the BR crawler in this state.
The new BR crawler has been active for a long time, therefore closed.
While analysing #195 I happened to notice that BR is currently running a beta phase for a new Mediathek. It is scheduled to go live in the autumn. See the press release.
To support the new Mediathek, the BR crawler has to be rewritten.