mediathekview / MServer

Server for controlling the crawlers
https://mediathekview.de
GNU General Public License v3.0

BR: Mediathek migration in autumn 2017 #241

Closed: pidoubleyou closed this issue 5 years ago

pidoubleyou commented 7 years ago

While analysing #195 I happened to notice that BR is currently running a beta phase for a new Mediathek. It is supposed to go live in autumn. See the press release.

To support the new Mediathek, the BR crawler has to be rewritten.

Nicklas2751 commented 7 years ago

Then I'll take that one on next for the new crawlers. The new GUI will probably be finished today, so I can start on it right away. :)

Nicklas2751 commented 7 years ago

The new BR Mediathek works with GraphQL. I was able to extract the following queries and will use them for the new BR crawler:

Missed broadcasts:

{"query":"query ProgrammeCalendarPageQuery(\n  $broadcasterId: ID!\n  $livestreamFilter: LivestreamFilter!\n  $programmeFilter: ProgrammeFilter!\n  $programmeStageFilter: ProgrammeFilter!\n) {\n  viewer {\n    ...ProgrammeCalendarPage_viewer_5uC0z\n    id\n  }\n}\n\nfragment ProgrammeCalendarPage_viewer_5uC0z on Viewer {\n  broadcastService(id: $broadcasterId) {\n    __typename\n    ...ProgrammeStage_broadcastService_4juArI\n    ...ProgrammeContainer_broadcastService_3zH8HL\n    id\n  }\n  allLivestreams(filter: $livestreamFilter) {\n    edges {\n      node {\n        __typename\n        id\n      }\n    }\n  }\n}\n\nfragment ProgrammeStage_broadcastService_4juArI on BroadcastServiceInterface {\n  today: programmes(last: 1, orderBy: BROADCASTS_START_ASC, filter: $programmeStageFilter) {\n    edges {\n      node {\n        __typename\n        ...ProgrammeInfo_programme\n        id\n      }\n    }\n  }\n}\n\nfragment ProgrammeContainer_broadcastService_3zH8HL on BroadcastServiceInterface {\n  id\n  containerToday: programmes(first: 96, orderBy: BROADCASTS_START_ASC, filter: $programmeFilter) {\n    ...ProgrammeTable_programmes\n  }\n}\n\nfragment ProgrammeTable_programmes on ProgrammeConnection {\n  edges {\n    node {\n      __typename\n      id\n      ...ProgrammeTableRow_programme\n    }\n  }\n}\n\nfragment ProgrammeTableRow_programme on ProgrammeInterface {\n  ...ProgrammeTeaserBox_programme\n  title\n  kicker\n  broadcasts(first: 1) {\n    edges {\n      node {\n        __typename\n        start\n        end\n        id\n      }\n    }\n  }\n  id\n}\n\nfragment ProgrammeTeaserBox_programme on ProgrammeInterface {\n  title\n  broadcasts(first: 1) {\n    edges {\n      node {\n        __typename\n        start\n        end\n        id\n      }\n    }\n  }\n  ... on CreativeWorkInterface {\n    ...TeaserImage_creativeWorkInterface\n  }\n  ... 
on ClipInterface {\n    title\n    kicker\n    essences(first: 1) {\n      count\n    }\n    ...Bookmark_clip\n    ...Duration_clip\n  }\n}\n\nfragment TeaserImage_creativeWorkInterface on CreativeWorkInterface {\n  id\n  kicker\n  title\n  teaserImages(first: 1) {\n    edges {\n      node {\n        __typename\n        shortDescription\n        id\n      }\n    }\n  }\n  defaultTeaserImage {\n    __typename\n    imageFiles(first: 1) {\n      edges {\n        node {\n          __typename\n          id\n          publicLocation\n          crops(first: 10) {\n            count\n            edges {\n              node {\n                __typename\n                publicLocation\n                width\n                height\n                id\n              }\n            }\n          }\n        }\n      }\n    }\n    id\n  }\n}\n\nfragment Bookmark_clip on ClipInterface {\n  id\n  bookmarked\n  title\n}\n\nfragment Duration_clip on ClipInterface {\n  duration\n}\n\nfragment ProgrammeInfo_programme on ProgrammeInterface {\n  id\n  title\n  kicker\n  description\n  broadcasts(first: 1) {\n    edges {\n      node {\n        __typename\n        start\n        end\n        id\n      }\n    }\n  }\n  ... on ClipInterface {\n    ...Duration_clip\n  }\n}\n","variables":{"broadcasterId":"BroadcastService:http://ard.de/ontologies/ard#BR_Fernsehen","livestreamFilter":{"broadcastedBy":{"id":{"eq":"BroadcastService:http://ard.de/ontologies/ard#BR_Fernsehen"}}},"programmeFilter":{"status":{"id":{"eq":"Status:http://ard.de/ontologies/lifeCycle#published"}},"broadcasts":{"start":{"gte":"2017-09-22T04:00:00.000Z","lte":"2017-09-27T04:00:00.000Z"}}},"programmeStageFilter":{"status":{"id":{"eq":"Status:http://ard.de/ontologies/lifeCycle#published"}},"broadcasts":{"start":{"gte":"2017-09-22T04:00:00.000Z","lte":"2017-09-19T16:52:25.559Z"}}}}}
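Apart from the fixed IDs, the `variables` of the request above are essentially a time window. A minimal Python sketch of how such a `variables` object could be assembled; the helper names (`to_graphql_time`, `build_missed_variables`) are made up for illustration, and the assumption that `programmeStageFilter` ends at "now" is only read off the captured request, not confirmed by BR:

```python
from datetime import datetime, timedelta, timezone

# IDs copied from the captured request above.
BROADCASTER_ID = "BroadcastService:http://ard.de/ontologies/ard#BR_Fernsehen"
PUBLISHED = "Status:http://ard.de/ontologies/lifeCycle#published"


def to_graphql_time(moment: datetime) -> str:
    """Format a timezone-aware datetime like 2017-09-22T04:00:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def published_filter() -> dict:
    """Status filter used by both programme filters in the captured request."""
    return {"id": {"eq": PUBLISHED}}


def build_missed_variables(start: datetime, days: int, now: datetime) -> dict:
    """Hypothetical helper: variables for ProgrammeCalendarPageQuery."""
    end = start + timedelta(days=days)
    return {
        "broadcasterId": BROADCASTER_ID,
        "livestreamFilter": {"broadcastedBy": {"id": {"eq": BROADCASTER_ID}}},
        "programmeFilter": {
            "status": published_filter(),
            "broadcasts": {"start": {"gte": to_graphql_time(start),
                                     "lte": to_graphql_time(end)}},
        },
        # Assumption: the stage filter runs from the window start to "now".
        "programmeStageFilter": {
            "status": published_filter(),
            "broadcasts": {"start": {"gte": to_graphql_time(start),
                                     "lte": to_graphql_time(now)}},
        },
    }
```

Both datetimes are expected to be timezone-aware; the query string itself stays exactly as captured.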

All series:

{"query":"query SeriesIndexRefetchQuery(\n  $seriesFilter: SeriesFilter\n) {\n  viewer {\n    ...SeriesIndex_viewer_19SNIy\n    id\n  }\n}\n\nfragment SeriesIndex_viewer_19SNIy on Viewer {\n  seriesIndexAllSeries: allSeries(first: 1000, orderBy: TITLE_ASC, filter: $seriesFilter) {\n    edges {\n      node {\n        __typename\n        id\n        title\n        ...SeriesTeaserBox_node\n        ...TeaserListItem_node\n      }\n    }\n  }\n}\n\nfragment SeriesTeaserBox_node on Node {\n  __typename\n  id\n  ... on CreativeWorkInterface {\n    ...TeaserImage_creativeWorkInterface\n  }\n  ... on SeriesInterface {\n    ...SubscribeAction_series\n    subscribed\n    title\n  }\n}\n\nfragment TeaserListItem_node on Node {\n  __typename\n  id\n  ... on CreativeWorkInterface {\n    ...TeaserImage_creativeWorkInterface\n  }\n  ... on ClipInterface {\n    title\n  }\n}\n\nfragment TeaserImage_creativeWorkInterface on CreativeWorkInterface {\n  id\n  kicker\n  title\n   }\n\nfragment SubscribeAction_series on SeriesInterface {\n  id\n  subscribed\n}\n","variables":{"seriesFilter":{"title":{"startsWith":"*"},"audioOnly":{"eq":false},"status":{"id":{"eq":"Status:http://ard.de/ontologies/lifeCycle#published"}}}}}
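Since a GraphQL response mirrors the shape of the query, the series IDs should be found under the `seriesIndexAllSeries` alias. A small Python sketch of reading them out; the function name `extract_series` is hypothetical, and the response layout is inferred from the query above rather than from a recorded response:

```python
from typing import List, Tuple


def extract_series(response: dict) -> List[Tuple[str, str]]:
    """Return (id, title) pairs from a SeriesIndexRefetchQuery response.

    Assumed layout: data.viewer.seriesIndexAllSeries.edges[].node.{id,title}
    """
    viewer = (response.get("data") or {}).get("viewer") or {}
    connection = viewer.get("seriesIndexAllSeries") or {}
    result = []
    for edge in connection.get("edges") or []:
        node = (edge or {}).get("node") or {}
        if node.get("id"):  # skip empty or malformed edges
            result.append((node["id"], node.get("title") or ""))
    return result
```

Each returned ID could then be used as the `id` variable of the series page query.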

Episodes of a series:

{"query":"query SeriesPageRendererQuery(  $id: ID!  $itemCount: Int  $clipCount: Int  $previousEpisodesFilter: ProgrammeFilter  $clipsOnlyFilter: ProgrammeFilter) {  viewer {    ...SeriesPage_viewer_2PDDaq    id  }}fragment SeriesPage_viewer_2PDDaq on Viewer {  series(id: $id) {    __typename    ...TeaserImage_creativeWorkInterface    ...SeriesBrandBanner_series    clipsOnly: episodes(orderBy: VERSIONFROM_DESC, first: $clipCount, filter: $clipsOnlyFilter) {      ...ProgrammeSlider_programmes    }    previousEpisodes: episodes(first: $itemCount, orderBy: BROADCASTS_START_DESC, filter: $previousEpisodesFilter) {      ...ProgrammeSlider_programmes      edges {        node {          __typename          ...SmallTeaserBox_node          id        }      }    }    id  }}fragment TeaserImage_creativeWorkInterface on CreativeWorkInterface {  id  kicker  title }fragment SeriesBrandBanner_series on SeriesInterface {  ...SubscribeAction_series  title  shortDescription  externalURLS(first: 1) {    edges {      node {        __typename        id        url        label      }    }  }  }fragment ProgrammeSlider_programmes on ProgrammeConnection {  edges {    node {      __typename      ...SmallTeaserBox_node      id    }  }}fragment SmallTeaserBox_node on Node {  id  ... on CreativeWorkInterface {    ...TeaserImage_creativeWorkInterface  }  ... on ClipInterface {    id    title    kicker    ...Bookmark_clip    ...Duration_clip    ...Progress_clip  }  ... 
on ProgrammeInterface {    broadcasts(first: 1, orderBy: START_DESC) {      edges {        node {          __typename          start          id        }      }    }  }}fragment Bookmark_clip on ClipInterface {  id  bookmarked  title}fragment Duration_clip on ClipInterface {  duration}fragment Progress_clip on ClipInterface {  myInteractions {    __typename    progress    completed    id  }}fragment SubscribeAction_series on SeriesInterface {  id  subscribed}","variables":{"id":"Series:584f4c523b467900117c0f47","itemCount":31,"clipCount":6,"previousEpisodesFilter":{"essences":{"empty":{"eq":false}},"broadcasts":{"empty":{"eq":false},"start":{"lte":"2017-09-22T17:21:12.179Z"}}},"clipsOnlyFilter":{"broadcasts":{"empty":{"eq":true}},"essences":{"empty":{"eq":false}}}}}

Episode details:

{"query":"query DetailPageRendererQuery(  $clipId: ID!  $isClip: Boolean!  $isLivestream: Boolean!  $livestream: ID!) {  viewer {    ...DetailPage_viewer_22r5xP    id  }}fragment DetailPage_viewer_22r5xP on Viewer {  ...VideoPlayer_viewer_22r5xP  ...ClipActions_viewer  detailClip: clip(id: $clipId) {    __typename    id    title    ...ClipActions_clip    ...ClipInfo_clip    ...ChildContentRedirect_creativeWork  }}fragment VideoPlayer_viewer_22r5xP on Viewer {  id  clip(id: $clipId) @include(if: $isClip) {    __typename    id    ageRestriction    videoFiles(first: 100) {      edges {        node {          __typename          id          mimetype          publicLocation          videoProfile {            __typename            id            width          }        }      }    }    ...Error_clip    title  }  livestream(id: $livestream) @include(if: $isLivestream) {    __typename    id    streamingUrls(first: 10, filter: {accessibleIn: {contains: \"GeoZone:http://ard.de/ontologies/coreConcepts#GeoZone_World\"}, hasEmbeddedSubtitles: {eq: false}}) {      edges {        node {          __typename          id          publicLocation        }      }    }  }}fragment ClipActions_viewer on Viewer {  me {    __typename    bookmarks(first: 12) {      ...BookmarkAction_bookmarks    }    id  }}fragment ClipActions_clip on ClipInterface {  id  bookmarked  downloadable  ...BookmarkAction_clip  ...Rate_clip  ...Share_clip  ...Download_clip}fragment ClipInfo_clip on ClipInterface {  __typename  id  title  kicker  shortDescription  description  availableUntil  ...Duration_clip  ... 
on ProgrammeInterface {    publications(first: 1) {      edges {        node {          __typename          publishedBy {            __typename            name            id          }          id        }      }    }    broadcasts(first: 1) {      edges {        node {          __typename          start          id        }      }    }    episodeOf {      __typename      id      title      scheduleInfo      subscribed      ...SubscribeAction_series      ... on CreativeWorkInterface {        ...TeaserImage_creativeWorkInterface      }    }  }  ... on ItemInterface {    itemOf(first: 1) {      edges {        node {          __typename          publications(first: 1) {            edges {              node {                __typename                publishedBy {                  __typename                  name                  id                }                id              }            }          }          broadcasts(first: 1) {            edges {              node {                __typename                start                id              }            }          }          episodeOf {            __typename            id            title            scheduleInfo            subscribed            ...SubscribeAction_series            ... 
on CreativeWorkInterface {              ...TeaserImage_creativeWorkInterface            }          }          id        }      }    }  }}fragment ChildContentRedirect_creativeWork on CreativeWorkInterface {  categories(first: 100) {    edges {      node {        __typename        id      }    }  }}fragment Duration_clip on ClipInterface {  duration}fragment SubscribeAction_series on SeriesInterface {  id  subscribed}fragment TeaserImage_creativeWorkInterface on CreativeWorkInterface {  id  kicker  title  teaserImages(first: 1) {    edges {      node {        __typename        shortDescription        id      }    }  }  defaultTeaserImage {    __typename    imageFiles(first: 1) {      edges {        node {          __typename          id          publicLocation          crops(first: 10) {            count            edges {              node {                __typename                publicLocation                width                height                id              }            }          }        }      }    }    id  }}fragment BookmarkAction_clip on ClipInterface {  id}fragment Rate_clip on ClipInterface {  id  reactions {    likes    dislikes  }  myInteractions {    __typename    reaction {      __typename      id    }    id  }}fragment Share_clip on ClipInterface {  title  id}fragment Download_clip on ClipInterface {  videoFiles(first: 100) {    edges {      node {        __typename        publicLocation        videoProfile {          __typename          height          id        }        id      }    }  }}fragment BookmarkAction_bookmarks on ClipRemoteConnection {  count  ...TeaserSlider_clipRemoteConnection}fragment TeaserSlider_clipRemoteConnection on ClipRemoteConnection {  edges {    node {      __typename      ...SmallTeaserBox_node      id    }  }}fragment SmallTeaserBox_node on Node {  id  ... on CreativeWorkInterface {    ...TeaserImage_creativeWorkInterface  }  ... on ClipInterface {    id    title    kicker    ...Duration_clip  }  ... 
on ProgrammeInterface {    broadcasts(first: 1, orderBy: START_DESC) {      edges {        node {          __typename          start          id        }      }    }  }}fragment Error_clip on ClipInterface {  ageRestriction}","variables":{"clipId":"Programme:598ae1644057ba0012bd2d57","isClip":true,"isLivestream":false,"livestream":"Livestream:"}}

Each of these then has to be sent via POST to: https://proxy-base.master.mango.express/graphql
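A rough Python sketch of how such a POST could look, using only the standard library. The names `build_graphql_request` and `post_graphql` are illustrative, the `Content-Type: application/json` header is the usual GraphQL-over-HTTP convention and assumed here, and the endpoint may of course change or disappear:

```python
import json
import urllib.request

# Endpoint as given in the comment above.
GRAPHQL_ENDPOINT = "https://proxy-base.master.mango.express/graphql"


def build_graphql_request(query: str, variables: dict) -> urllib.request.Request:
    """Wrap query and variables into a JSON POST request (nothing is sent yet)."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed convention
        method="POST",
    )


def post_graphql(query: str, variables: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_graphql_request(query, variables)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect without touching the network.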

Nicklas2751 commented 7 years ago

Hm, I have already managed to gather most of the information and have turned it into code so far. What I am still missing are the geo information and the subtitles; I cannot find either at the moment.

zxsd commented 7 years ago

It may be that subtitling has not been enabled yet. This programme has subtitles in the 'regular' Mediathek; in the beta Mediathek id3 does not work.

dahoamisdahoam-folge 1980-eine bambergerin

pidoubleyou commented 7 years ago

Open items/problems of the BR crawler in branch hotfix/241:

TheSasch commented 7 years ago

I'm working on the resolutions...
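For reference, the detail query earlier in the thread requests `videoFiles` with `publicLocation` and `videoProfile.width`, so the resolutions could presumably be derived from that. A Python sketch under that assumption; `video_files_by_width` is a made-up name, the response layout is inferred from the query, and this is not the code in hotfix/241:

```python
from typing import List, Tuple


def video_files_by_width(response: dict) -> List[Tuple[int, str]]:
    """Return (width, url) pairs sorted from narrowest to widest.

    Assumed layout: data.viewer.clip.videoFiles.edges[].node with
    publicLocation and videoProfile.width, as requested by the query.
    """
    viewer = (response.get("data") or {}).get("viewer") or {}
    clip = viewer.get("clip") or {}
    files = []
    for edge in (clip.get("videoFiles") or {}).get("edges") or []:
        node = (edge or {}).get("node") or {}
        width = (node.get("videoProfile") or {}).get("width")
        url = node.get("publicLocation")
        if url and isinstance(width, int):  # ignore entries without a usable profile
            files.append((width, url))
    return sorted(files)
```

The last element of the result would be the widest variant, the first one the smallest.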

pidoubleyou commented 7 years ago

Geo tagging now works in branch hotfix/241 thanks to the fix for #281.

pidoubleyou commented 7 years ago

The topic is now read differently, so that it matches the topics of the previous crawler and individual programmes are no longer listed under different topics. This also means that existing subscriptions keep working.

pidoubleyou commented 7 years ago

Status of date + time: there are many entries for which no value can be determined. The JSON data contains no broadcast information for them. These entries appear to be individual excerpts from other programmes.

I will run a few more tests (mainly runtime together with other crawlers) and would then release the BR crawler in this state.
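To illustrate the date/time point: a Python sketch of reading the first broadcast start defensively, so that entries without broadcast information yield no value instead of an error. The function name `first_broadcast_start` is hypothetical and the node layout (`broadcasts.edges[].node.start`) is taken from the queries earlier in the thread; the timestamp format is the one seen in the captured variables.

```python
from datetime import datetime, timezone
from typing import Optional


def first_broadcast_start(programme: dict) -> Optional[datetime]:
    """Return the first parseable broadcast start as UTC, or None if absent."""
    edges = (programme.get("broadcasts") or {}).get("edges") or []
    for edge in edges:
        start = ((edge or {}).get("node") or {}).get("start")
        if not start:
            continue
        try:
            # Format as in the captured requests, e.g. 2017-09-22T04:00:00.000Z
            parsed = datetime.strptime(start, "%Y-%m-%dT%H:%M:%S.%fZ")
        except ValueError:
            continue  # unexpected format: treat like a missing value
        return parsed.replace(tzinfo=timezone.utc)
    return None
```

A caller would then have to decide what to store for `None`, for example leaving date and time empty for such excerpts.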

pidoubleyou commented 5 years ago

The new BR crawler has been active for a long time, therefore closed.