openva / richmondsunlight.com

The Richmond Sunlight website.
https://www.richmondsunlight.com/
MIT License

Incorporate committee video #161

Open waldoj opened 6 years ago

waldoj commented 6 years ago

There's now a streaming video interface for committee meetings! For the House, it's completely different than the one for floor video, but for the Senate, it's Granicus. But the good news is that the House vendor has JSON representations of the data. So this view of the next week's scheduled videos also includes this JSON representation. (There's also monthly JSON.)

At this moment, the governor is speaking in Appropriations, and this is the JSON representation:

[
  {
    "Title": "Appropriations",
    "IconUri": null,
    "EntityStatus": 1,
    "EntityStatusDesc": "In Progress",
    "Location": "SCR",
    "Description": "Shared Committee Room, Pocahontas Building",
    "ThumbnailUri": "/00304/Harmony/images/video_live_small.png?timecode=20171218145408",
    "ScheduledStart": "2017-12-18T09:00:17",
    "ScheduledEnd": "2017-12-19T08:55:17",
    "HasArchiveStream": false,
    "ActualStart": "2017-12-18T09:00:17",
    "ActualEnd": null,
    "LastModifiedTime": "2017-12-18T09:32:40",
    "CommitteeId": null,
    "VenueId": null,
    "AssemblyProgress": 0,
    "AssemblyStatus": 0,
    "ForeignKey": "972",
    "Id": 2115,
    "Tag": null
  }
]

Seems to me that there are three things to be done:

waldoj commented 6 years ago

Note that the URL defined in ThumbnailUri just 404s.

waldoj commented 6 years ago

Also note that this event is scheduled to go for just short of 24 hours, so that range doesn't look real reliable.
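A rough sketch of a sanity check for that: compute the scheduled duration from ScheduledStart and ScheduledEnd and distrust anything longer than some cutoff. The 12-hour cutoff and the function names here are assumptions for illustration, not anything the vendor documents.

```python
from datetime import datetime, timedelta

# Assumed cutoff: no committee meeting plausibly runs longer than this.
MAX_PLAUSIBLE = timedelta(hours=12)


def scheduled_duration(event):
    """Return ScheduledEnd minus ScheduledStart as a timedelta."""
    start = datetime.fromisoformat(event["ScheduledStart"])
    end = datetime.fromisoformat(event["ScheduledEnd"])
    return end - start


def schedule_looks_reliable(event, limit=MAX_PLAUSIBLE):
    """True if the scheduled range is positive and no longer than the cutoff."""
    duration = scheduled_duration(event)
    return timedelta(0) < duration <= limit


# The Appropriations record from above, trimmed to the relevant fields.
event = {
    "Title": "Appropriations",
    "ScheduledStart": "2017-12-18T09:00:17",
    "ScheduledEnd": "2017-12-19T08:55:17",
}
print(scheduled_duration(event))
print(schedule_looks_reliable(event))
```

For events that fail the check, ActualStart and ActualEnd (once populated) are probably the better source for the real range.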

waldoj commented 6 years ago

Uh. It looks like there's no bulk downloads? Just streaming?

waldoj commented 6 years ago

This is very doable. Here's a helpful script to grab the video—turns out you can just concat all of the MP4s together!

waldoj commented 6 years ago

I think the necessary process here is to build on the existing infrastructure:

waldoj commented 6 years ago

The URL for a given video must be constructed from the data file: it's http://sg001-harmony.sliq.net/00304/Harmony/en/PowerBrowser/PowerBrowserV2/YYYYMMDD/-1/ID, with YYYYMMDD and ID available as fields in the data file.
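As a minimal sketch, building that URL from one record of the JSON might look like the following. It assumes the date portion comes from ScheduledStart and the ID from the Id field; the thread doesn't say which fields are meant, so treat both as guesses. The function name is made up for illustration.

```python
from datetime import datetime

BASE = "http://sg001-harmony.sliq.net/00304/Harmony/en/PowerBrowser/PowerBrowserV2"


def video_page_url(event):
    """Build the PowerBrowser page URL for one event record."""
    # Assumption: the YYYYMMDD segment is the date of ScheduledStart.
    day = datetime.fromisoformat(event["ScheduledStart"]).strftime("%Y%m%d")
    # Assumption: the trailing ID segment is the record's Id field.
    return f"{BASE}/{day}/-1/{event['Id']}"


event = {"ScheduledStart": "2017-12-18T09:00:17", "Id": 2115}
print(video_page_url(event))
```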

The actual URL to retrieve video from is in the page body, in script tags, e.g.:

var availableStreams = [{"GlobalEssenceFormatId":4,"IsLive":false,"Enabled":true,"AudioOnly":false,"VideoIndex":null,"AudioIndex":null,"StreamFormatId":12,"Url":"http://sg002-livein01.sliq.net/00304-vod/_definst_/2017/12/18/Appropriations_2017-12-18-09.00.00_2115_12.mp4/playlist.m3u8","Lang":"","StreamAssemblerList":null,"PreRoll":0.0,"Duration":9662,"Id":2239,"Tag":"Video"}];

So get the Url value from that, hack off /playlist.m3u8, and iterate from there.
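A sketch of that extraction step, under the assumption that the page always declares the list as `var availableStreams = [...]` and that the array is valid JSON, as it is in the example above. The function names and the trimmed sample page are illustrative only.

```python
import json
import re

# Matches the assignment up to (but not including) the opening bracket.
MARKER = re.compile(r"var\s+availableStreams\s*=\s*")
PLAYLIST_SUFFIX = "/playlist.m3u8"


def available_streams(page_html):
    """Return the availableStreams list parsed from the page body, or []."""
    match = MARKER.search(page_html)
    if match is None:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the trailing semicolon).
    streams, _end = json.JSONDecoder().raw_decode(page_html, match.end())
    return streams


def segment_base_url(stream):
    """Strip /playlist.m3u8 from a stream's Url, if present."""
    url = stream["Url"]
    if url.endswith(PLAYLIST_SUFFIX):
        return url[: -len(PLAYLIST_SUFFIX)]
    return url


# Sample page body, trimmed to a few of the fields shown above.
page = (
    '<script>var availableStreams = [{"IsLive":false,"Enabled":true,'
    '"Url":"http://sg002-livein01.sliq.net/00304-vod/_definst_/2017/12/18/'
    'Appropriations_2017-12-18-09.00.00_2115_12.mp4/playlist.m3u8",'
    '"Duration":9662,"Id":2239,"Tag":"Video"}];</script>'
)
for stream in available_streams(page):
    print(segment_base_url(stream))
```

Parsing from the marker with `raw_decode` avoids having to guess where the array ends with a regex.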

waldoj commented 6 years ago

I skimmed through this test video, helpfully recorded by legislative staff. Here are the three types of chyrons that I saw:

[Images: three chyron screenshots, labeled 3, 1, 2]

Large and small, basically—I think the second two are just variations of the same thing. (One has Secretary of Finance written under the caption text.) The large one has text running under the seal. It's a fair bet that the purpose of this test run was to identify those problems, and that they won't be an issue in production.

So, really, just two types of chyrons. I think the best test will be to check for the presence of two blue pixels. If the bottom left pixel is blue, then it's a short chyron, and crop accordingly. If it isn't, but if a pixel above and to the left is blue, then it's a large chyron and, again, crop accordingly.
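The two-pixel test could be sketched roughly like this. The frame is modeled as a plain list of rows of (r, g, b) tuples so the example has no dependencies. The "blue" threshold and the location of the second probe pixel are placeholders, since no exact colors or coordinates have been worked out here.

```python
def is_blue(pixel, margin=60):
    """Assumed test: the blue channel clearly dominates red and green."""
    r, g, b = pixel
    return b - max(r, g) >= margin


def chyron_type(frame, large_probe):
    """Classify a frame as 'short', 'large', or None (no chyron detected).

    frame is a list of rows, each a list of (r, g, b) tuples.
    large_probe is a hypothetical (x, y) position that is only blue
    when the large chyron is showing.
    """
    # First probe: the bottom left pixel of the frame.
    if is_blue(frame[-1][0]):
        return "short"
    # Second probe: only consulted when the first one is not blue.
    x, y = large_probe
    if is_blue(frame[y][x]):
        return "large"
    return None


BLACK = (0, 0, 0)
BLUE = (20, 40, 200)


def blank_frame(width=4, height=4):
    """Build a tiny all-black frame for demonstration."""
    return [[BLACK] * width for _ in range(height)]


short_frame = blank_frame()
short_frame[-1][0] = BLUE
large_frame = blank_frame()
large_frame[1][2] = BLUE

print(chyron_type(short_frame, large_probe=(2, 1)))
print(chyron_type(large_frame, large_probe=(2, 1)))
print(chyron_type(blank_frame(), large_probe=(2, 1)))
```

Each result would then map to its own crop box before handing the frame to OCR.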

I'm dubious that the format of the top text is established at this point, but it should be pretty easy to extract. Bill number and patron. The bottom text could be useful as a sanity check, in case of an OCR error for the top text, since that's the bill's catch line.

waldoj commented 6 years ago

Started to support House committee chyrons in 670890b.

waldoj commented 6 years ago

Huh. Here's a completely different approach to chyron-text placement, from today's test video. (There's no real video just yet.)

[Image: chyron]

waldoj commented 6 years ago

Looks like the tick-tock can be grabbed from the page source itself, defined as dataModel.

waldoj commented 6 years ago

Senate video lives here.

waldoj commented 6 years ago

Oh, lawd...new chyron styles for the Senate.

[Image: senate]

The bill text is all stretchy, the chyrons are smaller, and the video is flipped horizontally, for some reason? Ugh.