tfabris / CrowCam

A set of Bash scripts to control and maintain a YouTube live cam from a Synology NAS.
GNU General Public License v3.0
4 stars 3 forks source link

Live feed hangs and gets "reload" icon sometimes. #23

Closed tfabris closed 5 years ago

tfabris commented 5 years ago

Sometimes when viewing the live feed from my desk at work, it will hang in the following way:

Consider:

To do:

tfabris commented 5 years ago

I looked at the logs. 2019-05-03 11:25am contains one entry of:

Historical records:

Conclusions:

tfabris commented 5 years ago

Did it again today.

Death time: 2019-05-04 10:30:09am Discovered and rebooted camera: 5:10pm Nothing in Surveillance Station logs except for my reboot. Nothing in Synology logs this time (there were some blips at 11am but nothing at 10:30). Rebooting the camera fixed it.

tfabris commented 5 years ago

Similar issue just happened 2019-05-05 6:24pm:

So, whatever is causing this shit, it's an indirect result of a flaky network.

tfabris commented 5 years ago

Possible cause: Torrey started uploading videos from her iPad around that time. Maybe a saturated upstream channel could cause this?

tfabris commented 5 years ago

It took several tries of rebooting the camera and the live stream to get things to come back and work. Until then it just kept either going to "No Stream" or "Spinny death stream".

tfabris commented 5 years ago

Trying (wondering if this may also help issue #14):

tfabris commented 5 years ago

Issue recurred at approx 6:34 am on Wed 2019-05-08 (spinny death icon with camera displaying that time stamp). Discovered the issue and rebooted camera at approx 7:39 am. After that, the stream didn't come back cleanly, and also the camera got unexpectedly reset to 1080p/5mbps instead of my desired 1296p/3mbps setting. It required resetting the camera, bouncing my local routers, and multiple automatic restarts of the stream by CrowCam.sh. In the end the stream didn't come back until I fixed the stream key, which had gotten reset due to issue #24.

Nothing odd in the surveillance station logs. Synology logs show that CrowCam.sh did not see any network blip and did not take any actions. Synology logs show that CrowCam.sh saw a stream blip, at 6:23:34 am, but that it came back up again 30 seconds later.

So changing the transport protocol to TCP didn't solve the issue.

tfabris commented 5 years ago

And idea to fix this (assuming that it's just the temporary spinnydeath and not a full stream key change) would be to have CrowCam.sh detect when the last update was made to the live stream, and bounce the stream if it's been too long since the last update. There might be a way to do this via the YouTube API. We're looking specifically for the variable "boundStreamLastUpdateTimeMs" in the YouTube API. Please see issue #26 for a possible way to handle this programmatically.

tfabris commented 5 years ago

boundStreamLastUpdateTimeMs was not the correct way to check this. However there may be some other API calls we can use for this. For instance look at LiveStreams: https://developers.google.com/youtube/v3/live/docs/liveStreams#resource

"status": {
    "streamStatus": string,
    "healthStatus": {
      "status": string,
      "lastUpdateTimeSeconds": unsigned long,
      "configurationIssues": [
        {
          "type": string,
          "severity": string,
          "reason": string,
          "description": string
        }
      ]
    }
  },
tfabris commented 5 years ago

The actual output looks like this, and it seems that "lastUpdateTimeSeconds" for the healthStatus is not being sent any more. It's in the docs but I'm not seeing it in the output.

However the "status" of "healthStatus" object could maybe be useful? I need to run the WIP test code (StreamStatus-WIP.sh) the next time I see a spinny death hang and try to find out if something useful is displayed here when the stream is having problems.

{
 "kind": "youtube#liveStreamListResponse",
 "etag": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
 "pageInfo": {
  "totalResults": 0,
  "resultsPerPage": 5
 },
 "items": [
  {
   "kind": "youtube#liveStream",
   "etag": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
   "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
   "status": {
    "streamStatus": "active",
    "healthStatus": {
     "status": "ok",
     "configurationIssues": [
      {
       "type": "videoResolutionSuboptimal",
       "severity": "warning",
       "reason": "Check resolution",
       "description": "Please check the video resolution. The current resolution is (2304x1296), which is not optimal."
      },
      {
       "type": "audioBitrateLow",
       "severity": "info",
       "reason": "Check audio settings",
       "description": "The audio stream's current bitrate (0) is lower than the recommended bitrate. We recommend that you use an audio stream bitrate of 128 Kbps."
      },
      {
       "type": "audioSampleRate",
       "severity": "info",
       "reason": "Check audio settings",
       "description": "The current sample rate is 0. The recommended sample rates are 44.1kHz and 48kHz."
      }
     ]
    }
   }
  }
 ]
}.
tfabris commented 5 years ago

I have working code in place which is hoping that it will help this. Need to see if this lives for a long time without issue, and if so, we can close this bug. Give it a few weeks.

Currently: The code is using the "status" field of the "status" part of the "LiveStreams" resource, and also the "healthStatus" field, as described here: https://developers.google.com/youtube/v3/live/docs/liveStreams#resource Those seem to be the only two things that give me the status of the actual stream. I can't find anything else.

The code currently also uses YouTube-DL to check if the stream is up, too. A failure of either one (API or YouTube-DL) will be considered a stream failure.

If you ever get a situation where:

tfabris commented 5 years ago

The code ran in production for only a day, and as the end of the day approached, it busted a 10,000-a-day quota limit on the YouTube API.

I have removed the code which queried for the stream status. Shame, because it seemed to work so well.

Some ideas for how to fix this:

Note: Look at your Google Developer API account to monitor the quota usage in a given day, there's a "quota" tab on the left side: https://console.developers.google.com/apis/api/youtube.googleapis.com/quotas?project=crowcam

tfabris commented 5 years ago

Checked in new code: Attempt to address issues #23 and #29:

Need to test this on production system and see how it does for a day.

tfabris commented 5 years ago

Happened really badly on 2019-05-18 at 8:14:21 am.

For the entire day until I came home it was stuck with spinny death at 8:14:21am showing a short clip. Later, after an unusual amount of time spend on the "processing..." stage, the final archived video goes from startup to 8:14am and then terminates. It is this video: https://youtu.be/EtU5Dvnaj-g?t=10701

Live stream health error read "Bad: Check Video Settings - Your encoder is sending data faster than realtime (multipleseconds of video each second). You must rate limit your livevideo upload to approximately 1 second of video each second."

Continuing to investigate this issue. Synology log info coming up in the next comment.

tfabris commented 5 years ago

Synology log shows that we had a "stream down" at 7:20 am and at 8:43 am, once each, but then the stream came back up immediately after each of those checks according to the script. Neither of those times coincides with the 8:14am hang.

At 9:00 am it got an error where it couldn't get boundStreamId: CrowCam Controller - The liveBroadcastOutput was { "error": { "code": 500, "message": null } } . -- But then there were no more errors after that.

The stream key/name has not changed, it is still the same thing at this time as has been since 2019-05-08 (1.5 weeks so far without a stream key clobber event).

When I run the CrowCam.sh locally in debug mode, the output looks perfectly fine.

However the "encoder sending data faster than realtime" thing is indeed showing up in the YouTube API status output, so I'm going to try triggering on that:

  "configurationIssues": [
      {
       "type": "videoIngestionFasterThanRealtime",
       "severity": "error",
       "reason": "Check video settings",
       "description": "Your encoder is sending data faster than realtime (multipleseconds of video each second). You must rate limit your livevideo upload to approximately 1 second of video each second."
      },
tfabris commented 5 years ago

The most recent checkin did indeed fix the issue when I ran a local debug version of the script. It detected the string "videoIngestionFasterThanRealtime" multiple times in a row and bounced the stream. Afterwards, the stream came back up and seemed steady for at least the 10 minutes or so that I observed it.

To do: Look at the stream archive and the Synology logs tomorrow and see if everything worked as expected after the bounce. The bounce was completed successfully at 2019-05-18 5:10 pm.

Look for:

tfabris commented 5 years ago

Also note:

Well I noticed that some of my messages were stiil "dbg" level so I'm not 100 percent certain about this. I see that today (2019-05-18), during the approximate "problem" time, there were two detections of a problem by YouTube-DL, and zero detections of a problem by the YouTube API. I can tell because the YouTube API would have printed some "err" messages, and those didn't print. I only got the "stream is down" messages that were the result of failure detections by YouTube-DL.

HOWEVER... Both of those detections by YouTube-DL were within the Hysteresis noise floor, and did not result in a stream bounce.

So my conclusion so far is that YouTube-DL is either:

Not sure about the conclusion yet. I'll let the script run for a while more with the YouTube-DL code printing out "err" messages if it encounters a problem, and see how we do long term.

tfabris commented 5 years ago

I looked at the camera settings after the recurrence of issue #23 and found out:

tfabris commented 5 years ago

Had a situation where the stream went down today and the script properly recovered, first with the old code then with the new code. It went like this:

Relevant log entries below.

Error System 2019-05-19 13:14:59 System CrowCam Controller - The streamStatus is not active. Value retrieved was: inactive.
Error System 2019-05-19 13:14:59 System CrowCam Controller - The healthStatus is not good. Value retrieved was: noData.
Error System 2019-05-19 13:15:12 System CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl finished with exit code 1, live stream is down.
Error System 2019-05-19 13:15:12 System CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl returned an error message string, live stream is down.
Info  System 2019-05-19 13:15:12 System CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 3 . Sleeping 30 seconds before trying again.
Error System 2019-05-19 13:15:43 System CrowCam Controller - The streamStatus is not active. Value retrieved was: inactive.
Error System 2019-05-19 13:15:44 System CrowCam Controller - The healthStatus is not good. Value retrieved was: noData.
Error System 2019-05-19 13:15:55 System CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl finished with exit code 1, live stream is down.
Error System 2019-05-19 13:15:55 System CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl returned an error message string, live stream is down.
Info  System 2019-05-19 13:15:55 System CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 3 . Sleeping 30 seconds before trying again.
Error System 2019-05-19 13:16:26 System CrowCam Controller - The streamStatus is not active. Value retrieved was: inactive.
Error System 2019-05-19 13:16:26 System CrowCam Controller - The healthStatus is not good. Value retrieved was: noData.
Error System 2019-05-19 13:16:38 System CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl finished with exit code 1, live stream is down.
Error System 2019-05-19 13:16:38 System CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl returned an error message string, live stream is down.
Error System 2019-05-19 13:16:38 System CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.
Info  System 2019-05-19 13:16:39 System CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.
Info  System 2019-05-19 13:18:14 System CrowCam Controller - Done bouncing YouTube stream.
Error System 2019-05-19 13:24:18 System CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  System 2019-05-19 13:24:32 System CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 3 . Sleeping 30 seconds before trying again.
Error System 2019-05-19 13:25:04 System CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  System 2019-05-19 13:25:17 System CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 3 . Sleeping 30 seconds before trying again.
Error System 2019-05-19 13:25:48 System CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Error System 2019-05-19 13:26:04 System CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.
Info  System 2019-05-19 13:26:05 System CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.
Info  System 2019-05-19 13:27:41 System CrowCam Controller - Done bouncing YouTube stream.

This makes me want to try it without using YouTube-DL as the stream-upness-checker, and rely entirely on the API to decide on stream upness.

tfabris commented 5 years ago

I'm going to tentatively comment out the YouTube-DL code as it relates to stream-upness-checking. However, before checking it into production, look at the Synology logs for 2019-05-19, there are at least three stream bounces on that day (the one cited in the prior comment and at least two others as well). Check to see if these are legit, and that the logs look correct for them.

tfabris commented 5 years ago

Interesting note about the video archive of the stream with the bounce on 2019-05-19 at 1:18 pm and then again at 1:26 pm (log entries are a couple of comments up in this thread):

Conclusions from this:

To do:

tfabris commented 5 years ago

The remaining two bounces on 2019-05-19 are looking legit, and in addition, the API checker code caught the issue while the YouTube-DL checker didn't catch it.

The stream died at 2019-05-19 at 3:53:12 pm, and the API checker code started complaining at 3:59:55 pm, while the YouTube-DL code sat there doing nothing. The bounce happened at 4:01:50 pm. Then the stream went down at 6:02:03 pm, and the log shows that the API checker code started complaining at 6:00:12 pm and performed the bounce at 6:02:16 pm.

Based on this, I will continue with my plan of removing the YouTube-DL code from CrowCam.sh (leaving it in CrowCamKeepAlive.sh).

Logs from the later two bounces:

Info  SYSTEM 2019-05-19 14:05:56 SYSTEM CrowCam Controller - Retrieved sunrise/sunset times from Google: 5:26 AM/8:46 PM.
Error SYSTEM 2019-05-19 15:59:55 SYSTEM CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  SYSTEM 2019-05-19 16:00:12 SYSTEM CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 3 . Sleeping 30 seconds before trying again.
Error SYSTEM 2019-05-19 16:00:43 SYSTEM CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  SYSTEM 2019-05-19 16:01:01 SYSTEM CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 3 . Sleeping 30 seconds before trying again.
Error SYSTEM 2019-05-19 16:01:33 SYSTEM CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Error SYSTEM 2019-05-19 16:01:50 SYSTEM CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.
Info  SYSTEM 2019-05-19 16:01:51 SYSTEM CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.
Info  SYSTEM 2019-05-19 16:03:26 SYSTEM CrowCam Controller - Done bouncing YouTube stream.
Error SYSTEM 2019-05-19 16:04:53 SYSTEM CrowCam Keep Alive - /volume1/homes/admin/CrowCam/youtube-dl finished with exit code 1.
Error SYSTEM 2019-05-19 16:04:53 SYSTEM CrowCam Keep Alive - No final URL was obtained from /volume1/homes/admin/CrowCam/youtube-dl. Exiting program.
Error SYSTEM 2019-05-19 18:00:12 SYSTEM CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  SYSTEM 2019-05-19 18:00:30 SYSTEM CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 3 . Sleeping 30 seconds before trying again.
Error SYSTEM 2019-05-19 18:01:03 SYSTEM CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  SYSTEM 2019-05-19 18:01:23 SYSTEM CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 3 . Sleeping 30 seconds before trying again.
Error SYSTEM 2019-05-19 18:01:56 SYSTEM CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Error SYSTEM 2019-05-19 18:02:16 SYSTEM CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.
Info  SYSTEM 2019-05-19 18:02:17 SYSTEM CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.
Info  SYSTEM 2019-05-19 18:03:53 SYSTEM CrowCam Controller - Done bouncing YouTube stream.
Info  SYSTEM 2019-05-19 21:06:03 SYSTEM CrowCam Controller - We are after our sunset/stop time. YouTube Live Broadcast is up. It should be down at this time. Stopping stream.
Info  SYSTEM 2019-05-20 00:00:06 SYSTEM CrowCam Cleanup - 50 videos found. Processing.
Info  SYSTEM 2019-05-20 05:15:04 SYSTEM CrowCam Controller - We are after our sunrise/start time. YouTube Live Broadcast is down. It should be up at this time. Starting stream.
Error SYSTEM 2019-05-20 08:24:04 SYSTEM CrowCam Keep Alive - /volume1/homes/admin/CrowCam/youtube-dl finished with exit code 1.
Error SYSTEM 2019-05-20 08:24:04 SYSTEM CrowCam Keep Alive - No final URL was obtained from /volume1/homes/admin/CrowCam/youtube-dl. Exiting program.
Error SYSTEM 2019-05-20 08:24:24 SYSTEM CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl finished with exit code 1, live stream is down.
Error SYSTEM 2019-05-20 08:24:24 SYSTEM CrowCam Controller - /volume1/homes/admin/CrowCam/youtube-dl returned an error message string, live stream is down.
Info  SYSTEM 2019-05-20 08:24:24 SYSTEM CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 3 . Sleeping 30 seconds before trying again.
Info  SYSTEM 2019-05-20 08:25:12 SYSTEM CrowCam Controller - Live stream came back up.
Info  SYSTEM 2019-05-20 14:03:30 SYSTEM CrowCam Controller - Retrieving sunrise/sunset times from Google.
Info  SYSTEM 2019-05-20 14:03:32 SYSTEM CrowCam Controller - Retrieved sunrise/sunset times from Google: 5:25 AM/8:47 PM.
tfabris commented 5 years ago

To do:

tfabris commented 5 years ago

Added code to detect the "VideoIngestionStarved" state. For now it will just log if it detects that error.

To do:

tfabris commented 5 years ago

https://www.youtube.com/watch?v=ngEd5ljc9b4 Video runs from 2019-05-22 6:01:43 pm to 8:27:24 pm (baked-in camera clock on the video stream), and the 8:27:24 was due to a failure bounce.

(The video started correctly at 6:01pm because that was when I first installed the midday-bounce code onto the production server and it correctly did a bounce as expected at that time.)

Logs show a bounce at 8:27:39 due to 3 repeats of "videoIngestionFasterThanRealtime" error starting at 8:26:35 pm.

Error 2019-05-22 20:26:35 CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  2019-05-22 20:26:36 CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 3 . Sleeping 30 seconds before trying again.
Error 2019-05-22 20:27:07 CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  2019-05-22 20:27:07 CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 3 . Sleeping 30 seconds before trying again.
Error 2019-05-22 20:27:39 CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Error 2019-05-22 20:27:39 CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.
Info  2019-05-22 20:27:40 CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.
Info  2019-05-22 20:29:15 CrowCam Controller - Done bouncing YouTube stream.

Looking at the baked-in seconds counter on the video stream archive, I see that it counts the seconds just fine with no flaws during this entire time, all the way up to 8:27:24 pm. The bounce occurs at 8:27:39 pm, and it's a known thing that the video is delayed about 15 seconds, and 24+15=39 so it looks like the video feed was just fine all the way up to the last second.

The subsequent video clip plays up until EOD, from 8:29:17 9:06:11 pm, for a correct length of 36:53, with no problems and no log errors. The startup time of 8:29:17 corresponds very closely with the log entry which says it's starting the video around that time.

So I consider the "videoIngestionFasterThanRealtime" test to be flaky, tends to cry wolf, and should be adjusted. But how?

To do:

Do not:

tfabris commented 5 years ago

I have checked in code that tries one additional hysteresis loop and extends the time length between each loop a bit. See if that addresses it? I'm not optimistic.

To do:

tfabris commented 5 years ago

Aha!

Looking back at day-before-yesterday's stream, the "videoIngestionFasterThanRealtime" check is indeed a good check. Here is an example of when it correctly solved a problem. It still might cry wolf sometimes, but this time, it wasn't a wolf cry, it was a genuine solve of the problem:

Video: 2019-05-21 https://youtu.be/Ne1XrvA0Y4Q

Video plays fine for most of the day up until 6:13:08 pm, at which point it starts jumping and fast forwarding from 6:13:09 pm up until 6:17:something pm at which point it ends due to a bounce.

Here is the log of that bounce:

Error 2019-05-21 18:19:10 CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime
Info  2019-05-21 18:19:10 CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 3 . Sleeping 30 seconds before trying again.
Error 2019-05-21 18:19:42 CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Info  2019-05-21 18:19:42 CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 3 . Sleeping 30 seconds before trying again.
Error 2019-05-21 18:20:13 CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.
Error 2019-05-21 18:20:13 CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.
Info  2019-05-21 18:20:15 CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.
Info  2019-05-21 18:21:50 CrowCam Controller - Done bouncing YouTube stream.

The log didn't catch the issue until about six minutes after the issue started occurring. Possibly because it doesn't check for the problem every moment, it only starts checking for the problem about once every five minutes. That explains the gap where it didn't notice the issue right away. But once it started checking, boy howdy did it catch a valid issue and did a valid bounce.

The subsequent video stream after the bounce is: https://youtu.be/jLAipmnPNnc It starts at 6:21:53 pm and it's playing perfectly fine at one second per second up until EOD.

So yeah, our check for videoIngestionFasterThanRealtime is provably good and needs to stay in place.

Remaining to do:

tfabris commented 5 years ago

So far this morning there has been one non-repeating "blip" of the videoIngestionStarved message:

2019-05-23 05:10:03 CrowCam Controller - We are after our sunrise/start time. YouTube Live Broadcast is down. It should be up at this time. Starting stream.
2019-05-23 05:15:30 CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.

That tells me a few things:

Remaining to do:

tfabris commented 5 years ago

I've gotten a couple of single blips of "videoIngestionStarved" (which were eaten up by the hysteresis), and they happened immediately after a stream startup or a stream bounce. I have increased the pausetime after these, to see if it makes the blip go away.

tfabris commented 5 years ago

This video from 2019-05-25 shows a successful bounce and fix: https://www.youtube.com/watch?v=0jC2Yh1F1bA

The video on the YouTube site (previewed before processing was done) lasts until 5:47:06 pm and then ends. The YouTube API checks caught it and successfully bounced the stream. The errors which were caught and fixed were "inactive" and "noData".

Error 2019-05-25 17:50:07 CrowCam Controller - The streamStatus is not active. Value retrieved was: inactive.
Error 2019-05-25 17:50:07 CrowCam Controller - The healthStatus is not good. Value retrieved was: noData.
Info  2019-05-25 17:50:07 CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 4 . Sleeping 40 seconds before trying again.
Error 2019-05-25 17:50:48 CrowCam Controller - The streamStatus is not active. Value retrieved was: inactive.
Error 2019-05-25 17:50:48 CrowCam Controller - The healthStatus is not good. Value retrieved was: noData.
Info  2019-05-25 17:50:48 CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 4 . Sleeping 40 seconds before trying again.
Error 2019-05-25 17:51:30 CrowCam Controller - The streamStatus is not active. Value retrieved was: inactive.
Error 2019-05-25 17:51:31 CrowCam Controller - The healthStatus is not good. Value retrieved was: noData.
Info  2019-05-25 17:51:31 CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 3 of 4 . Sleeping 40 seconds before trying again.
Error 2019-05-25 17:52:12 CrowCam Controller - The streamStatus is not active. Value retrieved was: inactive.
Error 2019-05-25 17:52:12 CrowCam Controller - The healthStatus is not good. Value retrieved was: noData.
Error 2019-05-25 17:52:12 CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.
Info  2019-05-25 17:52:17 CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.
Info  2019-05-25 17:54:03 CrowCam Controller - Done bouncing YouTube stream.

Remaining to do:

tfabris commented 5 years ago

Got more interesting data on 2019-05-27 about how the API checks are working!

The TLDR is:

Full details of each of the events that occurred on 2019-05-27, showing the results:

https://www.youtube.com/watch?v=AyQpoMPtIos 2019-05-27 - Start 5:05 am - End 1:10 pm - Video length correct at 8:04:58 - Normal midday bounce with no video problems.

https://www.youtube.com/watch?v=KZilVm-rVUY 2019-05-27 - Start 1:12:39 pm - End 1:19:40 pm - Video length correct at 7:01 - Was bounced due to "videoIngestionStarved" message but there didn't seem to be anything visibly wrong with the video.

https://www.youtube.com/watch?v=kPM7tlhl9qg 2019-05-27 - Start 1:21:29 pm - End 1:29:20 pm - Video length correct at 7:52 - Was bounced due to "videoIngestionStarved" message but there didn't seem to be anything visibly wrong with the video.

https://www.youtube.com/watch?v=MFHTY5lFBRQ 2019-05-27 - Start 1:31:09 pm - End 1:39:10 pm - Video length is correct at 8:02 - Was bounced due to "videoIngestionStarved" message but there didn't seem to be anything visibly wrong with the video.

NOTE: Starting shortly before 6pm ish, there was noticeable network problems with The Agora internet. Certain web browsing things were working poorly. Vixy was trying to make holiday reservations on some web sites and the requests were timing out and being stuck. I think the bounces below are due to these valid network problems.

https://www.youtube.com/watch?v=I-YjaUcbEA0 2019-05-27 - Start 1:40:57 pm - Video length is wrong, end of video is messed up - At around 5:44:50 pm the video seems to freeze and then at 5:45 pm the video does the "speed up at the end" thing where the timestamps zoom by really fast and then the video ends. This is a valid bounce that detected the problem as "videoIngestionFasterThanRealtime" and repaired it. Initial detection was at 5:46 pm and the repair bounce correctly occurred at 5:49 pm.

https://www.youtube.com/watch?v=5ctLDEwlp7c 2019-05-27 - Start 5:50:52 pm - End 5:58:35 pm - Video length is correct at 7:39 - Was bounced due to "videoIngestionFasterThanRealtime" but there didn't seem to be anything visibly wrong with the video. Expectation: This was still a valid bounce because there were still network problems happening at this time, and the "videoIngestionFasterThanRealtime" message, though not reflected in the video itself, may have still been occurring due to this, and so I think the bounce is something that should have been done anyway.

https://www.youtube.com/watch?v=FrNiXFWTl9s 2019-05-27 - Start 6:00:37 pm - End 6:09:27 pm - Video length is wrong at 7:52 - Was bounced due to two reasons, "videoIngestionStarved" and because the access token, streamStatus, and healthStatus were all empty, due to a "Login Required" error. The bounce is valid, the video does the "speed up at the end" thing where the timestamps zoom by really fast and then the video ends. There appeared to be something bad happening to the network here, still, so that the API logins were failing just like Vixy's holiday reservations were failing. The output from the login attempt looked like this:

{ "error": { "errors": [ { "domain": "global", "reason": "required", "message": "Login Required", "locationType": "header", "location": "Authorization" } ], "code": 401, "message": "Login Required" } } .

I think that it's probably OK to re-nerf "videoIngestionStarved" and depend only on the "videoIngestionFasterThanRealtime", "streamStatus", and "healthStatus" errors for problem detection.

https://www.youtube.com/watch?v=HfRMJwGzrjo 2019-05-27 - Start 6:11:14 - End 9:10:03 pm - Video length is correct at 2:58:49 - Normal end of day stream shutdown.

Full log below:

Info   5/27/19 0:00   CrowCam Cleanup - 50 videos found. Processing.    
Info   5/27/19 0:00   CrowCam Cleanup - Title: CrowCam Id: jLAipmnPNnc Date: Tue May 21 18:21:34 PDT 2019 - more than 5 days old. Deleting CrowCam.    
Info   5/27/19 0:00   CrowCam Cleanup - Title: CrowCam Id: 8JXPZzO3wvo Date: Tue May 21 05:17:21 PDT 2019 - more than 5 days old. Deleting CrowCam.    
Info   5/27/19 5:05   CrowCam Controller - We are after our sunrise/start time. YouTube Live Broadcast is down. It should be up at this time. Starting stream.    
Info   5/27/19 13:10  CrowCam Controller - A midday stream bounce is needed, to prevent the current stream segment from exceeding the maximum video length.    
Info   5/27/19 13:10  CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.    
Info   5/27/19 13:11  CrowCam Controller - Done bouncing YouTube stream.    
Error  5/27/19 13:17  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:17  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:18  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:18  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:18  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:18  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 3 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:19  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Error  5/27/19 13:19  CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.    
Info   5/27/19 13:19  CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.    
Info   5/27/19 13:21  CrowCam Controller - Done bouncing YouTube stream.    
Error  5/27/19 13:27  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:27  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:27  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:27  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:28  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:28  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 3 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:29  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Error  5/27/19 13:29  CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.    
Info   5/27/19 13:29  CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.    
Info   5/27/19 13:31  CrowCam Controller - Done bouncing YouTube stream.    
Error  5/27/19 13:37  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:37  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:37  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:37  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:38  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 13:38  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 3 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 13:39  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Error  5/27/19 13:39  CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.    
Info   5/27/19 13:39  CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.    
Info   5/27/19 13:40  CrowCam Controller - Done bouncing YouTube stream.    
Info   5/27/19 14:02  CrowCam Controller - Retrieving sunrise/sunset times from Google.    
Info   5/27/19 14:02  CrowCam Controller - Retrieved sunrise/sunset times from Google: 5:18 AM/8:55 PM.    
Error  5/27/19 17:46  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Info   5/27/19 17:46  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 17:47  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Info   5/27/19 17:47  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 17:48  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Info   5/27/19 17:48  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 3 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 17:49  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Error  5/27/19 17:49  CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.    
Info   5/27/19 17:49  CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.    
Info   5/27/19 17:50  CrowCam Controller - Done bouncing YouTube stream.    
Error  5/27/19 17:56  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Info   5/27/19 17:56  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 17:57  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Info   5/27/19 17:57  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 17:58  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Info   5/27/19 17:58  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 3 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 17:58  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionFasterThanRealtime.    
Error  5/27/19 17:58  CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.    
Info   5/27/19 17:58  CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.    
Info   5/27/19 18:00  CrowCam Controller - Done bouncing YouTube stream.    
Error  5/27/19 18:06  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 18:06  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 1 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 18:07  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Info   5/27/19 18:07  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 2 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 18:08  CrowCam Controller - The variable accessToken came up empty. Error accessing YouTube API.    
Error  5/27/19 18:08  CrowCam Controller - The variable streamStatus came up empty. Error accessing YouTube API.    
Error  5/27/19 18:08  CrowCam Controller - The liveStreamsOutput was { "error": { "errors": [ { "domain": "global", "reason": "required", "message": "Login Required", "locationType": "header", "location": "Authorization" } ], "code": 401, "message": "Login Required" } } .    
Error  5/27/19 18:08  CrowCam Controller - The variable healthStatus came up empty. Error accessing YouTube API.    
Error  5/27/19 18:08  CrowCam Controller - The liveStreamsOutput was { "error": { "errors": [ { "domain": "global", "reason": "required", "message": "Login Required", "locationType": "header", "location": "Authorization" } ], "code": 401, "message": "Login Required" } } .    
Info   5/27/19 18:08  CrowCam Controller - The network is up, but the YouTube stream is down. Pausing to give it a chance to come up. Inner stream test loop, retry attempt 3 of 4 . Sleeping 40 seconds before trying again.    
Error  5/27/19 18:09  CrowCam Controller - The configurationIssues contains a bad value. Value retrieved was: videoIngestionStarved.    
Error  5/27/19 18:09  CrowCam Controller - Bouncing the YouTube stream, because it was unexpectedly down.    
Info   5/27/19 18:09  CrowCam Controller - Pausing, after bringing down the stream, before bringing it up again.    
Info   5/27/19 18:11  CrowCam Controller - Done bouncing YouTube stream.    
Info   5/27/19 21:10  CrowCam Controller - We are after our sunset/stop time. YouTube Live Broadcast is up. It should be down at this time. Stopping stream.    
Info   5/28/19 0:00   CrowCam Cleanup - 50 videos found. Processing.    
Info   5/28/19 0:00   CrowCam Cleanup - Title: CrowCam Id: 1LmycDEBUHM Date: Wed May 22 20:28:49 PDT 2019 - more than 5 days old. Deleting CrowCam.    
Info   5/28/19 0:00   CrowCam Cleanup - Title: CrowCam Id: ngEd5ljc9b4 Date: Wed May 22 18:01:14 PDT 2019 - more than 5 days old. Deleting CrowCam.    
Info   5/28/19 5:05   CrowCam Controller - We are after our sunrise/start time. YouTube Live Broadcast is down. It should be up at this time. Starting stream.    
tfabris commented 5 years ago

This has been doing very well the last several days. Closing bug.