tfabris / CrowCam

A set of Bash scripts to control and maintain a YouTube live cam from a Synology NAS.
GNU General Public License v3.0

YouTube stream went down, but did not bounce automatically, on 2019-04-01. #9

Closed tfabris closed 5 years ago

tfabris commented 5 years ago

On 2019-04-01 at 6:53am, the YouTube stream stopped working, in a strange way. The following behavior was visible when trying to watch the main YouTube stream:

To do:

tfabris commented 5 years ago

I have checked in a commit which attempts to address issues #8 and #9. I will test to make sure the changes work correctly in production before I close these bugs.

Note that I made the changes under the assumption that this was the root cause of the issue:

  • See if the logs show that the problem is due to a network drop, where the script thinks there is "nothing to do" because it thinks "the feed is still up", due to the inner loop detecting StreamIsUp=True. (The stream acted like it was "up" when I tried to look at it, even though it was not.)

I haven't yet looked at the logs to be sure.

tfabris commented 5 years ago

The same issue recurred today, 2019-04-01, at 3:45 pm. This was before I had a chance to put the new code in place, and before I had a chance to review the logs from 6:53 am, so I need to review the logs from both time periods. Note that I rebooted the camera at about 4:40 pm in an attempt to fix the issue.

tfabris commented 5 years ago

I viewed the logs and they were unhelpful. It's still possible that some of my changes will fix the issue, but the logs didn't say what I expected them to say. Instead they said this:

Info  System  2019/04/01 16:45:45  SYSTEM  CrowCam Controller - Live stream came back up.
Info  System  2019/04/01 16:45:20  SYSTEM  CrowCam Controller - The YouTube stream was down. Pausing to give it a chance to come up. Retry number 2. Sleeping 8 seconds before trying again.
Info  System  2019/04/01 16:45:00  SYSTEM  CrowCam Controller - The YouTube stream was down. Pausing to give it a chance to come up. Retry number 1. Sleeping 8 seconds before trying again.
Info  System  2019/04/01 13:20:08  SYSTEM  CrowCam Controller - Live stream came back up.
Info  System  2019/04/01 13:19:42  SYSTEM  CrowCam Controller - The YouTube stream was down. Pausing to give it a chance to come up. Retry number 1. Sleeping 8 seconds before trying again.
Info  System  2019/04/01 06:33:05  SYSTEM  CrowCam Controller - We are after our sunrise/start time. YouTube Live Broadcast is down. It should be up at this time. Starting stream.

As near as I can figure, something hiccuped on either the network side or the YouTube side and made the stream glitch. The script didn't pick up on it, so we still had the problem.

One of my changes is to increase the frequency of the network tests, from once every 20 seconds to once every 10 seconds. I'm wondering if this would have caught the issue?

I just don't know, because I don't see anything in the logs other than that the stream went down and then came back up again very shortly thereafter, so my script did not bounce the stream. This is as it should be: the stream tester needs that hysteresis, or else it raises false alarms and bounces the stream when it's not needed.
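A minimal sketch of the kind of hysteresis described above, for illustration only. It is not the actual CrowCam code: the function name `stream_down_with_hysteresis`, the `max_retries` value, and the idea of passing the check as a command are all assumptions. Only the 8-second pause comes from the log entries.

```bash
#!/bin/bash
# Hypothetical sketch: treat the stream as "really down" only after
# several consecutive failed checks, so a brief glitch does not cause a bounce.

max_retries=3      # assumed value, not taken from CrowCam
pause_seconds=8    # matches the 8-second sleep seen in the log above

# stream_down_with_hysteresis CHECK_CMD [ARGS...]
# Returns 0 (stream is really down) only if CHECK_CMD fails on every attempt.
# Returns 1 as soon as CHECK_CMD succeeds once.
stream_down_with_hysteresis() {
    local attempt
    for (( attempt = 1; attempt <= max_retries; attempt++ )); do
        if "$@"; then
            return 1   # stream is up, no bounce needed
        fi
        if (( attempt < max_retries )); then
            echo "Stream was down. Retry number ${attempt}. Sleeping ${pause_seconds} seconds." >&2
            sleep "$pause_seconds"
        fi
    done
    return 0           # still down after all retries
}
```

The trade-off is the one seen in this issue: a stream that recovers within the retry window is never bounced, even if the recovery left it in a bad state.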

Stumper.

tfabris commented 5 years ago

Issue recurred on 2019-04-03 at approx 12:58pm. Rebooted at 1:28 pm.

Consider possibilities:

tfabris commented 5 years ago

On 04-03, the recurrence of the issue at 12:58 showed zero entries in the log. The log went from turning on the camera just before 7am to me rebooting the camera at 1:28pm, nothing in between, in either the main log or the survstation log.

There was, however, a very brief network problem at 1:45pm and I do recall the stream being flaky at that time too.

So I think what I'm looking at here is really brief network blips on my Comcast cable line which affect the stream. Some of them are so short that my every-ten-seconds network test is still not frequent enough to catch them.

Thinking about this.

tfabris commented 5 years ago

Tried: Reducing the camera streaming speed from 3 Mbps back down to 2 Mbps, where it originally was. I don't remember having these troubles until after I went to 3 Mbps, so maybe this alone is enough?

tfabris commented 5 years ago

Also tried: Fixed a bug where, when I increased the network-checking frequency, I missed the last of the six checks per minute due to an off-by-one error in my setting. (I had set it to run 5 checks per minute, intending to perform the last check at 0:50, failing to remember that the first check is at 0:00 on the clock, so 5 checks end at 0:40 instead of the intended 0:50.)
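The off-by-one arithmetic can be checked with a short sketch. The helper name `check_offsets` is made up for illustration and is not part of CrowCam. It assumes the first check starts at 0 seconds and the checks are evenly spaced.

```bash
#!/bin/bash
# check_offsets COUNT INTERVAL
# Prints the second-of-the-minute at which each check starts,
# assuming the first check runs at 0 seconds.
check_offsets() {
    local count=$1 interval=$2 i
    local offsets=()
    for (( i = 0; i < count; i++ )); do
        offsets+=( $(( i * interval )) )
    done
    echo "${offsets[*]}"
}

check_offsets 5 10   # prints: 0 10 20 30 40     (last check at 0:40)
check_offsets 6 10   # prints: 0 10 20 30 40 50  (last check at 0:50)
```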

So the current behavior is:

To do:

tfabris commented 5 years ago

To do: Also test what happens in Synology when the script isn’t finished and then the next minute interval rolls around.

I already know that it doesn’t double-run the script. I already tested that.

But then when the script is finished, then what happens?

If the latter, then we have a problem where the network tests can have holes. The hole will be as long as the gap between scriptend and nextminutestart.

If true, figure out how to address this.
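For a sense of scale, here is a small sketch of how long such a hole could be. It assumes the run started exactly on a minute boundary and that the scheduler waits for the next minute boundary before launching again. Neither assumption has been confirmed at this point in the thread, and `gap_until_next_minute` is a hypothetical helper.

```bash
#!/bin/bash
# gap_until_next_minute ELAPSED_SECONDS
# Prints the number of seconds between the end of a run that lasted
# ELAPSED_SECONDS (started on a minute boundary) and the next minute boundary.
gap_until_next_minute() {
    local elapsed=$1
    echo $(( (60 - elapsed % 60) % 60 ))
}

gap_until_next_minute 45    # prints: 15
gap_until_next_minute 120   # prints: 0
```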

tfabris commented 5 years ago

Interesting!

The answer to the question is:

But, in testing this, I discovered that my current version of CrowCam.sh, when running on the Synology NAS, does not execute very fast. It takes up to 2.5 minutes total to complete its job. So it's not checking the network as frequently as I think it is.

To do:

tfabris commented 5 years ago

The slowness is in the "Test_Stream" function. Sometimes youtube-dl returns a result instantly; other times it can take as much as 20 seconds to return a result.
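One way to find a slow step like this is to wrap each call in a small timer. This is a sketch only: `time_step` and `fake_test_stream` are hypothetical names, and the real Test_Stream function is not shown in this thread. It relies on Bash's built-in `SECONDS` variable, which counts whole seconds, so the reported durations are approximate.

```bash
#!/bin/bash
# time_step LABEL COMMAND [ARGS...]
# Runs COMMAND, reports on stderr roughly how many whole seconds it took,
# and returns COMMAND's exit status unchanged.
time_step() {
    local label=$1; shift
    local start=$SECONDS status
    "$@"
    status=$?
    echo "${label} took about $(( SECONDS - start ))s (exit status ${status})" >&2
    return "$status"
}

# Stand-in for the real stream test, which is not shown in this thread.
fake_test_stream() { sleep 1; return 0; }

time_step "Test_Stream" fake_test_stream
```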

Is YouTube throttling the connection because I'm checking the stream too frequently?

To do: Perhaps do not check the stream every time we check the network. Perhaps only check the stream once, at the top of the minute. This will require a complete code refactor.

tfabris commented 5 years ago

Redid the way it tests the stream. Instead of trying to test every time through the network testing loop, it now checks the stream only once per run, only after all the loops of the network tests are completed and returned good results.
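The new ordering can be sketched roughly as follows. This is an illustration of the control flow only, not the checked-in code: `run_once` and its arguments are hypothetical, and the count of six checks at ten-second intervals is taken from the earlier comments in this issue.

```bash
#!/bin/bash
# Hypothetical sketch of one scheduled run: several network checks,
# then a single stream check only if every network check passed.

network_checks=6     # six checks per minute, as described earlier in this issue
check_interval=10    # seconds between network checks

# run_once NETWORK_CHECK_CMD STREAM_CHECK_CMD
# Both arguments are names of commands or functions supplied by the caller.
run_once() {
    local network_check=$1 stream_check=$2 i
    for (( i = 1; i <= network_checks; i++ )); do
        if ! "$network_check"; then
            echo "network down"
            return 1          # skip the slow stream test entirely
        fi
        # Sleep between checks, but not after the last one.
        if (( i < network_checks )); then
            sleep "$check_interval"
        fi
    done
    # All network checks passed: test the stream exactly once.
    if "$stream_check"; then
        echo "stream up"
    else
        echo "stream down"
        return 2
    fi
}
```

Keeping the slow stream test out of the loop means a 20-second stall delays the run once at most, rather than once per network check.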

Code is checked in and bench tested, but has not been tested in production yet.

Leave this card open until it's been proven good in production.

tfabris commented 5 years ago

Code is working in production and I haven't had a repeat of the issue. Closing card unless I see a recurrence.