mueslimak3r / tv-intro-detection

This project tries to detect intros of tv series by comparing pairs of episodes to find the largest common subset of frames.
https://mueslimak3r.github.io/tv-intro-detection/
GNU General Public License v3.0
81 stars, 3 forks

timestamps sometimes incorrect and alternate between accurate and incorrect #12

Closed mueslimak3r closed 2 years ago

mueslimak3r commented 2 years ago

Copied discussion from #7 by @fallenbagel

Upon testing what I could for now, which is Batman: The Animated Series, it's now detecting the intro as starting at 0:05:36 and ending at 0:06:41 for some episodes, whereas the actual intro starts at 0:00:00 and ends around 0:01:05. At first I thought it was just one mistake, but it seems like every other episode is being misidentified 🤔 (some episodes seem to be identified correctly). I checked the detected timestamps and there doesn't seem to be any identical footage there. Let me run the script again to confirm whether the same thing happens.

EDIT: Nope, same thing happening. I wonder why 🤔 It's literally alternating: the 1st episode has the wrong timestamps, the 2nd has the correct ones, 3rd wrong, 4th correct, 5th wrong, 6th correct, etc.

Season 1: [screenshot]

mueslimak3r commented 2 years ago

with S01E01 & E02 as examples, is it reprocessing either of those?

I've seen this before too where it alternates between two almost identical times, but it was when testing some code that didn't end up working.

mueslimak3r commented 2 years ago

I pushed some changes that fix videos not being skipped if they can't be found. Perhaps the changes fix this too?

Fallenbagel commented 2 years ago

> with S01E01 & E02 as examples, is it reprocessing either of those?

Do you mean when I reran from scratch? If so, it still ends up doing the same thing: alternating.

> I pushed some changes that fix videos not being skipped if they can't be found. Perhaps the changes fix this too?

Let me test it out. I'm gonna run without rclone media and test whether it happens on other series, then run again with rclone media to test whether it only happens in Batman. I'll get back to you in about an hour.

mueslimak3r commented 2 years ago

To test specific seasons/folders you can run decode.py directly

Fallenbagel commented 2 years ago

> To test specific seasons/folders you can run decode.py directly

Wouldn't that not create a json file which stores the timestamps? 🤔

mueslimak3r commented 2 years ago

> To test specific seasons/folders you can run decode.py directly

> Wouldn't that not create a json file which stores the timestamps? 🤔

Yeah it won't make json. But with the -d (debug logging) or -l (debug logging to file) flags you can still get logs to tell if it's working

Fallenbagel commented 2 years ago

> Yeah it won't make json. But with the -d (debug logging) or -l (debug logging to file) flags you can still get logs to tell if it's working

OH you mean the timestamps?

Also I tested on a few episodes of Fresh Prince. The alternating thing did not happen (although it did misidentify a few: one was way off, another finished too early, and another lasted 1 second), so let me test on Batman and see if the alternating thing happens again. [screenshot]

EDIT: Unfortunately same outcome. Could it be a series-specific issue where the script failed to identify the intro in the first episode, which led to a series of alternating wrong detections? [screenshot]

Although the weird thing is that this happens in all seasons of Batman, even though it has one very specific intro that is always at the start of each episode.

mueslimak3r commented 2 years ago

I have Batman The Animated Series so I'll see if I get the same results.

Regarding the results from the fresh prince, it looks like that output isn't from the final result. S01E04 has start 0:00:00 end 0:00:01 which should only show prior to the "error correction" stage.

If the final result had a duration that short it should show something like this:

[screenshot]

If you could post the full log for both of those that would be helpful

mueslimak3r commented 2 years ago

I found the fix and it's really dumb.

This line: https://github.com/mueslimak3r/tv-intro-detection/blob/ccbc50f9060ea2ddf67bd476cd9285a3a8aa8c06/decode.py#L105

k started at 0. The loop was intended to search back to front, so each slice would start at "total # of frames - k frames", but doing [-k:] when k is 0 was actually [0:] (the whole list, front to back). By changing the loop to start at 1 instead, it now searches back to front as it should.

The reason it was alternating is that the last two parameters (..., len(print1) / 16) - k, 0) assume that the [-k:] slice is back to front. The matching was working properly, but the offsets used to figure out where the matched frames are were wrong for the first episode in each pair.
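For anyone hitting the same pitfall, here is a minimal sketch of the slicing behavior described above. It is not the code from decode.py; the function names and the sample list are made up for illustration. It only shows that in Python `-0 == 0`, so `seq[-0:]` is the whole sequence rather than an empty tail.

```python
def tail_buggy(frames, k):
    # Intended: the last k items. But -0 == 0, so k == 0 gives frames[0:],
    # which is the entire list (front to back).
    return frames[-k:]


def tail_explicit(frames, k):
    # Computing the start index explicitly gives an empty tail for k == 0.
    return frames[len(frames) - k:]


# Hypothetical stand-ins for frame fingerprints.
frames = ["f0", "f1", "f2", "f3"]

for k in range(0, 3):
    print(k, tail_buggy(frames, k), tail_explicit(frames, k))
```

Starting the loop at 1, as in the fix, avoids the `k == 0` case entirely; the explicit start index is just another way to make the intent unambiguous.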

mueslimak3r commented 2 years ago

I did test against Fresh Prince of Bel Air season 1 and it's almost completely spot on. All but 3 episodes were correct, and the remaining ones were invalidated (set to 0-0 so they wouldn't cause any skipping).

The problem with those is fairly common: the average duration of the matched intro is slightly shorter than the real duration, so the few episodes that match the full duration of the intro are accurate but are still rejected for being too far from the average.

So for instance episode 1 & 2 were correctly matched but were too far from the average and rejected.

Overall, having a few rejected is okay for now, and is much preferable to incorrect timestamps that cause skipping random parts of a video
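The rejection behavior described above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's actual error-correction code: the function name, the 5-second tolerance, and the sample durations are all hypothetical.

```python
from statistics import mean


def reject_outliers(durations, tolerance=5.0):
    """Zero out any duration more than `tolerance` seconds from the mean.

    A zeroed entry stands in for the "set to 0-0" invalidation, meaning
    nothing gets skipped for that episode.
    """
    avg = mean(durations)
    return [d if abs(d - avg) <= tolerance else 0.0 for d in durations]


# Hypothetical matched intro durations in seconds. Suppose the real intro
# is 60 s, but most pairwise matches come out a little short.
durations = [60.0, 60.0, 52.0, 52.0, 52.0, 52.0, 52.0, 52.0]

print(reject_outliers(durations))
```

With these made-up numbers the mean is 54 s, so the two episodes that matched the full 60 s intro are the ones that get thrown out, even though they are the accurate ones.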

mueslimak3r commented 2 years ago

The other thing about Batman is that since season one has 60 episodes, it's skipped by default because it exceeds the default cap of 30 episodes. It's adjustable in the code right now, but I may raise that number and/or make it adjustable via an environment variable or CLI parameter.
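One way such an adjustable cap could look is sketched below. Nothing here exists in the project as far as this thread shows: the `--max-episodes` flag, the `MAX_EPISODES` environment variable, and both function names are hypothetical. Only the default of 30 comes from the discussion.

```python
import argparse
import os

DEFAULT_MAX_EPISODES = 30  # default cap mentioned in the thread


def resolve_max_episodes(argv=None, environ=None):
    """Pick the cap from a CLI flag, then an env var, then the default.

    Both the flag name and the variable name are hypothetical.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-episodes", type=int, default=None)
    # parse_known_args ignores any other arguments the script may accept.
    args, _unknown = parser.parse_known_args(argv)
    if args.max_episodes is not None:
        return args.max_episodes
    return int(environ.get("MAX_EPISODES", DEFAULT_MAX_EPISODES))


def should_skip_season(episode_count, cap):
    # A season is skipped only when it has more episodes than the cap.
    return episode_count > cap


# Example: a 60-episode season is skipped at the default cap of 30,
# but processed if the cap is raised to 60 via the environment.
print(should_skip_season(60, resolve_max_episodes([], {})))
print(should_skip_season(60, resolve_max_episodes([], {"MAX_EPISODES": "60"})))
```

Letting the CLI flag win over the environment variable is a design choice made for this sketch; either precedence would work.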

Fallenbagel commented 2 years ago

> The other thing about batman is that since season one has 60 episodes, it's skipped by default since it has more than the 30 episode cap that's default. It's adjustable in the code rn but I may up that number and/or make it adjustable via an environment variable or cli parameter

Oh, I have the DVD version, so only 29 episodes per season. I tested Batman last night. It was spot on! I upped the number back when the limit was only 20, but I think 30 is good? 🤔

Fallenbagel commented 2 years ago

> I did test against Fresh Prince of Bel Air season 1 and it's almost completely spot on. All but 3 episodes were correct, and the remaining ones were invalidated (set to 0-0 so they wouldn't cause any skipping).

> The issue with those is a fairly common issue where the average duration of the matched intro is slightly shorter than the real duration, and a few episodes match the full duration of the intro so are accurate, but are still rejected due to being too far from the average.

> So for instance episode 1 & 2 were correctly matched but were too far from the average and rejected.

> Overall, having a few rejected is okay for now, and is much preferable to incorrect timestamps that cause skipping random parts of a video

I haven't tested Fresh Prince yet. I'll test and let you know. Since the last update it's been going really well so far. Very accurate (esp. Batman).