pkscout / metadata.tvshows.thesportsdb.python

Python version of the Kodi scraper for Sports events
GNU General Public License v3.0

Scraper Can't Handle Sports Events on Same Day When Named with Date #1

Closed: pkscout closed this issue 2 years ago

pkscout commented 2 years ago

The scraper couldn't handle multiple events for a given league on the same day (as happens in the English Premier League and the NFL). I think this is a function of the way the files are named. As an example, the NFL sample files are:

NFL.2021-09-12.Atlanta.Falcons.vs.Philadelphia.Eagles.mkv
NFL.2021-09-12.Buffalo.Bills.vs.Pittsburgh.Steelers.mkv
NFL.2021-09-12.Cincinnati.Bengals.vs.Minnesota.Vikings.mkv
NFL.2021-09-12.Detroit.Lions.vs.San.Francisco.49ers.mkv
NFL.2021-09-12.Houston.Texans.vs.Jacksonville.Jaguars.mkv
NFL.2021-09-12.Indianapolis.Colts.vs.Seattle.Seahawks.mkv
NFL.2021-09-12.Kansas.City.Chiefs.vs.Cleveland.Browns.mkv
NFL.2021-09-12.New.England.Patriots.vs.Miami.Dolphins.mkv
NFL.2021-09-12.New.Orleans.Saints.vs.Green.Bay.Packers.mkv
NFL.2021-09-12.Tennessee.Titans.vs.Arizona.Cardinals.mkv
NFL.2021-09-12.Washington.vs.Los.Angeles.Chargers.mkv
NFL.2022-02-13.Superbowl.Cincinnati.Bengals.vs.Los.Angeles.Rams.mkv

Since TheSportsDB doesn't have any episode numbers, when core Kodi calls the scraper with "getepisodelist", the scraper gets a list of every game in each season and assigns an episode number as it puts the episode list (which also includes a season number, air date, and title) into the ListItem.
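A minimal sketch of that numbering step, assuming each event is a dict with hypothetical `season`, `date`, and `title` keys (not necessarily TheSportsDB's real field names or the scraper's actual code). Events are grouped by season and numbered in air-date order; the tie-break on title within a day is an assumption added here so the numbering is repeatable between runs.

```python
from collections import defaultdict


def build_episode_list(events):
    """Give each event a per-season episode number (hypothetical event keys)."""
    by_season = defaultdict(list)
    for event in events:
        by_season[event["season"]].append(event)

    episodes = []
    for season in sorted(by_season):
        # Air date first, then title, so same-day games get a repeatable order.
        ordered = sorted(by_season[season], key=lambda e: (e["date"], e["title"]))
        for number, event in enumerate(ordered, start=1):
            episodes.append(
                {"season": season, "episode": number,
                 "aired": event["date"], "title": event["title"]}
            )
    return episodes


if __name__ == "__main__":
    sample = [
        {"season": "2021", "date": "2021-09-12", "title": "Buffalo Bills vs Pittsburgh Steelers"},
        {"season": "2021", "date": "2021-09-12", "title": "Atlanta Falcons vs Philadelphia Eagles"},
    ]
    for ep in build_episode_list(sample):
        print(ep["season"], ep["episode"], ep["aired"], ep["title"])
```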

After that, core Kodi searches the episode list ListItem to find a matching episode. Core Kodi only uses the parts of the file name that match the naming rules defined here:

https://kodi.wiki/view/Naming_video_files/Episodes#Single_Episode_Files

Since the file name doesn't contain a season/episode, Kodi falls back to the date. But date matching is really meant for daily shows with one episode per day, so when Kodi finds a match, it stops and passes the scraper the show and episode IDs for lookup. That means the first game on a given day is correct, but every one after it gets the first game's information.
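The failure can be modeled in a few lines. This is a simplified stand-in for Kodi's date lookup, not its actual code: the hypothetical `match_by_date` returns the first episode whose air date matches, so every file from 2021-09-12 resolves to the same game.

```python
import re


def date_from_filename(filename):
    """Pull a YYYY-MM-DD date out of a file name, or return None."""
    match = re.search(r"\d{4}-\d{2}-\d{2}", filename)
    return match.group(0) if match else None


def match_by_date(episode_list, air_date):
    """Simplified model: return the FIRST episode with a matching air date."""
    for episode in episode_list:
        if episode["aired"] == air_date:
            return episode  # stops here, as a daily-show lookup would
    return None


episode_list = [
    {"season": "2021", "episode": 1, "aired": "2021-09-12", "title": "Atlanta Falcons vs Philadelphia Eagles"},
    {"season": "2021", "episode": 2, "aired": "2021-09-12", "title": "Buffalo Bills vs Pittsburgh Steelers"},
    {"season": "2021", "episode": 3, "aired": "2021-09-12", "title": "Cincinnati Bengals vs Minnesota Vikings"},
]

if __name__ == "__main__":
    for name in (
        "NFL.2021-09-12.Atlanta.Falcons.vs.Philadelphia.Eagles.mkv",
        "NFL.2021-09-12.Buffalo.Bills.vs.Pittsburgh.Steelers.mkv",
    ):
        hit = match_by_date(episode_list, date_from_filename(name))
        # Both files resolve to episode 1, the Atlanta vs Philadelphia game.
        print(name, "->", hit["title"])
```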

The solution in 1.0.1 is to require a modified naming convention and use the new option in Kodi Nexus to search by title when the file name ends in .special.ext. In addition, the date has to be digits only so that Kodi doesn't recognize it as a valid date for scraping purposes. So the test NFL files become:

NFL.20210912.Atlanta.Falcons.vs.Philadelphia.Eagles.special.mkv
NFL.20210912.Buffalo.Bills.vs.Pittsburgh.Steelers.special.mkv
NFL.20210912.Cincinnati.Bengals.vs.Minnesota.Vikings.special.mkv
NFL.20210912.Detroit.Lions.vs.San.Francisco.49ers.special.mkv
NFL.20210912.Houston.Texans.vs.Jacksonville.Jaguars.special.mkv
NFL.20210912.Indianapolis.Colts.vs.Seattle.Seahawks.special.mkv
NFL.20210912.Kansas.City.Chiefs.vs.Cleveland.Browns.special.mkv
NFL.20210912.New.England.Patriots.vs.Miami.Dolphins.special.mkv
NFL.20210912.New.Orleans.Saints.vs.Green.Bay.Packers.special.mkv
NFL.20210912.Tennessee.Titans.vs.Arizona.Cardinals.special.mkv
NFL.20210912.Washington.vs.Los.Angeles.Chargers.special.mkv
NFL.20220213.Cincinnati.Bengals.vs.Los.Angeles.Rams.special.mkv
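A minimal sketch of the rename, assuming the old layout is `<league>.<YYYY-MM-DD>.<event>.<ext>` as in the sample files. The hypothetical `to_special_name` strips the dashes from the date and inserts `.special` before the extension; the hypothetical `plan_renames` only lists the proposed changes so they can be reviewed before anything is touched.

```python
import re
from pathlib import Path

# Assumed old layout: "<league>.<YYYY-MM-DD>.<event>.<ext>"
OLD_NAME = re.compile(
    r"^(?P<prefix>.+?)\.(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})\.(?P<rest>.+)\.(?P<ext>[^.]+)$"
)


def to_special_name(filename):
    """Drop the dashes from the date and add '.special' before the extension."""
    match = OLD_NAME.match(filename)
    if match is None:
        return None  # not in the assumed layout (or already converted)
    return "{prefix}.{y}{m}{d}.{rest}.special.{ext}".format(**match.groupdict())


def plan_renames(folder):
    """List (old_path, new_path) pairs without renaming anything."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        new_name = to_special_name(path.name)
        if path.is_file() and new_name is not None:
            plan.append((path, path.with_name(new_name)))
    return plan
```

This does not remove extra words such as `Superbowl`, so names like that still need a manual edit. Once the plan looks right, each pair could be applied with `Path.rename`.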

Note that the last one is the Super Bowl, but you can't have anything extra in the file name for this to work, because when searching by title Kodi uses the entire file name for a fuzzy match, and too much extra text causes the match to fail. The date matters as well, since two teams could technically meet more than once in a season (at least in the NFL, that can happen in the playoffs). So the date is the only predictable way to get a unique file name and title.
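To illustrate why extra words hurt, here is a sketch that uses Python's `difflib.SequenceMatcher` as a stand-in for Kodi's fuzzy matcher; Kodi's actual algorithm and thresholds are not modeled. The hypothetical `title_from_filename` turns dots into spaces and drops the extension and the `.special` marker before comparing.

```python
from difflib import SequenceMatcher


def title_from_filename(filename):
    """'NFL.20220213.A.vs.B.special.mkv' -> 'NFL 20220213 A vs B'."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    if stem.endswith(".special"):
        stem = stem[: -len(".special")]  # drop the .special marker
    return stem.replace(".", " ")


def similarity(title, filename):
    """Stand-in fuzzy score between 0 and 1 (difflib, not Kodi's matcher)."""
    return SequenceMatcher(None, title.lower(), title_from_filename(filename).lower()).ratio()


if __name__ == "__main__":
    title = "Cincinnati Bengals vs Los Angeles Rams"
    for name in (
        "NFL.20220213.Cincinnati.Bengals.vs.Los.Angeles.Rams.special.mkv",
        "NFL.20220213.Superbowl.Cincinnati.Bengals.vs.Los.Angeles.Rams.special.mkv",
    ):
        print(f"{similarity(title, name):.2f}  {name}")
```

With the same title, the name containing `Superbowl` scores lower, because the extra characters add to the total length without adding any matches.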

If this name change isn't acceptable, the only other option I can think of is to add an episode number to the TheSportsDB info for each event (numbered by season, so each season would restart at episode 1). Event files would then have to be named as SxxExx.<anything you want>.ext. The date could probably still be in the name, since Kodi would find the season/episode info first and use it.
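For comparison, a sketch of the episode-number alternative. `sxxexx_name` and `parse_sxxexx` are hypothetical helpers, and the pattern is a simplified leading-`SxxExx` matcher, not Kodi's real regular expression. It shows that the tag gives an unambiguous season and episode even when the date stays in the rest of the name.

```python
import re

# Simplified leading "SxxExx." pattern; Kodi's real expressions are more involved.
SXXEXX = re.compile(r"^[Ss](\d+)[Ee](\d+)\.")


def sxxexx_name(season, episode, description, ext):
    """Build 'SxxExx.<anything you want>.ext' from a per-season episode number."""
    return f"S{season:02d}E{episode:02d}.{description}.{ext}"


def parse_sxxexx(filename):
    """Return (season, episode) from a leading SxxExx tag, or None."""
    match = SXXEXX.match(filename)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    name = sxxexx_name(2021, 1, "NFL.2021-09-12.Atlanta.Falcons.vs.Philadelphia.Eagles", "mkv")
    print(name, parse_sxxexx(name))
```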

In either case, file renaming will be needed for sports that have more than one game in a day, and I think it's easier to just drop the dashes from the date and add .special to the end of the file name than to figure out how to get the right season and episode numbers from the site when naming files.

zag2me commented 2 years ago

Thanks for the detailed explanation. I think this is a valid solution, and we can show an example filename on every event page on the website to recommend that users do this.

It's a shame Kodi core can't do something like look for the ".vs." string the same way it does ".special". But I guess since the date needs to be changed anyway, we might as well include ".special" in the filename recommendation.

I manually added the Super Bowl string as a curveball and you found it :) It's fine to rename it as a standard event name. I was just trying to add some typical user examples, but referring users to the standard seems fine to me.

zag2me commented 2 years ago


Team vs Team recommended filename https://www.thesportsdb.com/event/1154271

Event only recommended filename https://www.thesportsdb.com/event/1449681
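A sketch of how such a recommendation could be generated from event data. `recommended_filename` is hypothetical, and the layout (`League.YYYYMMDD.Home.vs.Away.special.ext`, or the event name in place of the teams) is assumed from the test file names in this thread, not taken from the website's code.

```python
def recommended_filename(league, date, ext, home=None, away=None, event=None):
    """Hypothetical helper: 'League.YYYYMMDD.Home.vs.Away.special.ext'.

    Falls back to the event name when there are no teams (the
    "event only" case). The layout is an assumption.
    """
    if home and away:
        name = f"{home} vs {away}"
    elif event:
        name = event
    else:
        raise ValueError("need both teams or an event name")
    digits_only_date = date.replace("-", "")  # keep Kodi from reading it as a date
    parts = [league, digits_only_date, name, "special"]
    # Spaces become dots to match the dotted style of the sample files.
    return ".".join(part.replace(" ", ".") for part in parts) + "." + ext


if __name__ == "__main__":
    print(recommended_filename("English Premier League", "2021-08-14", "mp4",
                               home="Chelsea", away="Crystal Palace"))
```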

zag2me commented 2 years ago

I am testing with:

English.Premier.League.20210813.Brentford.vs.Arsenal.special.mp4
English.Premier.League.20210814.Chelsea.vs.Crystal.Palace.special.mp4
English.Premier.League.20210814.Everton.vs.Southampton.special.mp4
English.Premier.League.20210814.Leicester.vs.Wolves.special.mp4
English.Premier.League.20210814.Man.United.vs.Leeds.special.mp4

2022-05-03 10:33:27.533 T:14940 DEBUG <general>: VIDEO::CVideoInfoScanner::OnProcessSeriesFolder - no match for show: 'English Premier League', season: 202108, episode: 13.0, airdate: '01/01/1601', title: ''
2022-05-03 10:33:27.534 T:14940 DEBUG <general>: VIDEO::CVideoInfoScanner::OnProcessSeriesFolder - no match for show: 'English Premier League', season: 202108, episode: 14.0, airdate: '01/01/1601', title: ''
2022-05-03 10:33:27.535 T:14940 DEBUG <general>: Skipped 6 duplicate messages..
2022-05-03 10:33:27.535 T:14940 DEBUG <general>: VIDEO::CVideoInfoScanner::OnProcessSeriesFolder - no match for show: 'English Premier League', season: 202108, episode: 15.0, airdate: '01/01/1601', title: ''
2022-05-03 10:33:27.536 T:14940 DEBUG <general>: Skipped 1 duplicate messages..
2022-05-03 10:33:27.536 T:14940 DEBUG <general>: VIDEO::CVideoInfoScanner::OnProcessSeriesFolder - no match for show: 'English Premier League', season: 202108, episode: 21.0, airdate: '01/01/1601', title: ''

But I can't seem to get it to scrape anything with the ".special" filename format.

Have I missed something in the filenaming perhaps?

pkscout commented 2 years ago

I'll try that test list later today and see what's happening there.

zag2me commented 2 years ago

A little more testing: it looks like it's taking the date and processing it as the season and episode number.
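That lines up with the log above, where `20210813` shows up as season `202108`, episode `13`. Kodi's default episode patterns appear to include a rule that reads a run of digits as season digits followed by a two-digit episode number. The simplified, hypothetical pattern below (not Kodi's actual expression) reproduces that split.

```python
import re

# Simplified stand-in for a "season digits + two episode digits" rule.
# This is NOT Kodi's actual regular expression.
DIGITS_AS_SEASON_EPISODE = re.compile(r"[._ -](\d+)(\d{2})[._ -]")


def misparse(filename):
    """Return the (season, episode) such a rule would read, or None."""
    match = DIGITS_AS_SEASON_EPISODE.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(misparse("English.Premier.League.20210813.Brentford.vs.Arsenal.special.mp4"))
    # (202108, 13)
```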

pkscout commented 2 years ago

Well crud. Let me ponder this some more. I may take another stab at something that doesn't require a file name change. If not that, I have some messy ideas about how to deal with the cases where the number date string parses as a season and episode.

pkscout commented 2 years ago

@zag2me I'm not able to duplicate this. I tried the exact list of file names you included, and Kodi did the fuzzy match of the title to the file name, found a match, and then scraped the correct game. The only thing I can think of offhand is that you and I aren't running the same Nexus nightly. I'm running the nightly from 20220501. I hope you're just running an older nightly; otherwise I'm having a hard time figuring out what's going on or how to work around it.

zag2me commented 2 years ago

Hmm, I just upgraded to the latest nightly and it still seems the same. Does a log help? (attached: kodi.log)

pkscout commented 2 years ago

@zag2me I'm seeing two odd things in that log.

  1. Kodi is passing 2021 as the "show" name for search instead of "English Premier League" which leads me to think that the Kodi source is set to point to English Premier League and not the folder containing English Premier League. The source probably needs to be fixed to either point to the parent folder or be marked as including only one TV show (which will tell Kodi to use the main source folder as the TV show name).
  2. I also saw a log entry saying No (new) information was found in dir C:\Sports\English Premier League\. I get those all the time when I'm doing scraper development. Once Kodi has traversed a directory, it won't do so again unless the directory has changed. I usually fix that by either deleting the source and re-adding it, or by going to the specific folder, changing the content to None, and then changing it back. Either one resets the traversal check.
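A simplified model of the first point, assuming the show name is the first folder under the source root unless the source is marked as containing a single show, and assuming a `2021` season folder inside the league folder. `show_name_for` is hypothetical and ignores Kodi's other naming rules; it only shows why a source pointed at the league folder yields `2021` as the show name.

```python
from pathlib import PureWindowsPath


def show_name_for(source_root, file_path, single_show=False):
    """Simplified model of which folder name becomes the TV show name."""
    root = PureWindowsPath(source_root)
    path = PureWindowsPath(file_path)
    if single_show:
        # Source marked as containing only one show: use the source folder.
        return root.name
    # Otherwise the first folder below the source root is treated as the show.
    return path.relative_to(root).parts[0]


if __name__ == "__main__":
    video = r"C:\Sports\English Premier League\2021\English.Premier.League.20210813.Brentford.vs.Arsenal.special.mp4"
    print(show_name_for(r"C:\Sports\English Premier League", video))  # misconfigured source
    print(show_name_for(r"C:\Sports", video))  # source at the parent folder
```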

I think fixing number 1 might also trigger the traversal reset from number 2. Once we get past these two issues, we can try again and see if the more recent nightly solved the problem for you.

zag2me commented 2 years ago

Thanks, I did a complete re-install and things seem to be working as expected now. It might have been the path issue, but I'm pretty sure I tried it with the root sports folder first. Or maybe it was a cache issue.

Anyway at least we know it works as we expect it to.


pkscout commented 2 years ago

Oh good. I was out of ideas otherwise. I'm going to close this issue then. @zag2me at this point I think the scraper is feature complete. I've added some other stuff since we started this troubleshooting, so take a look at the change log to see what got added. If there are other features you were hoping for or things that aren't working the way you expect, just open another issue. When you think it's ready to go, I can do a PR to get it into the Nexus scraper repo.