jamesmeneghello / pynab

Newznab-compliant Usenet Indexer written in Python, using PostgreSQL/MySQL-like.

Other TV Release Identification Providers #215

Closed NeilBetham closed 8 years ago

NeilBetham commented 9 years ago

With TVRage looking like it's down for good, would it be possible to integrate a different TV release identification API? OMDb now supports TV shows and series, and TVMaze also has an API for show searching. The latter is looking into adding TVRage IDs to any shows that it can find data for. This would likely also require some modification on the API side of pynab in order to support searching by different ID types. I know this won't affect most of the automated downloaders, since they will likely still depend on TVRage IDs, but if there is an indexer that supports the new ID set then the downloaders could follow suit. Truth be told, a new API for downloader-to-indexer communication, other than newznab, needs to be defined. For the moment I've had to shut down my post processor since it can't identify anything TV related while TVRage is down.

gkoh commented 9 years ago

For the moment I've had to shut down my post processor since it can't identify anything TV related while TVRage is down.

You could just disable the TVRage postprocessing in the config, that's what I've done.

NeilBetham commented 9 years ago

Yes, that is an option for the short term, but in the long term another solution will be needed to fix the problem. Additionally, if you disable TVRage post-processing then the indexer becomes useless for any sort of TV release, since most searchers depend on the rage ID from the newznab API spec.

jamesmeneghello commented 9 years ago

I'll investigate this later today. As you said, most of the downloaders rely on the tvr id, so we might have to wait and see what they swap to.

Incidentally, sonarr uses name only, I think.

gkoh commented 9 years ago

Indeed, I was suggesting it as a short term fix, to allow other postprocessing to continue. We will definitely need an alternative if TVRage does not return.

NeilBetham commented 9 years ago

I did look into Sonarr a bit, and it looks like they will use a TVRage ID if they have one for the show; if they don't, they search based on title, season and episode. Relevant code: https://github.com/Sonarr/Sonarr/blob/master/src/NzbDrone.Core/Indexers/Newznab/NewznabRequestGenerator.cs. I'm still not sure when they have a rage ID and when they don't.
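To make the fallback concrete, here is a minimal sketch of how a downloader might build a newznab `tvsearch` query either way. It is not Sonarr's code; the function name is hypothetical, and the only things taken from the newznab API are the `t`, `apikey`, `rid`, `q`, `season` and `ep` parameters.

```python
from urllib.parse import urlencode


def build_tvsearch_query(api_key, title, season, episode, tvrage_id=None):
    """Return a newznab-style tvsearch query string (hypothetical helper)."""
    params = {"t": "tvsearch", "apikey": api_key}
    if tvrage_id is not None:
        # Preferred path: the indexer matches on the TVRage ID.
        params["rid"] = tvrage_id
    else:
        # Fallback path: plain title search.
        params["q"] = title
    params["season"] = season
    params["ep"] = episode
    return urlencode(params)
```

With no `tvrage_id` the query carries `q=<title>` instead of `rid=<id>`, so an indexer can only answer it if it has parsed season and episode numbers for the release.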

brookesy2 commented 9 years ago

We can also use thexem.de in the short term. Also can help us translate elsewhere.

jamesmeneghello commented 9 years ago

Yeah, thexem might work.

ukharley commented 9 years ago

I've moved on to SickRage (a fork of SickBeard), as it doesn't use TVRage anymore (it uses TVDB) and searches on title only. Doesn't work for "new releases, as in S01E01" though as the pynab api needs an "episodes" entry in releases/episodes tables. I got around that by tweaking tvrage.py to add them. I won't post the code, as Murodese will do a better job of it. As a side note, Python 3.4 can't use import db.regex as regex_data in util.py. I've commented out all use of it for now, as I think it's only used to import pynab's own regexes. I've also been banned from the pre IRC channel, so be careful how you access it. I'm working on another one.

jamesmeneghello commented 9 years ago

Doesn't work for "new releases, as in S01E01" though as the pynab api needs an "episodes" entry in releases/episodes tables.

Can you elaborate on that? Episode details should already exist as part of post-proc as per the episode table.

brookesy2 commented 9 years ago

Ukharley, you can use the pre import script in the short term if you still need em. They are updated once a day usually.

jamesmeneghello commented 9 years ago

Hmm, thexem doesn't let you do show name lookups. Would have to combine tvdb or something with thexem to get tvrage IDs until the downloaders are updated to work with something new. Let me look around and see what the downloaders are doing first, though.

ukharley commented 9 years ago

@brookesy2: I am importing them for now, but a second-choice IRC channel would be worthwhile, for me anyway. @Murodese: If there is no rage ID, it doesn't process episodes.

brookesy2 commented 9 years ago

Definitely agreed!! I really need to do more research on requests and how they work, so we can replicate that ourselves.

NeilBetham commented 9 years ago

Also if you need help implementing stuff I would be game to contribute where needed.

NeilBetham commented 9 years ago

TVRage seems to have an interim page up saying they will be back, so this may resolve itself. Still, given the instability of TVRage's platform and how long they have been down, people may just contribute to another website like TVDB or TVMaze, so adding another source of info may still be worthwhile for post-processing. As far as downloaders go, it looks like at least Sonarr was planning on falling back to plain title searches for shows that had no rage ID.

ukharley commented 9 years ago

@brookesy2: Unfortunately, you can't get to the pre IRC channels without an invite and key. I did find one that I've used in my supplementary prebot script here: ..... found a bug. Will update when done! It only gathers pres for active groups (much more to my liking). If that's not what's wanted, comment out the relevant lines in the on_pubmsg function, or add a flag in config.py and this script to process either active groups only or all of them. For the moment, SickRage is working and being actively developed (in Python as well), so I'll be sticking with it. I get an update notification almost every day.

brookesy2 commented 9 years ago

@ukharley Weird. I just restarted my bot and it rejoined and was fine. Mostly irrelevant, as you are right, we definitely need a backup! Thanks for putting some time into it. Total legend :)

2015-10-01 06:40:49 INFO pre: Inserted/Updated - Penn.Zero.Part-Time.Hero.S01E20.720p.HDTV.x264-W4F
2015-10-01 06:40:52 INFO pre: Inserted/Updated - Sick_Of_It_All-Scratch_The_Surface-WEB-1994-ENTiTLED_iNT

jamesmeneghello commented 9 years ago

Sorry, had to rewrite a work project almost entirely so I've been busy as heck for a few weeks.

The only reason I went with TVR to begin with was SB's insistence on using tvrids as a data source. Sonarr's always fallen back onto title searches, I think.

I'd be totally amenable to modifying the schema so that any kind of metadata can be associated to a release (so each can have a tvr, tvdb, anidb, imdb, whatever ID attached to it). Will require a fairly chunky rework of postprocessing, but a lot of what I've been doing lately has been massive-load distributed processing so I have some new optimisations that I can work into pynab to speed it up quite a lot. I also want to do some more work on better recognition of movie releases, since that seems to be something that pynab's not great at.
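As a rough illustration of what "any kind of metadata ID per release" could look like, here is a minimal sketch. It is not the pynab schema: the table and column names are hypothetical, and the standard-library `sqlite3` module stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3


def create_schema(conn):
    """Create a releases table plus a generic provider/ID table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE releases (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE release_ids (
            release_id  INTEGER NOT NULL REFERENCES releases(id),
            provider    TEXT NOT NULL,   -- e.g. 'tvrage', 'tvdb', 'tvmaze', 'imdb'
            external_id TEXT NOT NULL,
            PRIMARY KEY (release_id, provider)
        );
    """)


def attach_id(conn, release_id, provider, external_id):
    """Attach (or overwrite) one provider's ID for a release."""
    conn.execute(
        "INSERT OR REPLACE INTO release_ids VALUES (?, ?, ?)",
        (release_id, provider, str(external_id)),
    )


def find_releases(conn, provider, external_id):
    """Return the names of releases carrying the given provider ID."""
    rows = conn.execute(
        "SELECT r.name FROM releases r "
        "JOIN release_ids i ON i.release_id = r.id "
        "WHERE i.provider = ? AND i.external_id = ? "
        "ORDER BY r.name",
        (provider, str(external_id)),
    )
    return [row[0] for row in rows]
```

Keying on (release, provider) means a new provider needs no schema change, and an API search by any ID type reduces to the same lookup.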

@ukharley let me know when the IRC script's done and I'll merge the two (or just submit a PR, either way). From what I saw yesterday it looked pretty good!

ukharley commented 9 years ago

I think the "nuked" bug is sorted in the backup irc bot. New Gist: https://gist.github.com/ukharley/13251d904b7cfcac2e59

jamesmeneghello commented 9 years ago

Does this effectively replace the old prebot?

ukharley commented 9 years ago

I'm using it to replace the existing one as for some reason I was banned from the other, probably restarting the bot too often when no updates were seen. Both bots suffer from a reconnect problem. I think it's down to only having one server in the list. When that one drops it just sits there doing nothing. It's capable of accepting a list [ ] of irc servers it can rotate through but the way it's implemented now it can't. I haven't had time to look any deeper into it.
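A minimal sketch of the rotation idea, independent of any IRC library: `connect` is a hypothetical callable that raises `ConnectionError` when a server is unreachable, and the hostnames in the usage note are placeholders.

```python
import itertools


def connect_with_rotation(servers, connect, max_attempts=None):
    """Try each server in turn, cycling through the list until one connects."""
    if not servers:
        raise ValueError("at least one server is required")
    if max_attempts is None:
        # Assumption: two full passes over the list before giving up.
        max_attempts = 2 * len(servers)
    for attempt, server in enumerate(itertools.cycle(servers), start=1):
        try:
            return connect(server)
        except ConnectionError:
            if attempt >= max_attempts:
                raise
```

Called as `connect_with_rotation(["irc.one.example", "irc.two.example"], connect)`, it moves on to the second host as soon as the first one fails. A real bot would probably also sleep between attempts.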

brookesy2 commented 9 years ago

As @ukharley said, it's part connection, part server rotation. I have a pretty good connection so it stays online for months sometimes. But a disconnect does hurt :( The library does attempt to reconnect, but sometimes it doesn't work.

Obviously the more the merrier in this instance, redundancy! :)

gkoh commented 9 years ago

Just noticed that the newznab+ team have committed a change replacing TVRage with TVMaze.

jamesmeneghello commented 9 years ago

Looks good. As I said, I'll replace it with something more generic that supports multiple providers, I just don't have time at the moment (PhD thesis is due in a little over 3 months).

brookesy2 commented 9 years ago

@Murodese I am going to attempt to write this, but it may take longer than you to finish your thesis :)

Before I start, any preference on what I should use for the json returned. Do you have any library you would use? Or just load it up using standard json libraries from python?

jamesmeneghello commented 9 years ago

Haha, we'll see :) have a look at what I used for the other libraries - I think it was either simplejson or just the standard python json library?

brookesy2 commented 9 years ago

@Murodese Looks like just standard json library. Will use that for now :)

ukharley commented 9 years ago

@brookesy, have a look here: https://github.com/srob650/pytvmaze See if you can adapt that one.

brookesy2 commented 9 years ago

@ukharley This could definitely help, thanks!

The longest part will be figuring out wtf all this means :)

ukharley commented 9 years ago

The hardest part will be converting the existing TVRage data and creating the new TVMaze db.

brookesy2 commented 9 years ago

@ukharley For now I'm just going to work on stuffing TVMaze IDs into the old rage box :)

tvmaze does have a lookup based off rageid though. So in cases where things can't be found, we can fall back.
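For reference, a small sketch of building that lookup URL. It assumes TVMaze's `/lookup/shows` endpoint accepts `tvrage`, `thetvdb` or `imdb` as the query parameter; the helper name is made up, and no request is actually sent.

```python
from urllib.parse import urlencode

TVMAZE_API = "http://api.tvmaze.com"

# Assumption: these are the external ID types the lookup endpoint understands.
LOOKUP_PROVIDERS = ("tvrage", "thetvdb", "imdb")


def lookup_url(provider, external_id):
    """Build a TVMaze show-lookup URL for an external ID (hypothetical helper)."""
    if provider not in LOOKUP_PROVIDERS:
        raise ValueError("unsupported provider: {}".format(provider))
    return "{}/lookup/shows?{}".format(TVMAZE_API, urlencode({provider: external_id}))
```

The fallback would then be: try the name search first, and if nothing comes back but the release already has a rage ID, fetch `lookup_url("tvrage", rage_id)` instead.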

brookesy2 commented 9 years ago

@ukharley python3 support for this isn't looking good :( Sad times.

Edit: Looking at the code, I can probably update it for python3. But it's lunchtime now :)

ukharley commented 9 years ago

Here's one i was playing with: https://gist.github.com/ukharley/d9ea6146fec8378e644d

brookesy2 commented 9 years ago

@ukharley fortunately the other library was a 2 second fix. Will make a pull request for it later.

Edit: Perhaps I was too cocky! It doesn't work, I will remove it and work on it :)

brookesy2 commented 9 years ago

@ukharley This seems to be working, in terms of just the lib. I need to make a pull request for the guy, because I probably shouldn't just steal this!

https://github.com/brookesy2/pynab/blob/development-postgres/lib/tvmazelib.py

Still messing with postprocessing.

brookesy2 commented 9 years ago

Well, I accidentally ran this on my prod db. Will see how that goes....

brookesy2 commented 9 years ago

@Murodese @ukharley So I ran tvmaze across a bunch of releases and it's not too bad. We will need to modify the regex that splits out the names, though, as the endpoint will NOT return results for things like "Flash 2014".

It is partly due to the endpoint I used, which is the fuzzy match one. We could use the more expansive one, which returns multiple results, and keep the existing regex. We would then need to do a secondary search. Bit more of a pain, but if you think it's preferred I can attempt it!

jamesmeneghello commented 9 years ago

Maybe try it and see how much of a pain it is to determine which of the multiple is the one we want?

brookesy2 commented 9 years ago

@Murodese I think the bigger pain is splitting off the years for things like "Flash 2014", as that won't return any results on any tvmaze endpoint. We would need to search for "Flash" and then match the year against the result set.

Here is the example:

http://api.tvmaze.com/search/shows?q=flash

http://api.tvmaze.com/search/shows?q=flash%202014

Edit: Do we just assume that tv shows usually have a year on the end?

ukharley commented 9 years ago

Just had a quick look at the TVMaze forum and they will never add the year or country to the title. That has to be pulled from the json data, using either or both of:

"premiered":"2014-10-07"
"network":{"id":5,"name":"The CW","country":{"name":"United States","code":"US"

for the Flash example above.
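A minimal sketch of pulling those two values out with the standard `json` library. The sample document is a trimmed, hand-completed version of the fragment above (closing braces added), and the helper name is made up.

```python
import json


def year_and_country(show):
    """Return (premiere year, network country code) from a TVMaze show object."""
    premiered = show.get("premiered")
    year = int(premiered[:4]) if premiered else None
    # Assumption: "network" may be missing or null, so guard each level.
    network = show.get("network") or {}
    country = (network.get("country") or {}).get("code")
    return year, country


sample = json.loads(
    '{"premiered": "2014-10-07", '
    '"network": {"id": 5, "name": "The CW", '
    '"country": {"name": "United States", "code": "US"}}}'
)
```

Here `year_and_country(sample)` gives `(2014, "US")`, which is what would be compared against a year or country suffix split off the release name.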

brookesy2 commented 9 years ago

@ukharley Yeah, I found the same :(

So does this mean we want to use the less restrictive search and match on premiered? We will need to modify the regex to extract show names without dates. How is your regex? :)
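A sketch of the "less restrictive search, then match on premiered" approach. It assumes the multi-result search returns a list of objects with a `score` and a nested `show`, and that `premiered` is a `YYYY-MM-DD` string; `pick_show` is a hypothetical name.

```python
def pick_show(results, year=None):
    """Pick the best-scoring search result, optionally filtered by premiere year."""
    ranked = sorted(results, key=lambda r: r.get("score", 0), reverse=True)
    for result in ranked:
        premiered = result["show"].get("premiered") or ""
        if year is None or premiered.startswith(str(year)):
            return result["show"]
    return None  # nothing premiered in the requested year
```

For "Flash 2014" this means searching for "flash" and calling `pick_show(results, 2014)`. If no candidate premiered that year it returns `None` rather than guessing.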

ukharley commented 9 years ago

Sorry, I'm not working on it. Just read your comment and went for a quick gander at their forum. The only free time I have for programming is at weekends and my 65 yr old brain isn't as fast as it used to be so that slows things down as well :)

brookesy2 commented 9 years ago

@ukharley Ha! don't worry about it at all mate. Such is life :) I will do my best, can't promise any speed though!

ukharley commented 9 years ago

Had a few minutes spare in the office and the best I could come up with is: (?P<name>.*?)(?P<year>(?:19|20)\d{2}) Unfortunately, this will only work from 1900 to 2099 ;)
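A runnable sketch of the same idea in Python. Two details matter: Python needs the `(?P<name>...)` form for every named group, and the century alternation has to be grouped, otherwise `19|20\d{2}` matches a bare "19". Anchoring the year to the end of the title and allowing space, dot or underscore separators are assumptions, not something settled in this thread.

```python
import re

# Lazy name, optional separators, then a four-digit year (1900-2099) at the end.
NAME_YEAR = re.compile(r"^(?P<name>.*?)[ ._]*(?P<year>(?:19|20)\d{2})$")


def split_name_year(title):
    """Split a trailing year off a show title; year is None if absent."""
    title = title.strip()
    match = NAME_YEAR.match(title)
    if match and match.group("name"):
        return match.group("name"), int(match.group("year"))
    return title, None
```

With this, `split_name_year("Flash 2014")` yields `("Flash", 2014)`, while a title with no trailing year, or one that is nothing but a year, comes back unchanged.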

brookesy2 commented 9 years ago

@ukharley Nice work mate :) I am on holiday next week, if I can find some time I will try bust this out!

jamesmeneghello commented 8 years ago

@Murodese If you get time, think you can chuck in an alembic migration for what you are after? tvmaze returns tvrage and thetvdb IDs, so I can build that into my code. I am not sure which areas are touched by tvrage IDs, though. Does the API get queried by them?

I'll do this over the next couple days. TVRage (and by extension IMDB) IDs are presented as part of the API, which is the only place they get used iirc. I think Sickbeard is the only application to directly query the TVR ID, as well. Sonarr and I think CouchPotato both query by name.

jamesmeneghello commented 8 years ago

Wrote the generic ID table and migration tonight, needs more testing. Also needs tying in with post-processing, obviously. If you want to test the migration (USE A TEST DB, DO NOT RUN ON LIVE DB), it's in the genericid branch.

brookesy commented 8 years ago

Awesome, thanks mate! Will run it on my test DB when I get back from holidays. Will update tvmaze.py to get all the ID's. Then will have to look at everywhere else that references it :)

jamesmeneghello commented 8 years ago

I'll do most of that, I have some spare time for a week or so :) I'll let you handle the TVMaze integration, though.

brookesy commented 8 years ago

Total legend :) hopefully can wrap it up not too long after I get back from holiday!