jamesmeneghello / pynab

Newznab-compliant Usenet Indexer written in Python, using PostgreSQL/MySQL-like.

Other TV Release Identification Providers #215

Closed NeilBetham closed 8 years ago

NeilBetham commented 9 years ago

With TVRage looking like it's down for good, would it be possible to integrate a different TV release identification API? OMDb now supports TV shows (series), and TVMaze also has an API for show searching. The latter is looking into adding TVRage IDs to any shows that it can find data for. This would likely also require some modification on the API side of pynab in order to support searching by different ID types. I know this won't affect most of the automated downloaders, since they will likely still depend on TVRage IDs, but if there is an indexer that supports the new ID set then the downloaders could follow suit. Truth be told, a new API interface for downloader-to-indexer communication, other than newznab, needs to be defined. For the moment I've had to shut down my post-processor, since it can't identify anything TV-related while TVRage is down.

srob650 commented 8 years ago

Hi all, I stumbled across this thread and wanted to let you know that my pytvmaze API (https://github.com/srob650/pytvmaze) now supports fuzzy matching for your above example of "Flash 2014" and others. Not sure if this helps you out with pynab, but if it does, feel free to use it :) For more on how it works, scroll down in the README.md to where it says Search With Qualifiers.

Edit: I'm also committed to ensuring that pytvmaze will be compatible with both Python 2 and 3.

brookesy commented 8 years ago

Thanks mate!! I will take a look when I am back from holiday :) (In reply to srob650's comment of Wed, 28 Oct 2015 at 18:48.)

srob650 commented 8 years ago

No problem! As you mentioned earlier in this thread, it does require making multiple calls to TVMaze until it gets a successful result. It then uses the words that were removed from the end of the search as qualifiers to check against year, network name, country code, and language. Even though it makes multiple calls it's pretty fast, since the www.tvmaze.com API itself is quite fast.
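
A minimal sketch of that word-dropping search, not pytvmaze's actual code: it trims words off the end of the query until a lookup succeeds and returns the trimmed words as candidate qualifiers. `search_with_qualifiers` and `fake_lookup` are hypothetical names, and the one-entry catalogue stands in for a real TVMaze call.

```python
from typing import Callable, List, Optional, Tuple


def search_with_qualifiers(
    query: str,
    lookup: Callable[[str], Optional[dict]],
) -> Tuple[Optional[dict], List[str]]:
    """Drop words from the end of `query` until `lookup` returns a show.

    Returns the matched show (or None) plus the words that were dropped,
    which can then be checked as qualifiers (year, country, network, ...).
    """
    words = query.split()
    for cut in range(len(words), 0, -1):
        name = " ".join(words[:cut])
        show = lookup(name)  # one lookup (one API call) per attempt
        if show is not None:
            return show, words[cut:]
    return None, words


def fake_lookup(name: str) -> Optional[dict]:
    """Hypothetical stand-in for a fuzzy TVMaze name lookup."""
    catalogue = {"flash": {"name": "The Flash"}}
    return catalogue.get(name.lower())


show, qualifiers = search_with_qualifiers("Flash 2014", fake_lookup)
```

Here "Flash 2014" fails as a whole, "Flash" succeeds, and "2014" is left over as a qualifier to compare against the show's year.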

I don't generally do as much Python3 testing as Python2 testing so if any 3-specific bugs pop up that I miss feel free to post an issue to the page and I'll happily sort it out :)

brookesy2 commented 8 years ago

Sweet :) Currently I just use the broad search (search shows, I think?) then pick the first show with a matching premiere year. (In reply to srob650's comment of Wed, 28 Oct 2015 at 19:13.)

srob650 commented 8 years ago

The show search is good but it doesn't return fuzzy results, so in the case of the above "Flash 2014" example it will return nothing. Also, for situations where there are multiple shows of the same name, things can get tricky. Take "Utopia", for instance: there are three shows all named "Utopia". Two of them premiered in 2014, so the year alone is not enough. What my API does is allow you to narrow the search by using "utopia au" or "utopia gb" for country code, or adding multiple qualifiers if you have more information, such as "utopia 2014 au abc" (year, country, network). If you use a qualifier and there are still "tied results" then it will pick the most recent premiere date using the whole date, not just the year.
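
As a rough illustration of that narrowing and tie-break (a sketch only, not how pytvmaze is written): filter same-named candidates by whichever qualifiers are known, then take the latest full premiere date. The three records below are invented for the example and are not real TVMaze data, and `pick_show` is a hypothetical helper.

```python
from datetime import date

# Invented records for illustration only; dates and networks are made up.
CANDIDATES = [
    {"name": "Utopia", "premiered": date(2013, 1, 1), "country": "gb", "network": "net-a"},
    {"name": "Utopia", "premiered": date(2014, 3, 1), "country": "au", "network": "abc"},
    {"name": "Utopia", "premiered": date(2014, 9, 1), "country": "us", "network": "net-b"},
]


def pick_show(candidates, year=None, country=None, network=None):
    """Keep candidates matching every supplied qualifier, then break ties."""
    matches = [
        c for c in candidates
        if (year is None or c["premiered"].year == year)
        and (country is None or c["country"] == country)
        and (network is None or c["network"] == network)
    ]
    if not matches:
        return None
    # Tie-break on the whole premiere date, most recent first.
    return max(matches, key=lambda c: c["premiered"])
```

With this data, `pick_show(CANDIDATES, year=2014)` still has two matches and falls back to the later date, while adding `country="au"` pins it to a single show.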

brookesy2 commented 8 years ago

Sounds good, will take a look. Yeah we found out early on about the years in search. I remove it and search without it. But if it's all in the pytvmaze lib, that is a win/win :) (In reply to srob650's comment of Wed, 28 Oct 2015 at 19:43.)

srob650 commented 8 years ago

Yeah, it's all there now. I'm not sure exactly what data you need so obviously you need to do whatever is right for your project, but I'm happy to help out with pytvmaze if your needs fit in with my API's scope :)

brookesy2 commented 8 years ago

@Murodese Unsure if you finished the code for alembic, so if you haven't you can ignore this :)

I get:

psycopg2.IntegrityError: update or delete on table "tvshows" violates foreign key constraint "episodes_tvshow_id_fkey" on table "episodes"
DETAIL:  Key (id)=(1) is still referenced from table "episodes".

caused by:

sqlalchemy.exc.IntegrityError: (IntegrityError) update or delete on table "tvshows" violates foreign key constraint "episodes_tvshow_id_fkey" on table "episodes"
DETAIL:  Key (id)=(1) is still referenced from table "episodes".
 'UPDATE tvshows SET id=%(id)s WHERE tvshows.id = %(id_1)s' {'id_1': 1, 'id': 0}

jamesmeneghello commented 8 years ago

Was still going, but that's fine :) Need to drop both FK's and re-add them after. I'll do that tonight.
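
A sketch of that sequence, assuming PostgreSQL and that both child tables reference the show through a `tvshow_id` column (the constraint names come from the errors above, the column name for `episodes` is inferred from its constraint name). It only builds the SQL strings; `renumber_show_sql` is a hypothetical helper, and in an Alembic migration the drop and re-add steps would more likely go through `op.drop_constraint` and `op.create_foreign_key`.

```python
from typing import List

# Constraint names are taken from the error messages in this thread;
# the tvshow_id column on both tables is an assumption.
FOREIGN_KEYS = [
    ("episodes", "episodes_tvshow_id_fkey"),
    ("releases", "releases_tvshow_id_fkey"),
]


def renumber_show_sql(old_id: int, new_id: int) -> List[str]:
    """Return PostgreSQL statements that move one tvshows row to a new id."""
    old_id, new_id = int(old_id), int(new_id)  # integers only, never raw input
    drops = [f"ALTER TABLE {t} DROP CONSTRAINT {fk}" for t, fk in FOREIGN_KEYS]
    updates = [f"UPDATE tvshows SET id = {new_id} WHERE id = {old_id}"] + [
        f"UPDATE {t} SET tvshow_id = {new_id} WHERE tvshow_id = {old_id}"
        for t, _ in FOREIGN_KEYS
    ]
    adds = [
        f"ALTER TABLE {t} ADD CONSTRAINT {fk} "
        "FOREIGN KEY (tvshow_id) REFERENCES tvshows (id)"
        for t, fk in FOREIGN_KEYS
    ]
    # Order matters: constraints come off first and go back on last.
    return drops + updates + adds


statements = renumber_show_sql(1, 0)
```

For a whole-table renumbering the constraints would be dropped once, every row updated, and the constraints re-added once at the end, rather than per row as in this single-row sketch.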

brookesy commented 8 years ago

No stress :) Thanks!! (In reply to James Meneghello's comment of Fri, 30 Oct 2015 at 04:38.)

jamesmeneghello commented 8 years ago

Try that :)

brookesy2 commented 8 years ago

@Murodese Sorry for the mammoth delay, 13hr drive back from holiday! Similar error, but we're getting closer!

psycopg2.IntegrityError: insert or update on table "releases" violates foreign key constraint "releases_tvshow_id_fkey"
DETAIL:  Key (tvshow_id)=(0) is not present in table "tvshows".
sqlalchemy.exc.IntegrityError: (IntegrityError) insert or update on table "releases" violates foreign key constraint "releases_tvshow_id_fkey"
DETAIL:  Key (tvshow_id)=(0) is not present in table "tvshows".
 'UPDATE releases SET tvshow_id=%(tvshow_id)s WHERE releases.tvshow_id = %(tvshow_id_1)s' {'tvshow_id': 0, 'tvshow_id_1': 1}

jamesmeneghello commented 8 years ago

Long drive! I've had to drive 16h or so up to a reef north of us, huge pain. Did it over a couple of days, though.

Try that. I'd test it myself but I don't have any historical test data :(

brookesy2 commented 8 years ago

Ahh nice, you in NSW?

This looks like something I can probably fix up in my db :)

psycopg2.IntegrityError: duplicate key value violates unique constraint "tvshows_pkey"
DETAIL:  Key (id)=(1385) already exists.
sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "tvshows_pkey"
DETAIL:  Key (id)=(1385) already exists.
 'UPDATE tvshows SET id=%(id)s WHERE tvshows.id = %(id_1)s' {'id': 1385, 'id_1': 1756}

Edit: Looking through, this seems like it may impact a few people :(

jamesmeneghello commented 8 years ago

In Perth, so the drive up to Coral Bay/Exmouth is about that far :)

I think that's an ordering problem - they should be going from 1 upwards, so there shouldn't be any pkey violations. I'll take a look tonight.
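
To see why ordering matters, here is a small simulation (plain Python, no database; `renumber` is a hypothetical helper). When rows are renumbered to 1, 2, 3, ... in ascending order of their old id, the target id is never above the old one and every smaller id has already moved, so the target is always free. In any other order a target can still be occupied by a row that has not moved yet.

```python
from typing import Iterable, List, Tuple


def renumber(ids: Iterable[int], ordered: bool) -> List[Tuple[int, int]]:
    """Simulate moving each id to 1, 2, 3, ... and fail on a key collision."""
    todo = sorted(ids) if ordered else list(ids)
    taken = set(todo)  # ids currently present in the table
    moves = []
    for new_id, old_id in enumerate(todo, start=1):
        if new_id != old_id and new_id in taken:
            # Mirrors the "duplicate key value violates unique constraint" error.
            raise ValueError(f"duplicate key: {old_id} -> {new_id}")
        taken.discard(old_id)
        taken.add(new_id)
        moves.append((old_id, new_id))
    return moves
```

`renumber([3, 1, 2], ordered=False)` fails on the first move, because id 1 is still held by a row that has not been processed, while the same ids with `ordered=True` go through cleanly.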

brookesy2 commented 8 years ago

@Murodese Ahh nice, I'm originally from the Sunshine Coast, but I live in Europe now.

Ok sounds good, no stress :)

jamesmeneghello commented 8 years ago

I tested that on some data I have and it seemed to work, let me know how it goes.

brookesy2 commented 8 years ago

Hrmm, no go :( Guessing my db has some anomalies.

psycopg2.IntegrityError: duplicate key value violates unique constraint "tvshows_pkey"
DETAIL:  Key (id)=(1385) already exists.

  File "/media/storage/git/pynab/alembic/versions/3bc50ecd0bb_add_generic_id_support.py", line 62, in upgrade
    bind.execute(tvshows.update().where(tvshows.c.id==show[tvshows.c.id]).values(id=i))

sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "tvshows_pkey"
DETAIL:  Key (id)=(1385) already exists.
 'UPDATE tvshows SET id=%(id)s WHERE tvshows.id = %(id_1)s' {'id': 1385, 'id_1': 1756}

jamesmeneghello commented 8 years ago

Can you paste the rows for ids 1385 and 1756?

brookesy2 commented 8 years ago

It looks like some quality programming:

1385    Kendra on Top    US
1756    The Tall Man    NULL

jamesmeneghello commented 8 years ago

Added some debug output, try it again and paste the last few lines :)

brookesy2 commented 8 years ago

Here it is:

changing show Cowboy G-Men from 1748 to 1378
changing show The Sandbaggers from 1749 to 1379
changing show Cribb from 1750 to 1380
changing show Tales of the City from 1751 to 1381
changing show The People's Choice from 1752 to 1382
changing show The Tom Ewell Show from 1753 to 1383
changing show Frontier from 1755 to 1384
changing show The Tall Man from 1756 to 1385

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
psycopg2.IntegrityError: duplicate key value violates unique constraint "tvshows_pkey"
DETAIL:  Key (id)=(1385) already exists.

sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "tvshows_pkey"
DETAIL:  Key (id)=(1385) already exists.
 'UPDATE tvshows SET id=%(id)s WHERE tvshows.id = %(id_1)s' {'id_1': 1756, 'id': 1385}

jamesmeneghello commented 8 years ago

Can you find the line that is originally 1385? (ie. "from 1385")

brookesy2 commented 8 years ago

Was already looking, can't find one :(

I can gist the whole output if you want.

jamesmeneghello commented 8 years ago

Yeah, thanks.

brookesy2 commented 8 years ago

@Murodese https://gist.github.com/brookesy2/bc8db2edc48279c7291c

jamesmeneghello commented 8 years ago

Bleh. Maybe dump your tvshows table so I can use that data directly? I can't see any obvious reason that it should be doing that.

brookesy2 commented 8 years ago

@Murodese https://gist.github.com/brookesy2/855835dcfda96f694a1d :)

Thanks for spending time on this! I know you are busy!

jamesmeneghello commented 8 years ago

Ugh, ordering didn't commit for whatever reason. Try now.

brookesy2 commented 8 years ago

We appear to be cooking with gas. It's running, will let you know if it fails out :)

brookesy2 commented 8 years ago

Operation, big success! I will work on the tvmaze code hopefully this weekend!

jamesmeneghello commented 8 years ago

Cool. Just have to re-dump new tv/movie/whatever db data for new installs, but I'll do that later.

brookesy2 commented 8 years ago

Err so how does this work? I guess I need to chuck values into dbids, releases, episodes and tvshows? tvshows is now just a join between dbids and the other tables?

Edit: Sorry looks like tvshows is auto increment. So if I don't find the id in dbids for the specific endpoint I need to add it to tvshows. Then add the ID relationship between the two. Then add tvshow id to episodes and releases?

Edit2: Guessing this is because it would be a major pain to scrap the tvshows table? Could we merge tvshows and dbids together?

brookesy2 commented 8 years ago

@Murodese Whenever you have a spare minute can you check the comments I added here to see if I am on the right track? No rush :)

https://github.com/brookesy2/pynab/commit/04eddf3d769fa3673690b9440f5b8a0f35d0bc89

Legend!!

jamesmeneghello commented 8 years ago

Basically, tvshows/movies are now just tables that hold the name of the show/movie, while the dbids table maps the database ID to those entities, e.g (data totally off the top of my head):

tvshows.id: 1, tvshows.name: Archer

dbids.type: TVRAGE, dbids.db_id: 28021, dbids.tvshow_id: 1
dbids.type: TVMAZE, dbids.db_id: 1042, dbids.tvshow_id: 1

In this example, we have one tvshow that's mapped to two different dbs (rage and maze), with their db-specific IDs stored.
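
The same example as a runnable sketch, using SQLite in memory rather than pynab's real PostgreSQL schema. The column names follow the example above; the column types, the `dbids.id` key, and the `show_for` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tvshows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dbids (
        id INTEGER PRIMARY KEY,
        type TEXT NOT NULL,       -- which external database, e.g. TVRAGE
        db_id TEXT NOT NULL,      -- that database's own ID for the show
        tvshow_id INTEGER REFERENCES tvshows (id)
    );
    INSERT INTO tvshows (id, name) VALUES (1, 'Archer');
    INSERT INTO dbids (type, db_id, tvshow_id) VALUES ('TVRAGE', '28021', 1);
    INSERT INTO dbids (type, db_id, tvshow_id) VALUES ('TVMAZE', '1042', 1);
""")


def show_for(conn, db_type, db_id):
    """Resolve an external (type, ID) pair to the local show name."""
    row = conn.execute(
        "SELECT tvshows.name FROM dbids "
        "JOIN tvshows ON tvshows.id = dbids.tvshow_id "
        "WHERE dbids.type = ? AND dbids.db_id = ?",
        (db_type, str(db_id)),
    ).fetchone()
    return row[0] if row else None
```

Either external ID resolves to the same local show, which is what lets an API search accept more than one ID type.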

brookesy2 commented 8 years ago

@Murodese Yeah, that makes sense. So in the instance of a new tvshow we would add it to tvshow then dbid? As it needs the tvshow_id. Wouldn't we end up with dupe names? or is tvshow_id not unique (it doesn't look it!), so we could just add another db_id, db to the same tvshow_id. Sorry for being dense :)

jamesmeneghello commented 8 years ago

You'd only add a new entry to tvshow if the show doesn't exist yet, otherwise you grab the tvshow_id and use that in your new dbid line. So, for tvmaze postproc it'd be something like this:

1) Get the tvshow name and query tvshows for an existing show ID
2) Get the tvmaze ID from the API
3) Add a dbids row including tvshow_id, db_id = tvmaze ID, and db_name = 'TVMAZE'
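
Those steps could look roughly like the following, again as a SQLite sketch rather than pynab's actual post-processing code. `link_show` is a hypothetical helper, the TVMaze ID is passed in instead of being fetched from the API, and the external database name is stored in a `type` column to match the earlier Archer example.

```python
import sqlite3


def link_show(conn, name, db_type, db_id):
    """Attach an external ID to a show, creating the show only if missing."""
    row = conn.execute(
        "SELECT id FROM tvshows WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        # New show: add it to tvshows first so dbids has an id to point at.
        tvshow_id = conn.execute(
            "INSERT INTO tvshows (name) VALUES (?)", (name,)
        ).lastrowid
    else:
        tvshow_id = row[0]
    conn.execute(
        "INSERT INTO dbids (type, db_id, tvshow_id) VALUES (?, ?, ?)",
        (db_type, str(db_id), tvshow_id),
    )
    return tvshow_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tvshows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dbids (id INTEGER PRIMARY KEY, type TEXT, db_id TEXT,
                        tvshow_id INTEGER REFERENCES tvshows (id));
""")
first = link_show(conn, "Archer", "TVRAGE", 28021)
second = link_show(conn, "Archer", "TVMAZE", 1042)
```

The second call reuses the existing tvshows row, so there is no duplicate name, only a second dbids row pointing at the same `tvshow_id`.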

brookesy2 commented 8 years ago

@Murodese Legend. Thank you. Will progress with that. Kind of what I thought, but good to confirm!

brookesy2 commented 8 years ago

@Murodese One thing I noticed is that the sequence on the dbid table doesn't get updated. So I was confused for ages with loads of integrity errors :p
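
For anyone hitting the same thing, one common PostgreSQL fix is to move the sequence past the highest existing id. The sketch below only builds the statement; it assumes the table has a serial `id` column, which is not confirmed in this thread, and `resync_sequence_sql` is a hypothetical helper. `setval` and `pg_get_serial_sequence` are standard PostgreSQL functions.

```python
def resync_sequence_sql(table: str, column: str = "id") -> str:
    """Build a PostgreSQL statement that moves a serial sequence to MAX(column)."""
    for ident in (table, column):
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    # After setval(seq, n), the next nextval() returns n + 1.
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{column}'), "
        f"COALESCE(MAX({column}), 1)) FROM {table}"
    )


statement = resync_sequence_sql("dbids")
```

The resulting string can be run once through any DB-API cursor (for example `cursor.execute(statement)`) after the migration has finished.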

jamesmeneghello commented 8 years ago

Yeah, just one of those things from testing alembic upgrades :)

brookesy2 commented 8 years ago

@Murodese May need your regex skills to strip out countries from searches, as tvmaze will fail if it's "Bla bla UK".
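
One possible shape for such a regex, as a sketch: split a trailing country code (optionally in parentheses) off the title before searching. The list of codes is illustrative rather than complete, and `strip_country` is a hypothetical helper, not something in pynab.

```python
import re

# Illustrative list only; extend it to whatever suffixes appear in release names.
COUNTRY_SUFFIX = re.compile(r"\s+\(?(US|UK|GB|AU|CA|NZ)\)?$", re.IGNORECASE)


def strip_country(title: str):
    """Split a trailing country code off a show title, if there is one."""
    title = title.strip()
    match = COUNTRY_SUFFIX.search(title)
    if match is None:
        return title, None
    return title[: match.start()], match.group(1).upper()
```

Because the code has to be a separate trailing word, a title like "Focus" is left alone, but a show whose real name ends in a standalone "Us" would be split incorrectly, so the stripped search is best used as a fallback after the full title fails.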

Otherwise the rest of it is looking good :) It's been a while, so it's taking me ages!

srob650 commented 8 years ago

@brookesy2 Not sure if you've tried my newer versions of pytvmaze but it actually handles adding qualifiers to the end of a search. Details are in the readme under the "Search With Qualifiers" header :)

brookesy2 commented 8 years ago

@srob650 Lemme take a look see! Thanks!!

srob650 commented 8 years ago

Do a pip install --upgrade pytvmaze and try it out. If it doesn't work for this application I understand, but from what I've read here it seems like it should.

srob650 commented 8 years ago

Wanted to chime in here to let you all know that I'm restructuring pytvmaze a bit. The same functionality will exist for matching parameters; however, it will require the user to supply known parameters instead of a loose string. Re-working of the code is not done yet, but when it is, a search with qualifiers will look something like this:

show = pytvmaze.get_show(show_name='utopia', show_year='2013', show_country='au')

You will be able to supply as many or as few parameters as you have from the options of year, country, network, and language.

EDIT: You can try it out from the dev branch if you want

brookesy2 commented 8 years ago

@srob650 Bustin my balls here man. Just did a PR! I jest, my code needs a ton of reviews before we merge. I will update once you are done :)

srob650 commented 8 years ago

I know I know, sorry! I feel bad but ultimately I believe it's the right thing to do. I think sometime in the future (could be a while though) David at TVMaze will give in and handle some fuzzy matching on his end, but until then I think it's best for my api to require specific parameters. What will change on your end is needing to parse a query before you send it to pytvmaze and find out if it has extra qualifiers in it, and identify them. There is actually a website that does that pretty well and they have an api, so that could be helpful. http://guessit.readthedocs.org/
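
A very small stand-in for that parsing step, using only the standard library rather than guessit: peel a year and a country code off the end of a loose query and return keyword arguments shaped like the `get_show` call shown earlier. `parse_query` is a hypothetical helper, the country list is illustrative, and network or language detection is left out because it would need a list of known names.

```python
import re

YEAR = re.compile(r"^(19|20)\d{2}$")
COUNTRIES = {"us", "uk", "gb", "au", "ca", "nz"}  # illustrative, not exhaustive


def parse_query(query: str) -> dict:
    """Split a loose search string into keyword arguments for a show lookup."""
    words = query.lower().split()
    params = {}
    # Peel recognisable qualifiers off the end; stop at the first unknown word.
    while len(words) > 1:
        last = words[-1]
        if YEAR.match(last) and "show_year" not in params:
            params["show_year"] = last
        elif last in COUNTRIES and "show_country" not in params:
            params["show_country"] = last
        else:
            break
        words.pop()
    params["show_name"] = " ".join(words)
    return params
```

The result of `parse_query("utopia 2013 au")` could then be passed along as `pytvmaze.get_show(**params)`, assuming the released version keeps the parameter names from the example above.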

Also remember that if the version of pytvmaze that you are using is all you need, you can fork from that commit and continue to use that version in your codebase.

srob650 commented 8 years ago

New version is pushed, and I believe this release to be officially stable as far as current features not changing.

brookesy2 commented 8 years ago

@srob650 Modified my code, could be better, but seems to work :)

srob650 commented 8 years ago

What would make it better for you out of curiosity?