xbmc / metadata.themoviedb.org.python


Fetch second page of search results #158

Closed: apo86 closed this issue 1 year ago

apo86 commented 1 year ago

In my experience, TMDB is not always great at ordering results, at least for German titles; I don't know how well this holds for other languages. I've come across several examples where searching for title+year did not produce the desired match on the first page of results.

Two examples:

  1. https://api.themoviedb.org/3/search/movie?api_key=KEY&language=de-DE&query=Es&page=2&year=2017 (original title "It", the one with the clown). Note that this query is already for page 2. It returns one and a half pages of various Spanish, French and English movies (none of which is an exact title match) before the actual result.

  2. https://api.themoviedb.org/3/search/movie?api_key=KEY&language=de-DE&query=Still&page=2&year=2016 (original title "Hush"). Again, this is page 2 of the results. Page 1 does include an exact title match, but in my case it is the wrong one, and judging by the popularity score I'm probably not alone in that.
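For reference, both example queries hit the `/3/search/movie` endpoint with `api_key`, `language`, `query`, `page` and `year` parameters. A minimal sketch of building such a URL in Python, assuming only the parameters that appear in the URLs above (`build_search_url` is a hypothetical helper, not part of the add-on):

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs above.
SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_url(api_key, query, language, page=1, year=None):
    """Return a TMDB movie-search URL for a single page of results (hypothetical helper)."""
    params = {"api_key": api_key, "language": language, "query": query, "page": page}
    if year is not None:
        params["year"] = year  # only sent when a year is known
    # urlencode keeps the dict's insertion order and percent-encodes values.
    return SEARCH_URL + "?" + urlencode(params)


# build_search_url("KEY", "Es", "de-DE", page=2, year=2017)
# -> https://api.themoviedb.org/3/search/movie?api_key=KEY&language=de-DE&query=Es&page=2&year=2017
```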

So far I've not encountered any titles with a desired result on page 3 or beyond, but it might be possible.

It's a niche problem, and I know that you can search by IMDb ID to work around it, but fetching a second page of movie search results is definitely more convenient in those cases. It has a better chance of producing an exact title match, and even when it doesn't, it gives the user more options to choose from during a manual search.

A very simple implementation might look like this: https://github.com/apo86/metadata.themoviedb.org.python/commit/0bc61fd5829161c08b9a38d45dac59ed442adafa https://github.com/apo86/metadata.themoviedb.org.python/commit/0e76a7442b8cf2c8177743661caf656cdb92a7a2
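The linked commits aren't reproduced here. As a rough illustration of the general idea only, the sketch below collects results from up to `max_pages` pages and stops early when TMDB reports no further pages. `fetch_page` is a hypothetical callable standing in for the real HTTP request, and the `results`/`total_pages` keys are assumed from the shape of TMDB's search response.

```python
def search_pages(fetch_page, max_pages=2):
    """Collect results from up to max_pages pages of a TMDB search.

    fetch_page(page) is a hypothetical callable returning the parsed JSON
    for one page, e.g. {"page": 1, "total_pages": 3, "results": [...]}.
    """
    results = []
    page = 1
    while page <= max_pages:
        data = fetch_page(page)
        results.extend(data.get("results", []))
        if page >= data.get("total_pages", 1):
            break  # TMDB has no more pages for this query
        page += 1
    return results
```

Stopping at `total_pages` avoids spending a request on a page that doesn't exist, which matters if rate limits turn out to be a concern.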

Of course, I don't know whether any API usage restrictions or rate limits would make this approach infeasible, or whether the performance hit would be unacceptable.

Any thoughts?

rmrector commented 1 year ago

Right on. I don't think we should do this for every search, though; key it on some condition. Maybe only if it doesn't find a title and year match with a popularity over X.

apo86 commented 1 year ago

Cheers! How about this? https://github.com/xbmc/metadata.themoviedb.org.python/compare/master...apo86:metadata.themoviedb.org.python:master

Arbitrarily picked 5 as the popularity threshold.

Seems to work for me. The examples above still produce the desired page-2 results. At the same time, searching for "Scream" (year=2022) only fetches the first page, because that already contains a good enough match. Same for "Rocky" (no year) and similar cases.
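The condition discussed here (fetch page 2 only when page 1 has no title and year match above a popularity threshold of 5) could be sketched as follows. This is not the code from the linked compare: `has_good_match` and `search_with_fallback` are hypothetical names, `fetch_page` again stands in for the HTTP request, and matching case-insensitively against `title`/`original_title` with a `release_date` year prefix is an assumption.

```python
def has_good_match(results, title, year=None, min_popularity=5):
    """True if some result matches the title (and year, if given) with enough popularity."""
    wanted = title.casefold()
    for item in results:
        names = {
            (item.get("title") or "").casefold(),
            (item.get("original_title") or "").casefold(),
        }
        if wanted not in names:
            continue
        # release_date is assumed to look like "YYYY-MM-DD" (or be empty/missing).
        if year is not None and not (item.get("release_date") or "").startswith(str(year)):
            continue
        if (item.get("popularity") or 0) >= min_popularity:
            return True
    return False


def search_with_fallback(fetch_page, title, year=None, min_popularity=5):
    """Fetch page 1; fetch page 2 only if page 1 lacks a good enough match."""
    first = fetch_page(1)
    results = list(first.get("results", []))
    more_pages = first.get("total_pages", 1) > 1
    if more_pages and not has_good_match(results, title, year, min_popularity):
        results.extend(fetch_page(2).get("results", []))
    return results
```

Under these assumptions a query like "Scream" (2022) costs one request when page 1 already holds a popular exact match, while a case like "Es" (2017) costs two.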

Hope it doesn't look too scuffed, I still don't really know what I'm doing :)

basilgello commented 1 year ago

@apo86 @rmrector Isn't it better to let the user configure how many pages to fetch?

apo86 commented 1 year ago

I wouldn't mind that, but then someone else has to do it, because I have no idea how to get a new config parameter set up with UI, translations or whatever else might be needed.

Also maybe not that easy to explain to the user? Like what scenarios this is relevant for, how the number of pages and popularity threshold interact (should that also be configurable?), potential performance impact, that increasing the number of pages is not a good substitute for adding a year to the query, etc.

rmrector commented 1 year ago

@basilgello could you elaborate on "better"?

This seems like a rare occurrence in any given collection of movies, even per-user, so my first thought is to limit API usage. Maybe more than one page of search results is also rare - these examples do have short, generic search titles.

I could see making the popularity threshold an advanced configuration, in case the arbitrary number we merge isn't the most effective.

basilgello commented 1 year ago

@rmrector If I understand this issue correctly, the generic titles yield too many results to fit on one page. Maybe it is better to guide the user to query things properly (i.e. rename the movie to include at least the year)? Or to let them configure how many results to fetch.

apo86 commented 1 year ago

I'm not aware of any more accurate way to search except by IMDb/TMDB IDs (which I don't think are very intuitive for the user). This issue is specifically about cases where the title is already exact, the language is correct, the year is included, and it still doesn't find what I'm looking for.

My first example, "Es", is a very popular movie, but it's also a German pronoun and appears in the title of every Spanish movie ever made. The TMDB API doesn't seem to boost exact title matches or order by popularity by default, so the desired result can show up literally anywhere. Without the year you won't even find the movie in the first 30 of literally hundreds of pages. Including the year narrows the total down to a couple dozen pages, and at least in my personal experience the movie I wanted was never beyond page 2.
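To illustrate the ordering point: since the API doesn't appear to put exact title matches first, a client could reorder the results it received. The sketch below only illustrates that idea; this thread doesn't say whether the add-on does anything like it. `rank_results` is a hypothetical helper, and the `title`, `original_title`, `release_date` and `popularity` fields are assumed from TMDB's search results.

```python
def rank_results(results, title, year=None):
    """Order results: exact title matches first, then year matches, then by popularity."""
    wanted = title.casefold()

    def sort_key(item):
        names = {
            (item.get("title") or "").casefold(),
            (item.get("original_title") or "").casefold(),
        }
        exact = wanted in names
        year_ok = year is not None and (item.get("release_date") or "").startswith(str(year))
        # False sorts before True, so negate the flags to put matches first;
        # negated popularity puts more popular entries first within each group.
        return (not exact, not year_ok, -(item.get("popularity") or 0))

    return sorted(results, key=sort_key)
```

Because `sorted` is stable, results that tie on every criterion keep TMDB's original order.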

For sure this will vary by language and by movie. It's not inconceivable that there would be a movie out there that shows up on page 3 even with perfect search terms. But my original problem is already incredibly niche, so I don't know if this would be necessary to account for.

rmrector commented 1 year ago

Officially released in version 1.6.3 for Leia and 2.0.0 for Matrix and above. Thanks!