mynttt / UpdateTool

A tool to update the IMDB ratings for Plex libraries that contain movies/series and use the IMDB agent to receive ratings
GNU General Public License v3.0

Enhancement: Minimum votes for update #118

Open jobrien2001 opened 8 months ago

jobrien2001 commented 8 months ago

Hello,

It would be good to have an environment variable that sets a minimum number of votes required for an update to happen.

The problem is that a movie can have a rating of 10 but only a few votes, which makes the rating unreliable.

There are too many new movies with high ratings, which makes sorting movies by rating useless.

Thanks

mynttt commented 8 months ago

I agree that this is a useful feature. Searching IMDB with a minimum vote count below 1000 yields some really bad results.

One question is how to handle ratings that have already been updated but fall below the threshold. Should they be reverted to 0 as a signal that they have too few votes to be significant? Should that be an additional option?

jobrien2001 commented 8 months ago

I'm not sure.

Some of the data from the file in UpdateTool seems to be off from the actual rating. Maybe it's cached and not refreshed as often.

Maybe set the rating to null and trigger a refresh so it gets the actual rating from the default agent? I'm not sure whether the agents already handle this problem.

For this, one env variable would be needed for a Plex token and another for a host to send a curl request to.

May I ask how you get the ratings? If you scrape them yourself, maybe scrape more frequently for new records or records with a low vote count for a period of time.
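
The refresh request suggested in this comment could look roughly like the sketch below. It only builds the request and does not send it. The `PUT /library/metadata/{ratingKey}/refresh` path and the `X-Plex-Token` header are taken from common community usage of the Plex HTTP API and are an assumption here, not something confirmed in this thread. The class name and the idea of passing the host and token in from env variables are hypothetical as well.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds a metadata refresh request for one Plex item. */
final class PlexRefreshRequest {
    private PlexRefreshRequest() {}

    /**
     * @param host      base URL of the Plex server, e.g. "http://localhost:32400"
     * @param token     Plex token (assumed to come from an env variable)
     * @param ratingKey numeric id of the library item inside Plex
     */
    static HttpRequest build(String host, String token, int ratingKey) {
        // Assumed endpoint; check it against your Plex server before relying on it.
        URI uri = URI.create(host + "/library/metadata/" + ratingKey + "/refresh");
        return HttpRequest.newBuilder(uri)
                .header("X-Plex-Token", token)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

The request would then be sent with `java.net.http.HttpClient`, which is the Java equivalent of the curl call mentioned above.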

mynttt commented 8 months ago

@jobrien2001

Can you provide examples of data that is wrong? Data is sourced from the daily updated IMDB data sets or, in the very rare case that a title is not included in IMDB's data set, scraped from their website and then cached for 7 days. Completely wrong data would indicate that something in the ID matching process is broken, which would mean that the tool possibly has a bug.

Data set: https://datasets.imdbws.com/title.ratings.tsv.gz

Scraper: https://github.com/mynttt/UpdateTool/blob/master/src/main/java/updatetool/imdb/ImdbScraper.java
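
For context on why a vote threshold is cheap to apply: each row of `title.ratings.tsv` carries the IMDB id, the average rating, and the vote count as three tab-separated columns (`tconst`, `averageRating`, `numVotes`). The sketch below parses one such row. It is a minimal illustration and not the code UpdateTool actually uses; the class names `ImdbRating` and `RatingsLineParser` are made up for this example.

```java
import java.util.Optional;

/** One row of title.ratings.tsv (hypothetical holder class). */
final class ImdbRating {
    final String imdbId;
    final double rating;
    final int votes;

    ImdbRating(String imdbId, double rating, int votes) {
        this.imdbId = imdbId;
        this.rating = rating;
        this.votes = votes;
    }
}

final class RatingsLineParser {
    private RatingsLineParser() {}

    /** Returns the parsed row, or empty for the header line and malformed rows. */
    static Optional<ImdbRating> parse(String line) {
        if (line == null || line.trim().isEmpty() || line.startsWith("tconst")) {
            return Optional.empty();
        }
        String[] cols = line.split("\t");
        if (cols.length != 3) {
            return Optional.empty();
        }
        try {
            return Optional.of(new ImdbRating(
                    cols[0],
                    Double.parseDouble(cols[1]),
                    Integer.parseInt(cols[2])));
        } catch (NumberFormatException e) {
            return Optional.empty(); // skip rows with non-numeric rating or votes
        }
    }
}
```

Since the vote count is already part of every row, a minimum-votes check would only need to compare `votes` against the configured threshold.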

jobrien2001 commented 8 months ago

Hello,

I'm not sure the data is wrong. Since you say it's cached for 7 days, maybe at some earlier point it was right (unreliable because of the low vote count, but right).

I see the way the data is presented.

Perhaps your suggestion is the best solution: set any record below a given number of votes to 0 or NULL (I haven't looked at the DB). Also skip updating if the record is below that same number.

An env variable would be great so the user can set that number to their liking.
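
The behaviour discussed so far (skip updates below the threshold, and optionally reset ratings that were already written) can be sketched as a small decision function. This is only an illustration of the idea under stated assumptions: the env variable name `MIN_VOTES`, the `resetBelowThreshold` flag, and all class names are hypothetical and do not exist in UpdateTool.

```java
/** What to do with a single library item (hypothetical). */
enum RatingAction { UPDATE, SKIP, RESET }

final class MinVotesPolicy {
    /** Hypothetical env variable name, not an existing UpdateTool option. */
    static final String ENV_NAME = "MIN_VOTES";

    private MinVotesPolicy() {}

    /** Parses the raw env value; unset, blank, or invalid input disables the filter (0). */
    static int parseMinVotes(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return 0;
        }
        try {
            return Math.max(0, Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /**
     * @param votes               vote count from the IMDB data
     * @param minVotes            configured threshold
     * @param alreadyUpdated      true if the tool wrote a rating for this item before
     * @param resetBelowThreshold the "additional option": clear earlier ratings to 0/NULL
     */
    static RatingAction decide(int votes, int minVotes,
                               boolean alreadyUpdated, boolean resetBelowThreshold) {
        if (votes >= minVotes) {
            return RatingAction.UPDATE;
        }
        if (alreadyUpdated && resetBelowThreshold) {
            return RatingAction.RESET;
        }
        return RatingAction.SKIP;
    }
}
```

The threshold would be read once at startup with `MinVotesPolicy.parseMinVotes(System.getenv(MinVotesPolicy.ENV_NAME))`. Defaulting to 0 keeps the current behaviour for users who do not set the variable.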

robertybob commented 3 months ago

"An env variable would be great so the user can set that number to their liking."

This ^^

If you’re unlucky enough to be a fan of obscure shows or movies then it seems a bit unfair to be punished if <100 users voted on them, for example.