nova-video-player / aos-AVP

NOVA opeN sOurce Video plAyer: main repository to build them all
Apache License 2.0

[Bug]: Nova misidentifies movies if another, older movie with the same title has a recent physical/theatrical release date that coincides with the newer movie release date #834

Closed Pentaphon closed 1 year ago

Pentaphon commented 1 year ago

Problem description

This is a very strange bug since it looks like this mistake couldn't be made:

1. Use a dummy file and name it "White.Noise.2022.mkv".
2. Scrape it via Nova.
3. Instead of White Noise (2022) you get White Noise (2019).

I think it is because White Noise (2019), despite being labeled as 2019 on TMDB, has another release date in Germany for 2022 as you can see below.

Screenshot 2023-03-01 at 14-42-54 White Noise

I think Nova needs to ignore the latest release date in Germany and always go with the first date from TMDB to make accurate searches.

Steps to reproduce the issue

Steps posted in first section above.

Expected behavior

Should scrape the right movie

Your phone/tablet/androidTV model

Fire TV 4K Stick

Operating system version

Fire OS 6

Application version and app store

6.0.97-20230212.1354 Amazon

Additional system information

N/A

Debug logs

Not needed

Pentaphon commented 1 year ago

@courville I found another misidentified movie.

https://www.themoviedb.org/movie/1020696-play-dead gets misidentified as https://www.themoviedb.org/movie/108563-play-dead when you have a file named Play.Dead.2022.mkv

I think it is because Play Dead (1983) had a physical release in 2022 according to the releases page.

Screenshot 2023-03-26 at 10-18-20 Play Dead

I think these 2 misidentifications are related.

I think the solution is to introduce logic that picks the movie with the earliest release date, or that always uses the year shown in parentheses next to the title on the movie's main page.
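
The selection rule suggested above can be sketched in a few lines. This is only an illustration of the idea, not Nova's scraper code: `Candidate`, `primaryYear` and `pick` are hypothetical names, and the release years are the ones described in this issue. It assumes every candidate has at least one release year.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PrimaryYearMatchSketch {
    /** Hypothetical stand-in for one search result; not a Nova or TMDB type. */
    static final class Candidate {
        final String label;
        final List<Integer> releaseYears; // every known release year, any country or format

        Candidate(String label, Integer... releaseYears) {
            this.label = label;
            this.releaseYears = Arrays.asList(releaseYears);
        }

        /** Treats the earliest release year as the year shown next to the title. */
        int primaryYear() {
            return Collections.min(releaseYears);
        }
    }

    /** Returns the first candidate whose primary year equals the year from the filename, or null. */
    static Candidate pick(List<Candidate> candidates, int fileYear) {
        for (Candidate c : candidates) {
            if (c.primaryYear() == fileYear) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Candidate> results = Arrays.asList(
                new Candidate("White Noise (2019)", 2019, 2022), // later German release in 2022
                new Candidate("White Noise (2022)", 2022));
        System.out.println(pick(results, 2022).label);
    }
}
```

A rule that accepted any matching release year would treat both candidates as a match for 2022, which is the confusion described in this report; comparing only the earliest year avoids that.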

Philofil92 commented 1 year ago

Hello everyone, I confirm!!! I have about forty movies with identical names but obviously different dates, and Nova does not take the different dates into account (and to avoid any problem, I always make sure to copy/paste from the original movie's page!!!). Besides, some of them worked correctly on 6.0.71. Sometimes I reindex the whole database. I am on 6.1.2, and some updates that used to work no longer update correctly today...

Philofil92 commented 1 year ago

Hello again, I don't know if this helps, but I also just got a reply from TMDB support, who tell me that Nova should search not only on the name... but also on the date... which would obviously solve the problem for movies with identical names!!!

https://api.themoviedb.org/3/search/movie?api_key=THE_KEY&language=en-US&query=Dune&primary_release_year=2021
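
For reference, a query of that shape could be assembled as below. This is a minimal sketch, not Nova's code: `TmdbSearchUrlSketch` and `build` are made-up names, `THE_KEY` is a placeholder as in the URL above, and the only parameters used are the ones that appear in that URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class TmdbSearchUrlSketch {
    private static final String BASE = "https://api.themoviedb.org/3/search/movie";

    /** Builds the search URL; passing a null year leaves the year filter out. */
    static String build(String apiKey, String title, Integer year) {
        StringBuilder url = new StringBuilder(BASE)
                .append("?api_key=").append(encode(apiKey))
                .append("&language=en-US")
                .append("&query=").append(encode(title));
        if (year != null) {
            url.append("&primary_release_year=").append(year);
        }
        return url.toString();
    }

    /** Form-encodes a value, so a space in a title becomes '+'. */
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("THE_KEY", "Dune", 2021));
    }
}
```

Keeping the year optional matters here because a filename does not always contain a year that can be extracted safely.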

courville commented 1 year ago

@Philofil92, OK, let's clarify something first: Nova does use the year of a movie in the query, provided it cannot be misinterpreted as part of the movie title (e.g. the movie 1984, or have a look at https://www.imdb.com/list/ls086468030/). Have a look at https://github.com/nova-video-player/aos-MediaLib/blob/v4/src/com/archos/mediascraper/preprocess/MovieDefaultMatcher.java. Now, the naming convention Nova expects for a year to be picked up is the year in parentheses in the filename (hint: just like IMDB reports it, for a good reason). I am not saying the code cannot be improved, but changing it will create regressions. Short-term solution: rename your files accordingly. Longer-term solution: submit a PR.

courville commented 1 year ago

@Pentaphon

> 1. Use a dummy file and name it "White.Noise.2022.mkv"

I assume that a filename like "White.Noise.(2022).mkv" will be scraped correctly. If this is not the case this is indeed a bug.

For now I am not able to parse "White.Noise.2022.mkv" due to multiple reasons (movie names including a year or number etc.).

courville commented 1 year ago

For those who want to help improve the scraper, there is some standalone Java code to test things without the trouble of recompiling Nova: https://github.com/nova-video-player/TestScraper

For movies the following process is used:

/**
 * Matches everything. Tries to strip away all junk, not very reliable.
 * <p>
 * Process is as follows:
 * <ul>
 * <li> Start with filename without extension: "100. [DVD]Starship_Troopers_1995.-HDrip--IT"
 * <li> Remove potential starting numbering of collections "[DVD]Starship_Troopers_1995.-HDrip--IT"
 * <li> Extract last year if any: "[DVD]Starship_Troopers_.-HDrip--IT"
 * <li> Remove anything in brackets: "Starship_Troopers_.-HDrip--IT"
 * <li> Assume from here on that the title is first followed by junk
 * <li> Trim CasE sensitive junk: "Starship_Troopers_.-HDrip" ("it" could be part of the movie name, "IT" probably not)
 * <li> Remove separators: "Starship Troopers HDrip"
 * <li> Trim junk case insensitive: "Starship Troopers"
 * </ul>
 */
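
The steps in that comment can be approximated with a handful of regular expressions. The sketch below is not the code from MovieDefaultMatcher.java: the class name, the junk word lists and the exact patterns are assumptions chosen only to reproduce the intermediate strings listed above. The year pattern is the one quoted later in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MovieNameCleanerSketch {
    static final class Result {
        final String title;
        final String year; // null when no year was found

        Result(String title, String year) {
            this.title = title;
            this.year = year;
        }
    }

    private static final Pattern LEADING_NUMBER = Pattern.compile("^\\d+\\.\\s+");
    private static final Pattern YEAR = Pattern.compile("[\\s\\p{Punct}]((?:19|20)\\d{2})(?!\\d)");
    private static final Pattern BRACKETS = Pattern.compile("\\[[^\\]]*\\]");
    // Hypothetical junk lists; a real scraper would need far longer ones.
    private static final Pattern JUNK_CASE_SENSITIVE = Pattern.compile("[\\s\\p{Punct}]+(?:IT|FR|DE)$");
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\p{Punct}]+");
    private static final Pattern JUNK_ANY_CASE = Pattern.compile("(?i)\\s(?:hdrip|bluray|x264|1080p|2160p)\\b.*$");

    static Result clean(String nameWithoutExtension) {
        // Remove a leading collection number such as "100. ".
        String s = LEADING_NUMBER.matcher(nameWithoutExtension).replaceFirst("");

        // Extract the last year: keep overwriting until the final match.
        String year = null;
        int start = -1;
        int end = -1;
        Matcher m = YEAR.matcher(s);
        while (m.find()) {
            year = m.group(1);
            start = m.start(1);
            end = m.end(1);
        }
        if (year != null) {
            s = s.substring(0, start) + s.substring(end); // drop the digits, keep the separator
        }

        s = BRACKETS.matcher(s).replaceAll("");             // remove anything in brackets
        s = JUNK_CASE_SENSITIVE.matcher(s).replaceFirst(""); // trailing upper-case junk only
        s = SEPARATORS.matcher(s).replaceAll(" ").trim();    // separators become spaces
        s = JUNK_ANY_CASE.matcher(s).replaceFirst("");       // cut from the first junk word onwards
        return new Result(s.trim(), year);
    }

    public static void main(String[] args) {
        Result r = clean("100. [DVD]Starship_Troopers_1995.-HDrip--IT");
        System.out.println(r.title + " / " + r.year);
    }
}
```

With these assumed patterns the example from the comment ends up as "Starship Troopers" with year 1995, and "White.Noise.2022" as "White Noise" with year 2022.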

Pentaphon commented 1 year ago

> For now I am not able to parse "White.Noise.2022.mkv" due to multiple reasons (movie names including a year or number etc.).

That must be very troublesome considering that pretty much every file downloaded online is named that way, with dots instead of spaces. I would bet the majority of Nova users just keep the filenames they get when they download movies online.

Question: why can't the parser see the year in that filename? Isn't it supposed to look for a 4-digit number and assume that is the year? I can see why it would get confused by a movie like https://www.imdb.com/title/tt1190080/ but that case is much less likely to come up.

courville commented 1 year ago

@Pentaphon, let me dig into it again: it seems the matching should pick up the year with the pattern "[\\s\\p{Punct}]((?:19|20)\\d{2})(?!\\d)", which does not require parentheses. I will retest with the exact filenames provided. And FYI, it goes a little further than matching 4-digit numbers (1080/2160 could be a resolution): it requires 19xx or 20xx (yes, not proof against years 2100 and later).

courville commented 1 year ago

I take it back: this is a bug. Year extractor does not work when the year is at the end of the file name before the file extension. In this case [\\s\\p{Punct}]((?:19|20)\\d{2})(?!\\d) does not match the year. I will fix it.

[UPDATE]: issue is elsewhere
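
The pattern quoted above can be exercised on its own. A minimal check, assuming plain `java.util.regex` behaviour and nothing else from Nova (`YearPatternCheck` and `findYears` are made-up names), suggests it does match a year at the very end of a name, which fits the update that the issue is elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YearPatternCheck {
    // A separator, then 19xx or 20xx, not followed by another digit.
    static final Pattern YEAR = Pattern.compile("[\\s\\p{Punct}]((?:19|20)\\d{2})(?!\\d)");

    /** Returns every year candidate the pattern finds, in order of appearance. */
    static List<String> findYears(String name) {
        List<String> years = new ArrayList<>();
        Matcher m = YEAR.matcher(name);
        while (m.find()) {
            years.add(m.group(1));
        }
        return years;
    }

    public static void main(String[] args) {
        System.out.println(findYears("White.Noise.2022"));       // year at the end of the name
        System.out.println(findYears("Movie.1080p.2160p"));      // resolutions only
        System.out.println(findYears("1984"));                   // no separator before the digits
        System.out.println(findYears("Blade.Runner.2049.2017")); // two candidates
    }
}
```

Under these assumptions the first name yields 2022, the resolutions and the bare "1984" yield nothing, and the last name yields both 2049 and 2017, which is why taking the last candidate is the safer choice when a title itself contains a year.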

Pentaphon commented 1 year ago

> And FYI it goes a little further than matching 4 digit numbers (1080/2160 could be resolution) it will request 19xx or 20xx (yes not 2100+ year proof).

Sure, but the naming convention is always 1080p or 2160p which wouldn't count as say "4 digit number starting with 19 or 20 bounded by punctuation" which is how I would define and match the date.

> I take it back: this is a bug. Year extractor does not work when the year is at the end of the file name before the file extension. In this case [\\s\\p{Punct}]((?:19|20)\\d{2})(?!\\d) does not match the year. I will fix it.

Well actually my example of "White.Noise.2022.mkv" is just a shortened version of a longer actual filename (you know, the filenames we typically can't post on Github since they are the filenames used on torrent sites) but it still doesn't work in either the original filename or the shortened filename.

courville commented 1 year ago

OK I think I have a fix for this one (one liner): will be pushed as pre-release tonight when I get a chance for your testing and validation.

Philofil92 commented 1 year ago

Hello everyone, and thanks for these details and this link (but I am not a programmer...). On my side, I had many movie update problems that I solved on my own, simply by doing a "strict" copy/paste of the TMDB database names for the movies that caused trouble... That solved 98% of my issues!!! There are still update problems for movies with the same name but a different year. I still have about fifty of them... I'll just give 2 or 3 examples:

The movie Léon (1994) shows the page for Léon G.Damas (1995)
The movie Dune (2021) shows the page for Dune (1984)
The movie Hellboy (2019) shows the page for Hellboy (2004)
etc.

So I had opened a support request with TMDB, thinking it came from their database... and they "unkindly" sent me back to you, Nova!!! There you go, these were just a few details... Have a good end of the day, everyone.

courville commented 1 year ago

@Philofil92 tonight's release should fix the problem (which was indeed a real one).

courville commented 1 year ago

Here you go https://github.com/nova-video-player/aos-AVP/releases/tag/v6.1.4

Please reopen issue or ping me on this one if it does not provide a fix.

Pentaphon commented 1 year ago

> Here you go https://github.com/nova-video-player/aos-AVP/releases/tag/v6.1.4

> Please reopen issue or ping me on this one if it does not provide a fix.

This issue is fixed! It correctly identified the 2 movies I mentioned above!

Philofil92 commented 1 year ago

Hello everyone, cool cool cool, I just tested 6.1.4 this morning: full reindex of the database, movie pages updated with a 100% success rate!!! That said, you scared me a bit when you mentioned regressions... For me (and I think for others too!) this is a big improvement!!! For a movie to update correctly, two things are needed: the exact name of the movie and the date (so don't hesitate to copy/paste the movie's title from the TMDB database!!!). People need to be aware of that and everything will work correctly... Thank you for your work. Have a good day, everyone...