Open CptPie opened 4 years ago
Are multiple titles returned for either IMDB or MAL? I know AniDB returns multiple titles. Maybe we can store those and use them for the duplicate check?
Other than that, there would need to be some normalization of characters or a similarity check for titles, perhaps using a string metric (see https://en.wikipedia.org/wiki/String_metric). If a title is close enough, we could prompt for confirmation or require mod/admin approval when it's too similar to another. This would probably break with sequels, though: "Deadpool" and "Deadpool 2" differ by only one character (two including the space).
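To make the string-metric idea concrete, here is a minimal sketch using Levenshtein edit distance, one of several possible metrics. The helper names `levenshtein`, `min3` and `needsReview`, and the `maxDist` threshold, are hypothetical and not part of the existing code base.

```go
package main

import "strings"

// levenshtein returns the edit distance between a and b, counted in runes
// so that non-Latin titles are compared per character rather than per byte.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// needsReview (hypothetical) reports whether candidate is a near match of
// existing: not identical after lowercasing, but within maxDist edits.
func needsReview(candidate, existing string, maxDist int) bool {
	d := levenshtein(strings.ToLower(candidate), strings.ToLower(existing))
	return d > 0 && d <= maxDist
}
```

With these definitions, `levenshtein("Deadpool", "Deadpool 2")` is 2, so `needsReview("Deadpool 2", "Deadpool", 2)` flags the sequel. That is exactly the false positive described above, which is why a near match should at most trigger a confirmation prompt rather than a hard rejection.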
Regarding the API results: Jikan (there is no guarantee that both title and title_english are filled; when the original title is already English, the "title_english" field is null): TMDb:
So storing a single title for display plus a list of alt titles is plausible, then.
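A rough sketch of what a duplicate check over a display title plus alt titles might look like. The `Entry` type, its fields, and the helpers `normalize`, `allTitles` and `sharesTitle` are all assumptions made for illustration; they do not exist in the project.

```go
package main

import "strings"

// Entry is a hypothetical stored movie: one display title plus any
// alternative titles returned by the lookup APIs.
type Entry struct {
	Title     string
	AltTitles []string
}

// normalize lowercases and trims a title so trivial differences do not
// defeat the comparison.
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// allTitles returns the display title followed by the alt titles.
func (e Entry) allTitles() []string {
	return append([]string{e.Title}, e.AltTitles...)
}

// sharesTitle reports whether a and b have at least one non-empty
// normalized title in common.
func sharesTitle(a, b Entry) bool {
	seen := make(map[string]struct{})
	for _, t := range a.allTitles() {
		if k := normalize(t); k != "" {
			seen[k] = struct{}{}
		}
	}
	for _, t := range b.allTitles() {
		if _, ok := seen[normalize(t)]; ok {
			return true
		}
	}
	return false
}
```

Under this sketch, two entries match as soon as any of their titles coincide, so an entry titled "Mononoke Hime" with alt title "Princess Mononoke" would collide with one titled "Princess Mononoke". It only helps if at least one API returns a title in a script the other also uses.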
In theory, yes. But I am afraid of TMDb's data quality: its original title is in kanji (?) while the MAL title is in Latin script, so that won't help us much.
Regardless, I think it would be nice to have an "improved" movie struct with
Title string
Org_Title string
Year string (or int)
and then use a common title format for both APIs (i.e. "Movie.Title (Movie.Org_Title) (Movie.Year)" ).
With this struct we could assume that Movie.Title is always the English title (whenever possible) and use that field for the duplicate check.
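A minimal sketch of the proposed struct and title format, for discussion only. `Org_Title` is spelled `OrgTitle` here to follow Go naming conventions, `Year` is taken as an int, and the methods `DisplayTitle` and `dedupKey` as well as `isDuplicate` are hypothetical names. Omitting empty parts from the display title is also an assumption, not something settled in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// Movie is the "improved" struct proposed above.
type Movie struct {
	Title    string // English title whenever one is available
	OrgTitle string // original title as returned by the API
	Year     int    // release year; 0 means unknown
}

// DisplayTitle renders "Title (OrgTitle) (Year)", skipping the original
// title when it is empty or identical to Title, and the year when unknown.
func (m Movie) DisplayTitle() string {
	out := m.Title
	if m.OrgTitle != "" && m.OrgTitle != m.Title {
		out += fmt.Sprintf(" (%s)", m.OrgTitle)
	}
	if m.Year != 0 {
		out += fmt.Sprintf(" (%d)", m.Year)
	}
	return out
}

// dedupKey builds the comparison key from Title only: lowercased, with
// runs of whitespace collapsed to single spaces.
func (m Movie) dedupKey() string {
	return strings.ToLower(strings.Join(strings.Fields(m.Title), " "))
}

// isDuplicate compares two movies by their Title-based key, independent
// of which API filled in the other fields.
func isDuplicate(a, b Movie) bool {
	k := a.dedupKey()
	return k != "" && k == b.dedupKey()
}
```

The point of the design is that the display format and the duplicate check are decoupled: `DisplayTitle` can differ between entries (for example, different original titles from MAL and TMDb) while `isDuplicate` still matches on the shared English title.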
In the reported case, the movie "Princess Mononoke" was added twice and got past the duplicate check, because it was added once via MAL autofill and once via IMDB autofill. Since the autofilled title has a different format depending on the API used, the titles didn't match and the duplicate check did not hit.
Possible solutions: