lurado / MovieDict

iOS dictionary for international movie titles & Wikipedia mining tools
https://moviedict.info

Duplicate detection is too strict #10

Closed jlnr closed 8 years ago

jlnr commented 8 years ago

Many movies use a proper apostrophe (’) in one language and ASCII ' in another, leading to this mess:

[Screenshot: img_8237]

This should be fixed by ignoring punctuation in almost_equal (in Database/Rakefile). Maybe "The" should be ignored here, too? I don't think adding or removing a "The" counts as a translation.
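A rough sketch of what a punctuation-insensitive comparison could look like. The actual `almost_equal` in Database/Rakefile is not shown in this thread, so the method name `almost_equal_sketch` and the lowercasing step are assumptions made for illustration only:

```ruby
# Hypothetical stand-in for almost_equal; the real method in
# Database/Rakefile may take different arguments or do more work.
def almost_equal_sketch(a, b)
  # Lowercase, then drop every punctuation character. In Ruby,
  # [[:punct:]] is Unicode-aware for UTF-8 strings, so it removes
  # both the typographic apostrophe (’) and the ASCII one (').
  normalize = ->(title) { title.downcase.gsub(/[[:punct:]]/, "") }
  normalize.call(a) == normalize.call(b)
end

almost_equal_sketch("Ocean’s Eleven", "Ocean's Eleven")  # => true
almost_equal_sketch("Ocean's Eleven", "Ocean's Twelve")  # => false
```

Stripping punctuation before comparing means titles that differ only in apostrophe style (or in a trailing period, a colon, and so on) are treated as the same title rather than as a translation.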

jlnr commented 8 years ago

Didn't bother adding /^The / to the gsub call for now.
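For reference, a minimal sketch of what ignoring a leading "The" could look like if it is added later. A pattern meant to match the start of a title has to be anchored with `\A` (or `^`), since `$` anchors the end of a line. The constant and method names below are hypothetical and not taken from the Rakefile:

```ruby
# Hypothetical helper, not part of the project: removes one leading
# "The " (case-insensitive) so that "The Matrix" and "Matrix" compare
# as the same title.
LEADING_THE = /\AThe /i

def strip_leading_the(title)
  # \A anchors at the very start of the string, and the trailing space
  # keeps words like "Then" or "Theatre" from being touched.
  title.sub(LEADING_THE, "")
end

strip_leading_the("The Matrix")     # => "Matrix"
strip_leading_the("Then Came You")  # => "Then Came You"
```

This only handles the English article; other languages would need their own patterns.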