lurado / MovieDict

iOS dictionary for international movie titles & Wikipedia mining tools
https://moviedict.info

Refactor blacklist/whitelist code #5

Open jlnr opened 8 years ago

jlnr commented 8 years ago

There are lots of regular expressions in the Database scripts that decide which Wikipedia pages are about movies, and which are crap along the lines of "List of Italian splatter porn actors of the 90s".

These blacklists and whitelists should probably be moved to configuration files or constants, for easier editing.
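A minimal sketch of what the "constants" variant might look like. The module name `TitleFilters`, the method `censored?`, and all patterns are hypothetical and only illustrate the idea; they are not the rules currently used by the Database scripts.

```ruby
# Hypothetical module; names and patterns are assumptions for illustration.
module TitleFilters
  # Titles matching any of these patterns are treated as non-movie pages.
  BLACKLIST = [
    /\AList of /i,
    /\bsex\b/i,
  ].freeze

  # Titles matching any of these patterns are kept even if the blacklist matches.
  WHITELIST = [
    /\AMy Sex Life\.\.\. or How I Got into an Argument\z/,
  ].freeze

  # The whitelist is checked first, so a known-good title always survives.
  def self.censored?(title)
    return false if WHITELIST.any? { |re| re.match?(title) }
    BLACKLIST.any? { |re| re.match?(title) }
  end
end
```

With all patterns in one place, adding an exception for a wrongly filtered title is a one-line change to `WHITELIST` rather than an edit inside a script.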

jlnr commented 8 years ago

The blacklist also wrongly filters movies such as "My Sex Life... or How I Got into an Argument" (not porn, apparently).

jlnr commented 8 years ago

And on the other hand, there's still at least one porn actor in the list: "Barrett Long (Pornodarsteller)", even though actors shouldn't be included at all.

jlnr commented 8 years ago

And there are still entries that end in 小說 (novel) in Japanese (except my Kanji is slightly wrong, so I can't be bothered to find and delete these now).
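One way to sidestep the character-variant problem is a pattern that accepts several forms of the second character. This is only a sketch: the constant `NOVEL_SUFFIX` and the method `novel_title?` are hypothetical names, the sample titles in the usage note are illustrative, and the real entries may use a different suffix format.

```ruby
# Hypothetical pattern: matches titles ending in "novel", with an optional
# closing parenthesis (ASCII or fullwidth). The character class covers
# 說 (traditional Chinese), 説 (Japanese) and 说 (simplified Chinese).
NOVEL_SUFFIX = /小[說説说][)）]?\z/

def novel_title?(title)
  NOVEL_SUFFIX.match?(title)
end
```

For example, `novel_title?("雪国 (小説)")` and `novel_title?("雪國 (小說)")` would both match, while a title without the suffix would not.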

jlnr commented 7 years ago

Idea: Instead of just printing "Censoring…" to stdout in my rake tasks, also collect the censored pages in CSV files. That'd make it much easier to see the effects of blacklist/whitelist changes.
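A rough sketch of how this could work, using Ruby's standard `csv` library. The class `CensorLog`, its `record` method, and the column names are assumptions made for illustration; the existing rake tasks are not shown in this thread.

```ruby
require "csv"

# Hypothetical helper: keeps the stdout message and also appends one row
# per censored page to a CSV file.
class CensorLog
  def initialize(path)
    @path = path
    # Start each run with a fresh file and a header row.
    CSV.open(@path, "w") { |csv| csv << %w[title reason] }
  end

  def record(title, reason)
    puts "Censoring… #{title}"
    # Reopening the file per row is slow but keeps the sketch simple.
    CSV.open(@path, "a") { |csv| csv << [title, reason] }
  end
end
```

A task would create one `CensorLog` per run and call `record` wherever it currently prints. Writing to a differently named file before and after a filter change lets the two CSV files be compared with an ordinary diff.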