yagop / telegram-bot

UNMAINTAINED - A Telegram Bot based on plugins
GNU General Public License v2.0

Block porn keyword? #195

Closed dausruddin closed 9 years ago

dausruddin commented 9 years ago

I really don't like it when my users use some plugins to fetch porn.

Example:

!img naked women
!gif boob

Is there any way I could block keywords that lead to porn content? A plugin that can be turned on and off, perhaps? Or just blocking those words permanently? It would be even better if a message were shown when a user types a blocked word.

Synk0 commented 9 years ago

local function get_google_data(text)
  local url = "http://ajax.googleapis.com/ajax/services/search/images?"
  url = url.."v=1.0&rsz=5"
  url = url.."&q="..URL.escape(text)
  url = url.."&imgsz=small|medium|large"
  if google_config.api_keys then
    local i = math.random(#google_config.api_keys)
    local api_key = google_config.api_keys[i]
    if api_key then
      url = url.."&key="..api_key
    end
  end

  -- (rest of the function omitted)

Add &safe=active to the query string to filter out such content.

Example:
  url = url.."v=1.0&rsz=5&safe=active"

soend commented 9 years ago

What do people think about making this an option in the img_google plugin? For example, a command like !img options:safesearch or something similar?

bb010g commented 9 years ago

@soend I think an option defeats the purpose of this issue, which is to keep NSFW content from being searched at all. !safeimg would be nice, though.

rockneurotiko commented 9 years ago

I think the best way to implement a filter is to write a pre_process plugin that returns nil if any word in a command is banned.
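A minimal sketch of such a filter follows. It assumes the bot calls a plugin's pre_process(msg) hook and drops the message when the hook returns nil, as described above. The word list, the helper name contains_banned, and the exact shape of the plugin table are illustrative assumptions, not code from the repository.

```lua
-- Hypothetical word list for illustration; replace it with your own.
local banned_words = { "porn", "nude" }

-- True if `text` contains any banned word (case-insensitive substring search).
local function contains_banned(text)
  local lowered = string.lower(text)
  for _, word in ipairs(banned_words) do
    -- The fourth argument (true) turns off pattern matching,
    -- so characters such as "-" or "." in a word are taken literally.
    if string.find(lowered, word, 1, true) then
      return true
    end
  end
  return false
end

-- Assumed hook contract: return msg to pass it on, or nil to drop it.
local function pre_process(msg)
  if msg.text and contains_banned(msg.text) then
    return nil
  end
  return msg
end

-- Assumed plugin table shape; a real plugin file would end with `return plugin`.
local plugin = {
  description = "Drops messages that contain banned words",
  patterns = {},
  pre_process = pre_process,
}
```

Because this is a plain substring search, it also matches banned words inside longer, harmless words, so the list needs to be chosen with that in mind.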

bb010g commented 9 years ago

I think here it would be best to rely on Google's SafeSearch, because it covers more than we can do effectively in a plugin. The bot-wide filter might work, but you'd have to cover a huge number of misspellings that get autocorrected in the search into banned words. Unfortunately, Giphy and Imgur don't offer safe searches.

dausruddin commented 9 years ago

I wrote this code and it is working great.

function on_msg_receive (msg)
  if not started then
    return
  end

  local receiver = get_receiver(msg)
  local blockWords = {
"^!(.+)anal",
"^!(.+)anus",
"^!(.+)ass",
"^!(.+)beastiality",
"^!(.+)bisexual",
"^!(.+)blowjob",
"^!(.+)bondage",
"^!(.+)boner",
"^!(.+)breast",
"^!(.+)clit",
"^!(.+)clitoris",
"^!(.+)cock",
"^!(.+)cum",
"^!(.+)cunt",
"^!(.+)dick",
"^!(.+)dong",
"^!(.+)erotic",
"^!(.+)fuck",
"^!(.+)gay",
"^!(.+)hardon",
"^!(.+)hard",
"^!(.+)on",
"^!(.+)incest",
"^!(.+)lick",
"^!(.+)lust",
"^!(.+)nude",
"^!(.+)oral",
"^!(.+)penis",
"^!(.+)piss",
"^!(.+)playboy",
"^!(.+)porn",
"^!(.+)puss",
"^!(.+)queer",
"^!(.+)rectum",
"^!(.+)sex",
"^!(.+)shit",
"^!(.+)sleazy",
"^!(.+)slut",
"^!(.+)smut",
"^!(.+)softcore",
"^!(.+)sperm",
"^!(.+)suck",
"^!(.+)swingers",
"^!(.+)tit",
"^!(.+)tits",
"^!(.+)virgin",
"^!(.+)whore",
"^!(.+)x",
"^!(.+)rated",
"^!(.+)x-rated",
"^!(.+)fellatio",
"^!(.+)hardcore",
"^!(.+)hooker",
"^!(.+)hustler",
"^!(.+)intercourse",
"^!(.+)kama",
"^!(.+)sutra",
"^!(.+)kinky",
"^!(.+)lesbian",
"^!(.+)lesbo",
"^!(.+)masturbat",
"^!(.+)nudist",
"^!(.+)orgasm",
"^!(.+)panties",
"^!(.+)penthouse",
"^!(.+)prostitut",
"^!(.+)xxx",
"^!(.+)sodom",
"^!(.+)gomorrah",
"^!(.+)puki",
"^!(.+)pantat",
"^!(.+)kemaluan",
"^!(.+)nenen",
"^!(.+)burit",
"^!(.+)tetek",
"^!(.+)bogel",
"^!(.+)bohsia",
"^!(.+)rogol",
"^!(.+)vagina",
"^!(.+)semen",
"^!(.+)hymen",
"^!(.+)lucah",
"^!(.+)puting",
"^!(.+)buah dada",
"^!(.+)sangap",
"^!(.+)bugil",
"^!(.+)jilbab",
"^!(.+)jubo",
"^!(.+)jubur",
"^!(.+)jubor",
"^!(.+)kelentit",
"^!(.+)kelentik",
"^!(.+)telanjang",
"^!(.+)horny",
"^!(.+)pepek",
"^!(.+)cipap"
}

  msg = pre_process_service_msg(msg)
  if msg_valid(msg) then
    msg = pre_process_msg(msg)
    -- Service messages and media have no text, so guard before lowercasing.
    if msg and msg.text then
      -- Lowercase a copy once, so plugins still receive the original text.
      local text = string.lower(msg.text)
      for _, pattern in ipairs(blockWords) do
        if match_pattern(pattern, text) then
          send_msg(receiver, "English: Please, no pornographic contents.", ok_cb, false)
          -- Stop here so that no plugin handles the blocked message.
          return
        end
      end
    end
    if msg then
      match_plugins(msg)
      mark_read(receiver, ok_cb, false)
    end
  end
end

soend commented 9 years ago

I have tried a word blacklist, but people are not stupid. They started putting spaces between letters and using bad words in other languages. That's why I also think we should rely on Google.

@bb010g, what I meant is that only a privileged user could turn safe search on and off. The problem I have at the moment is that every time there is an update and I pull it from git, I get conflicts, because I have edited the img_google plugin and added url = url.."&safe=active". The other option I have is to make a new plugin, safe_img_google, that uses safe search.

dausruddin commented 9 years ago

@soend, is that only for Google queries? What about !gif?

For your problem, please refer to https://github.com/yagop/telegram-bot/issues/196

rockneurotiko commented 9 years ago

That's what I was saying: it's OK to add that option to img, but here we are talking about a global word filter.

dausruddin commented 9 years ago

Now I have many more words to add, and putting that many lines of bad words into bot.lua doesn't seem efficient. @rockneurotiko, do you have an example of how to include another file in bot.lua?

soend commented 9 years ago

@psycholyzern, yes, that is only for the Google query. There is a rating field in the Giphy response data, but no documentation of what it means. From the readme: rating - limit results to those rated (y, g, pg, pg-13 or r).

dausruddin commented 9 years ago

Well, it would need to be coded in each plugin with a different method, then. That could be more efficient. But I prefer manual word blocking, simply because I have already coded it and it can reply with a custom message when a user's query matches any bad word. And it works globally, of course.

rockneurotiko commented 9 years ago

@psycholyzern you don't need to edit bot.lua! Just write a pre_process plugin, like stats, and return nil when you want to block the msg.

dausruddin commented 9 years ago

I ended up removing all my changes. It is not efficient, and it is hard to keep the list of blocked words up to date. Waiting for someone to make a plugin for this.

rockneurotiko commented 9 years ago

Yeah, if you want to block all messages for every plugin, it will be hard. Check: here are some letter substitutions to catch, where numbers are used instead of letters. And here are words in many languages.
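One way to deal with numbers used instead of letters, and with the spaces between letters mentioned earlier in the thread, is to normalize the text before checking it against a blacklist. The sketch below is only an illustration: the substitution table and the function name normalize are assumptions, not the list linked above.

```lua
-- Hypothetical substitution table: digits and symbols often used in place of letters.
local substitutions = {
  ["0"] = "o", ["1"] = "i", ["3"] = "e", ["4"] = "a",
  ["5"] = "s", ["7"] = "t", ["@"] = "a", ["$"] = "s",
}

-- Lowercase the text, undo the substitutions, then drop everything
-- that is not an ASCII letter (spaces, dots, remaining digits).
local function normalize(text)
  local lowered = string.lower(text)
  -- With a table as replacement, gsub looks each matched character up in it
  -- and keeps the original character when there is no entry.
  local replaced = lowered:gsub(".", substitutions)
  local letters_only = replaced:gsub("[^a-z]", "")
  return letters_only
end
```

Removing the spaces joins neighbouring words together, which makes a substring blacklist more likely to hit harmless text, and non-ASCII letters are discarded entirely. This is a trade-off rather than a complete answer.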

Anyway, adding the option to set safe search in img is a good idea.

Burnett01 commented 9 years ago

Seriously... just rely on Google's safe search. That'd cover lots of shit.

dausruddin commented 9 years ago

Yeah.. I'm using Google safe search, as the users above recommended. Writing the code manually and keeping it updated took too much effort.

rockneurotiko commented 9 years ago

Again, safe search is the solution for the img command, but what this issue asked for (or how I interpreted it) is to block every message containing words from a blacklist.

dausruddin commented 9 years ago

Yeah.. I am thinking of building a remote database that can be queried from the bot's server. The disadvantage is that the bot will make an HTTP request every time a new message arrives. I don't know whether this will affect performance, but it is still better than storing hundreds of words to be matched locally.

rockneurotiko commented 9 years ago

A request is always much slower than a local check... Maybe the plugin can download the db at startup.
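As a rough sketch of the "load once, check locally" idea: if the word list is turned into a Lua table keyed by word when the plugin starts, each word of an incoming message costs a single table lookup, however long the list is. The names build_word_set, banned_set, and has_banned_word are hypothetical, and the inline list stands in for whatever would be read from a file or downloaded at startup.

```lua
-- Turn a list of words into a set (word -> true) for constant-time lookup.
local function build_word_set(words)
  local set = {}
  for _, word in ipairs(words) do
    set[string.lower(word)] = true
  end
  return set
end

-- Hypothetical list; in practice it could be read from a file
-- or downloaded once when the plugin is loaded.
local banned_set = build_word_set({ "porn", "nude" })

-- Split the text into runs of letters and look each one up in the set.
local function has_banned_word(text)
  for token in string.gmatch(string.lower(text), "%a+") do
    if banned_set[token] then
      return true
    end
  end
  return false
end
```

This variant only matches whole words, so it avoids false hits inside longer words but misses variations that are not in the list.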

dausruddin commented 9 years ago

Won't it be slower (I mean the performance of the server) to match a single word against hundreds of other words?

rockneurotiko commented 9 years ago

The remote server will have to do the matching anyway, so the cost will be the time of the match plus the time of the request. Anyway, I don't think there will be that many words; with the trick explained here you can undo some of the letter tricks. I'm not a fan of this "feature", so if someone wants to block their users, it will go slower (you can't avoid that) xD

dausruddin commented 9 years ago

Yeah. I'm starting to hate this idea because it will slow everything down. That's why I only use safe search for Google images and disabled the boob plugin. Dirty words = yes, bad images = no. It is fair enough 😅😅😅

bb010g commented 9 years ago

If you're happy with your solution, feel free to close this issue.

rockneurotiko commented 9 years ago

I guess a good end for this issue would be to add an option to img_search to turn safe search on or off.
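Such an option could look roughly like the sketch below, which builds the query URL from the snippet earlier in the thread and appends safe=active only when a flag is set. The config table img_config, its safe_search field, and the local escape helper (a stand-in for URL.escape) are assumptions made for illustration, not part of the plugin.

```lua
-- Percent-encode a query value; stand-in for URL.escape from the original snippet.
local function escape(text)
  return (string.gsub(text, "[^%w%-_%.~]", function(c)
    return string.format("%%%02X", string.byte(c))
  end))
end

-- Hypothetical plugin configuration; safe search is on by default.
local img_config = { safe_search = true }

-- Build the image search URL, adding safe=active only when enabled.
local function build_image_url(query, config)
  local url = "http://ajax.googleapis.com/ajax/services/search/images?"
  url = url .. "v=1.0&rsz=5"
  url = url .. "&q=" .. escape(query)
  if config.safe_search then
    url = url .. "&safe=active"
  end
  return url
end
```

A privileged command could then flip img_config.safe_search at runtime, which keeps the choice out of the plugin source and avoids merge conflicts when pulling updates.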

dausruddin commented 9 years ago

Sure.. Thanks to all of you for helping me 😊😊 And @rockneurotiko, yes, please add that option (making it the default would be better :p). Thanks again.