haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.39k stars 2.25k forks

[Feature request] Content Restrictions #117

Open sslx opened 1 year ago

sslx commented 1 year ago

feature

First, let me say, fantastic work! This is awesome! While playing around with the model, I noticed it doesn't really deal with certain NSFW content, specifically those involving violence and sexual themes. I understand the base model LLaMA may not have these content restrictions, so perhaps it's due to the training data used to develop the model? I totally get it - we're all trying to keep things safe and sound. But, hear me out, I think there might be some solid reasons to consider a version without restrictions.

  1. Removing restrictions helps keep things neutral, giving users the freedom to explore content as they wish, no moral policing.
  2. Describing ALL images helps visually impaired people get the full picture of image-based content, so they can be part of the conversation. Allowing NSFW description would be a game-changer for audio-described videos and captioned images, making video games, movies, & TV shows more inclusive.
  3. Rather than refusing to describe the image entirely, the model can output a warning statement first when an image has sensitive content - this will let users choose if they wanna go down that rabbit hole or not.
  4. Since we're dealing with text, there's less risk of misuse or harm compared to AI-generated visual content like deepfakes.
  5. More diverse content = stronger AI models. It could lead to improved generalization and effectiveness. This could even help develop better filters for other models too!
  6. Unrestricted AI would be a goldmine for all sorts of studies, from psych to human sexuality and beyond, adding to our understanding of ourselves.

So, what do you think? Lifting content restrictions could open the door to all kinds of research, education, and professional uses, all while still being responsible with AI tech. Thanks for giving these ideas some thought!
Tedy50 commented 1 year ago

This is not only about content restrictions; the model is also quite limited right now, as it cannot understand jokes and constantly talks about safety and other stuff.

Even in the demo picture it spends half of its time going on about how it is unsafe to iron clothes standing on the back of a car while driving. I gave it a picture of children in Halloween costumes, and it started moralizing about how it is inappropriate to dress as devils because that may hurt someone's religious feelings. It seems like it absolutely has to insert some disclaimer into everything. The purpose of this model should be to describe images, not to judge them or moralize about their contents. Also, if you want to use this model to moderate some forum, it may simply refuse to describe pictures it decides are controversial.

What is the purpose of that selective blindness? We can already see the picture; there is nothing to be gained by refusing to describe it.

chigkim commented 1 year ago

I hope other people chime in, but this might be difficult because LLaVA is finetuned from Vicuna, which is itself finetuned on 70K ChatGPT conversations from ShareGPT. Just speculation, but the content restrictions might be a result of the ChatGPT data.
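If the restrictions really do come from the ShareGPT-derived training data, one common mitigation (used by some community finetunes of Vicuna-style models, not something LLaVA itself does) is to filter refusal-style responses out of the instruction data before finetuning. A minimal sketch below, where the phrase list is entirely hypothetical and would need tuning against the real dataset:

```python
import re

# Hypothetical refusal/disclaimer phrases for illustration only; real
# dataset-cleaning heuristics are far more extensive.
REFUSAL_PATTERNS = [
    r"as an ai(?: language model)?",
    r"i cannot (?:assist|help|describe)",
    r"it(?: i|')s not appropriate",
    r"i must emphasi[sz]e",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def looks_like_refusal(text: str) -> bool:
    """Heuristically flag a response containing refusal boilerplate."""
    return bool(_REFUSAL_RE.search(text))


def filter_conversations(conversations):
    """Drop any conversation in which an assistant turn looks like a refusal.

    Each conversation is a list of {"role": ..., "content": ...} turns.
    """
    return [
        conv for conv in conversations
        if not any(
            turn["role"] == "assistant" and looks_like_refusal(turn["content"])
            for turn in conv
        )
    ]


if __name__ == "__main__":
    data = [
        [{"role": "user", "content": "Describe this image."},
         {"role": "assistant", "content": "A dog running on a beach."}],
        [{"role": "user", "content": "Describe this image."},
         {"role": "assistant",
          "content": "As an AI language model, I cannot describe this image."}],
    ]
    kept = filter_conversations(data)
    print(len(kept))  # only the non-refusal conversation survives
```

Whether filtering the text-only instruction data would actually remove the moralizing behavior in the multimodal model is an open question, since the visual instruction data and the base LLM's own alignment also contribute.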