rubynguyen2505 / CECS451-Midterm


Reading Assignment #1

Closed rubynguyen2505 closed 1 year ago

rubynguyen2505 commented 1 year ago

Hi all, here are the assigned pages of the technical paper that each of us needs to cover. Further readings can be done (look at the Appendix part) to help better our understanding of the topic.

Jocelyn: pg. 1 - 9
Tram: pg. 10 - 14, 41 - 43
Dimpal: pg. 44 - 53 (up to Privacy part)
Ruby: pg. 53 - 60
Jose: pg. 61 - 70

rubynguyen2505 commented 1 year ago

Here are the notes I’ve gathered for my sections:

Privacy

Cybersecurity

Potential for Risky Emergent Behaviors

Interactions with other systems

Economic Impacts

Acceleration

Overreliance

trampham1104 commented 1 year ago

Motivation: “GPT-4’s capabilities and limitations create significant and novel safety challenges…the risks we foresee around bias, disinformation, over-reliance, privacy, cybersecurity, proliferation, and more” (2)

Limitations/Challenge:

Intervention/Solution:

GPT-4 System Card (pg. 41 - 43)

Purpose:

jocelynGonzalez commented 1 year ago
Dimpal273 commented 1 year ago

GPT-4 Observed Safety Challenges
We just discussed the capabilities of GPT-4, which can be applied in many aspects of our lives, from browsing to voice assistants, and which has the potential for a huge societal impact. We will discuss the observed safety challenges of GPT-4 in the following slides.

Evaluation Approach
Before we dive into the challenges and risks, I’m going to discuss the evaluation process.

Part One
The first thing OpenAI did was hire outside experts to provide input on and test the GPT-4 models. This testing included stress testing, boundary testing, and red teaming. Red teaming is a structured attempt to find flaws and vulnerabilities in a strategy, organization, or technical system; it is typically carried out by dedicated "red teams" that try to mimic an attacker's mindset and methods.

Part Two
Categorization - Assess the likelihood that a language model will produce content falling into categories such as hate speech, self-harm information, and unlawful advice.
Testing - These evaluations were created to compare several models on safety-related criteria and to automate and speed up evaluation of model checkpoints during training. They focused on topics designated as high risk and on content the models were intended to minimize.
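
To make the idea concrete, here is a minimal sketch of what an automated, category-based safety evaluation could look like. The category names, prompts, and helper functions (`generate_reply`, `is_flagged`) are hypothetical placeholders for illustration only, not the actual harness described in the System Card:

```python
# Toy, self-contained illustration: score model checkpoints by how often their
# replies get flagged in each risk category. All names and data are made up.

RISK_CATEGORIES = {
    "hate_speech": ["prompt A1", "prompt A2"],
    "self_harm_information": ["prompt B1", "prompt B2"],
    "unlawful_advice": ["prompt C1", "prompt C2"],
}

def generate_reply(checkpoint: str, prompt: str) -> str:
    # Placeholder for calling the model checkpoint under test.
    return f"{checkpoint} reply to {prompt!r}"

def is_flagged(category: str, reply: str) -> bool:
    # Placeholder for a content classifier (or human label) that decides
    # whether the reply falls into the given risk category.
    return False

def unsafe_rates(checkpoint: str) -> dict:
    """Fraction of prompts per category that produced flagged content."""
    rates = {}
    for category, prompts in RISK_CATEGORIES.items():
        flagged = sum(is_flagged(category, generate_reply(checkpoint, p)) for p in prompts)
        rates[category] = flagged / len(prompts)
    return rates

# Comparing checkpoints on the same safety-related criteria:
for checkpoint in ("checkpoint_early", "checkpoint_launch"):
    print(checkpoint, unsafe_rates(checkpoint))
```

Running the same prompt set against each checkpoint is what lets the evaluations be automated and repeated during training, as described above.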

Hallucinations
GPT-4 has the ability to "hallucinate," or "create material that is illogical or untrue with respect to particular sources." By leveraging data from earlier models like ChatGPT, GPT-4 was trained to decrease its tendency to hallucinate. Based on the tests and comparisons, GPT-4-launch performs 29 percentage points better than GPT-3.5 at avoiding closed-domain hallucinations and 19 points better at avoiding open-domain hallucinations.
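
As a side note on what "percentage points better" means here, a tiny worked example with made-up avoidance rates (not the System Card's data):

```python
# Made-up numbers illustrating a percentage-point comparison of
# "hallucination avoidance" rates between two models.

def avoidance_rate(correct: int, total: int) -> float:
    """Share of prompts answered without a hallucination, in percent."""
    return 100.0 * correct / total

baseline_rate = avoidance_rate(correct=52, total=100)  # 52.0 %
launch_rate = avoidance_rate(correct=81, total=100)    # 81.0 %

# The reported improvement is a difference in percentage points, not a ratio:
print(f"{launch_rate - baseline_rate:.0f} percentage points")  # -> 29
```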

Harmful Content
Language models can be prompted to generate different kinds of harmful content. This can include:

- Advice or encouragement for self-harm behaviors
- Graphic material such as inappropriate or violent content
- Harassing, demeaning, and hateful content
- Content useful for planning attacks or violence
- Instructions for finding illegal content

Proliferation of Conventional and Unconventional Weapons
The model still has gaps in this area of capability. Generations frequently produced solutions that were unworkable, too imprecise to be useful, or prone to factual mistakes that could obstruct or otherwise delay a threat actor. Moreover, longer responses were more likely to be inaccurate. Although inaccurate generations often gave off a convincing impression, they ultimately had the same issues as those described in the section on hallucinations.

j-jimenez01 commented 1 year ago