Hi @salemohamedo, thanks for the note!
The "probabilities" listed in these files are actually negative log probabilities. In other words, if x is the value printed, e ** -x retrieves the original probability. It follows that the logic for comparing probs is inverted.
This is done to avoid problems with floating point error; very small probabilities are hard to represent in raw form.
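As a rough illustration (not the exact evaluation code, and the values below are made up), this is what the inverted comparison amounts to:

```python
import math

# The JSON stores negative log probabilities, so SMALLER numbers mean
# MORE likely tokens.
neg_logp_true = 0.12   # hypothetical value for target_true
neg_logp_new = 7.85    # hypothetical value for target_new

# Recover the raw probabilities if you really want them.
p_true = math.exp(-neg_logp_true)
p_new = math.exp(-neg_logp_new)

# Because the sign is flipped, "target_new is more likely than target_true"
# is expressed as neg_logp_new < neg_logp_true, not >.
rewrite_success = neg_logp_new < neg_logp_true
print(p_true, p_new, rewrite_success)
```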
Perhaps we should have made it clearer in the naming convention, sorry! Let me know if you have any other questions.
Gotcha, that makes sense. So I take it we want pre_rewrite_success to be low in that case...
I'm using this code as part of a project to see how model editing techniques perform on distilled models. I made a few tweaks to add support for distilgpt2 and then tested ROME on a subset of CF data that distilgpt2 already predicts the true responses for. I noticed that the post-rewrite success rate dropped considerably, from 99% (gpt2-xl) to 1.15% (distilgpt2). Aside from potential bugs in my code (a likely possibility), any intuition on why ROME might not function as well on smaller models?
So I take it we want pre_rewrite_success to be low in that case...
Yep. But you'll notice that this value is non-negligible in our results table; GPT sometimes guesses the counterfactual correctly, since it doesn't know the original fact.
any intuition on why ROME might not function as well on smaller models?
Hm, seeing low rewrite efficacy is strange. This is probably due to a bug (or miscalibrated hyperparameters, or both), and here's why I say that: the ROME update is partially the result of an optimization loop (see rome/compute_v.py). Gradient descent is quite strong, and it almost always finds a solution where the efficacy is high. But high rewrite efficacy is like high success on the training dataset; it's a sanity check to make sure the update hasn't totally underfit. To evaluate performance more extensively, you'll want to look at other metrics.
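If it helps to double-check your numbers independently, here's a rough sketch of how you could aggregate rewrite efficacy over your per-case result files yourself. The key names ("post", "rewrite_prompts_probs", "target_new", "target_true") and the file layout are assumptions on my part, so adjust them to whatever your result JSONs actually contain:

```python
import json
from pathlib import Path

def rewrite_efficacy(results_dir: str) -> float:
    """Fraction of prompts where target_new is more likely than target_true
    after the edit. Remember: values are negative log probs, so '<' means
    'more likely'."""
    hits, total = 0, 0
    for path in Path(results_dir).glob("*case_*.json"):
        record = json.loads(path.read_text())
        for probs in record["post"]["rewrite_prompts_probs"]:
            hits += probs["target_new"] < probs["target_true"]
            total += 1
    return hits / total if total else 0.0

print(rewrite_efficacy("results/ROME/run_000"))
```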
We haven't experimented extensively with small models because they simply aren't too great off-the-shelf :) But once the rewrite efficacy looks right and you've tuned hyperparameters (in particular, weight decay and learning rate), do keep us posted on what happens!
Note that we've found the update efficacy to be somewhat dependent on the norm of the update. If you're confident that your code is bug-free, maybe try decreasing weight decay (i.e. increasing the norm of the update) to see what happens. There's usually a sweet spot.
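As a concrete (hypothetical) starting point, you could load the shipped gpt2-xl hyperparameters and sweep the weight decay downward; the field names used below (v_lr, v_weight_decay) are assumptions, so check them against the actual JSON in hparams/ROME/ before relying on this:

```python
import json

# Sketch: start from the existing gpt2-xl hyperparameters and write out
# variants with smaller weight decay (i.e. allowing a larger-norm update).
with open("hparams/ROME/gpt2-xl.json") as f:
    hparams = json.load(f)

base_wd = hparams["v_weight_decay"]
for wd in (base_wd, base_wd / 2, base_wd / 4):
    trial = dict(hparams, v_weight_decay=wd)
    out_path = f"hparams/ROME/distilgpt2_wd{wd}.json"
    with open(out_path, "w") as f:
        json.dump(trial, f, indent=2)
    print("wrote", out_path, "with v_weight_decay =", wd)
```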
Got it, thanks very much for the help!
Hi, I've enjoyed playing with ROME and appreciate the interactive colab notebooks! I tried it out myself using gpt2-xl and I'm running into some strange behavior. Below, I've pasted the JSON for one of the case results (756) using ROME. As you can see, the pre-rewrite probability for target_true (Nintendo) is much lower than that of target_new (Apple). Shouldn't it be the other way around? I tried the predict_token method in the causal trace notebook, and before applying ROME, gpt2-xl correctly predicts Nintendo. Additionally, the post-rewrite probs seem to be incorrect as well. Shouldn't the prob of target_new be higher than the prob of target_true after the rewrite? I found the same behavior over the majority of other cases I tested as well (I tested a batch of 350). I'm not sure if I'm misunderstanding something, so just looking to clarify that.

Another question I had is regarding this line of code. Don't we want x["target_true"] > x["target_new"] only to be true for pre and the inverse to be true for post? Any clarification would be appreciated, thanks!