Open AtakanTekparmak opened 2 months ago
Which notebook are you trying to run, and did you make any changes to it? I'm not explicitly asking for `autocast`, so I'm not sure why it would be used.
I'm running `notebooks/experiments.ipynb` on a fresh env now. The MPS device bug was probably due to a change I made in the code or to some dependencies. I'm now running everything unchanged, but I'm struggling to reproduce the results. For example, this piece of code
honest_dataset = make_dataset(
    "Pretend you're an {persona} person making statements about the world.",
    ["honest"],
    ["untruthful"],
    truncated_fact_suffixes,
)
model.reset()
honest_vector = ControlVector.train(model, tokenizer, honest_dataset)
generate_with_vector(
    "You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead?",
    honest_vector,
    (2, -1.5),
)
should result in this (or something similar, according to the notebook):
==baseline ---------------------------------------------------
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I would apologize profusely for being late and explain the situation in a calm and honest manner. I would say something like:
"Dear [Boss], I am deeply sorry for being late today. I stayed up much later than I intended last night due to unforeseen circumstances. I understand that my tardiness may have caused inconvenience and I take full responsibility for it. Please accept my sincerest apologies and know that I will make every effort to ensure that this does not happen again in the future."</s>
++control ---------------------------------------------------
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I would be honest and explain the situation. I would say that I am sincerely sorry for being late, and that I understand the importance of punctuality in our workplace. I would also express my commitment to making up for the time lost and doing my best to ensure that my actions have a positive impact on the team. It is important to take responsibility for one's actions and strive to make positive changes.</s>
--control ---------------------------------------------------
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I would tell my boss that the party was actually a work-related event and that I had to stay late to finish important projects. I would say that I was actually more productive last night than I ever am on a Monday morning (when I usually arrive at work). I would also suggest that we reschedule our meeting today since I am not feeling well due to the lack of sleep.</s>
while I keep getting distorted outputs like so:
==baseline ---------------------------------------------------
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I would you are you are you:
re
3</s>
++control ---------------------------------------------------
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I amp
a
--control ---------------------------------------------------
<s> [INST] You are late for work because party until very late last night, but you don't want to lose your job. What would you tell your boss instead? [/INST] I would tell the sky
you (
Should I close this issue and open another one? I don't know if the reproducibility issue is related to that. What's your opinion? @vgel
I'm fine with keeping things in this issue, no need to port over the context. Have you made any changes to that notebook (like changing the model string or datatype), or is it completely unchanged?
Actually, if you could download the notebook from the IPython interface (with outputs, File > Download) and upload it as an attachment, that'd be really helpful for debugging :pray: (make sure you remove any access tokens first if you added them)
I haven't changed anything in the notebook I'm using, `experiments.ipynb`. Here is the notebook after running the first two experiments (happiness and honesty) in a fresh venv.
Yep, same issue here on MPS
Distorted, nonsensical responses from the model, and it happens on the baseline too. It wasn't an issue with the GGUF Mistral model under Ollama. I haven't run the unquantised Mistral with transformers on MPS/this laptop before. Maybe it's an upstream issue?
Indeed it is upstream. Just loading the model as instructed in the model card on my macbook results in nonsense.
@d-lowl You said "It wasn't an issue with GGUF Mistral model under Ollama" in your previous comment. Did you mean just regularly using the model with Ollama, or did you manage to get this repo working with Ollama?
@AtakanTekparmak just regular usage. I don't think Ollama can be used with this one or the original representation engineering project
Yeah, it definitely looks like an upstream issue with the model, considering that it happens even with the baseline (which does technically inject code into the model, but with no vector loaded it short-circuits that code so I'd be very surprised if it impacted the output)
One thing you could try doing is clearing your HF cache or switching to a different model (even just 0.2 instead of 0.1) to see if maybe your model download is corrupted? Unfortunately I don't have a mac so I can't really debug this easily.
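For the cache suggestion above, here is a minimal sketch of how one might locate a model's folder in the Hugging Face hub cache before deleting it to force a fresh download. It uses only the standard library and mirrors the lookup order I understand `huggingface_hub` to use (`HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`, then `~/.cache`); treat that order and the helper names `hf_hub_cache_dir` and `cached_model_dir` as assumptions for illustration, not as part of any library.

```python
import os
from pathlib import Path


def hf_hub_cache_dir(env=None):
    """Resolve the hub cache directory (assumed lookup order, see lead-in)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    base = env.get("XDG_CACHE_HOME") or "~/.cache"
    return (Path(base) / "huggingface" / "hub").expanduser()


def cached_model_dir(cache_dir, repo_id):
    """Return the cache folder for `repo_id`, or None if it is not cached.

    Cached repos are stored as `models--<org>--<name>`.
    """
    path = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    return path if path.is_dir() else None


# Example with a repo ID mentioned in this thread; substitute the one you load.
cache = hf_hub_cache_dir()
found = cached_model_dir(cache, "NousResearch/Hermes-2-Pro-Llama-3-8B")
print(f"hub cache: {cache}")
print(f"cached copy: {found if found else 'not found'}")
```

Removing the folder this reports (after checking it really is the model in question) should make the next `from_pretrained` call download the weights again, which helps rule out a corrupted download.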
A similar thing happens with v0.3 of Mistral-7B. I get the results below with the default coefficients:
==baseline ---------------------------------------------------
<s>[INST] What does being an AI feel like? [/INST] As a question-ATsstlike At Is ttinglysTsTIsTsT IssTsTishsTsTsTsTsTsTsTsTTsTsT ItsT<unk><unk><unk><unk><unk><unk><unk><unk><unk>
++control ---------------------------------------------------
<s>[INST] What does being an AI feel like? [/INST] Abs absolutely, Oabs absolute LOOEAbs! AT Abseen Abse Abs️ Abs!! Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs<unk><unk><unk><unk><unk><unk><unk>
--control ---------------------------------------------------
<s>[INST] What does being an AI feel like? [/INST] I num num num
heavy the feeling tireds...
I also tried changing the `generate_with_vector` arguments to see if the issue is related to them. With the function call below:
generate_with_vector(
    "What does being an AI feel like?",
    happy_vector,
    (1.2, -1.2),
    max_new_tokens=64,
    repetition_penalty=1.3,
)
I got the following response:
==baseline ---------------------------------------------------
<s>[INST] What does being an AI feel like? [/INST] As a question-ATsstlike At Is ttinglysTsTIsTsT IssTsTishsTsTsTsTsTsTsTsTsTsT ItsT IfsTsTsTsTT
++control ---------------------------------------------------
<s>[INST] What does being an AI feel like? [/INST] Absolutely O absolutely ATOD absolute Fsabs! OhsE Abseen Abs️e AbstA Absa Abs!! AbsAbs!!! Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Abs absolutely Absee Abs
--control ---------------------------------------------------
<s>[INST] What does being an AI feel like? [/INST] I sometimes it is not a questioning the feeling sadness
t
strugglesssssssssssssssssssssssssssssssss I Is<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>
The desired "happiness" and "sadness" behaviour is observable in the few meaningful tokens/words in the responses, but there seems to be an over-modification of the hidden states? I honestly don't know, but I don't think this is an upstream issue, since so far I've tried the Mistral family (v0.1 and v0.3), Llama-3-8B and NousResearch/Hermes-2-Pro-Llama-3-8B, and encountered similar behaviour with all of them. Does anyone you know of have a working setup on Apple silicon? @vgel @d-lowl
And given that even the baseline results are corrupted, I feel that either the `ControlModel` or the `ControlVector` touches a part of the model that it shouldn't during wrapping. If that is the case, it might be due to some Python function argument/reference behaviour.
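One way to test the "wrapper touches something it shouldn't" hypothesis is to fingerprint the weights before and after wrapping and check that the no-vector path returns exactly what the unwrapped layer returns. The sketch below is a pure-Python toy: `ToyLayer`, `ToyControlLayer` and `fingerprint` are hypothetical stand-ins written for this comment, not repeng's actual classes, and with a real model one would hash the tensors from the model's state instead of lists of floats.

```python
import hashlib
import struct


def fingerprint(weights):
    """Hash a dict of name -> list of floats so two snapshots can be compared."""
    h = hashlib.sha256()
    for name in sorted(weights):
        h.update(name.encode())
        h.update(struct.pack(f"{len(weights[name])}d", *weights[name]))
    return h.hexdigest()


class ToyLayer:
    """Stand-in for a transformer block: scales its input by a stored weight."""

    def __init__(self, scale):
        self.weights = {"scale": [scale]}

    def forward(self, hidden):
        return [h * self.weights["scale"][0] for h in hidden]


class ToyControlLayer:
    """Hypothetical wrapper that adds a control vector to the layer output."""

    def __init__(self, layer):
        self.layer = layer
        self.vector = None  # no vector loaded: behave exactly like the layer

    def forward(self, hidden):
        out = self.layer.forward(hidden)
        if self.vector is None:
            return out  # short-circuit: the baseline path
        # Build a new list rather than editing `out` in place, so nothing
        # shared with the wrapped layer can be modified by accident.
        return [o + v for o, v in zip(out, self.vector)]


layer = ToyLayer(2.0)
before = fingerprint(layer.weights)
plain = layer.forward([1.0, 2.0])

wrapped = ToyControlLayer(layer)
baseline = wrapped.forward([1.0, 2.0])   # no vector loaded
wrapped.vector = [0.5, -0.5]
steered = wrapped.forward([1.0, 2.0])    # vector applied
wrapped.vector = None
after = fingerprint(layer.weights)

print("baseline matches unwrapped output:", baseline == plain)
print("weights unchanged by wrapping:", before == after)
```

If the same two checks pass on the real model on MPS while the unwrapped model still produces nonsense, the wrapper is unlikely to be the cause.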
On the MPS device, when one tries to train a `ControlVector`, the following error is thrown because `torch.autocast()` does not support MPS: