mattf1n opened 1 month ago
Steps to run:

```shell
cd vec2text
pip install -r requirements.txt
pip install .
cd vec2text
sh run.sh
```
Make sure to change the environment variables in the .sh file. You should set `--max_new_tokens 1` and `--use_frozen_embeddings_as_input 1`.
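The edits above might look like the following sketch. The variable name and path here are assumptions for illustration, not the repo's actual contents; check the real run.sh for the names it uses.

```shell
# Hypothetical run.sh sketch: adjust the path to your environment.
export DATA_DIR=/path/to/data

# Flags to change for the 1-token ablation:
python run.py \
  --max_new_tokens 1 \
  --use_frozen_embeddings_as_input 1
```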
Running the 1-token ablation. It seems to be picking up where a 16-token run left off. https://wandb.ai/dill-lab/emb-inv-logits-1?nw=nwusermfinlays
The command I run is:

```shell
python run.py \
  --per_device_train_batch_size 240 \
  --per_device_eval_batch_size 240 \
  --max_seq_length 16 \
  --num_train_epochs 40 \
  --max_eval_samples 500 \
  --eval_steps 25000 \
  --warmup_steps 100000 \
  --learning_rate 0.0002 \
  --dataset_name one_million_instructions \
  --model_name_or_path t5-base \
  --use_wandb=1 \
  --experiment inversion_from_hidden_states \
  --bf16=1 \
  --embedder_torch_dtype bfloat16 \
  --lr_scheduler_type constant_with_warmup \
  --use_frozen_embeddings_as_input 1 \
  --mock_embedder 0 \
  --use_wandb 1 \
  --use_less_data 1000000 \
  --embedder_model_name gpt2 \
  --max_new_tokens 1 \
  --output_dir ${DATA_DIR}/inversion/hidden_saves_with_frozen_fixed/ \
  --exp_group_name 2024-09-22-hidden_states_exp_model_frozen_fixed \
  >> out_single_token_ablation.log
```
Do I need to change the `--output_dir` argument to prevent this?
Edit: currently trying this.
Edit 2: It seems to be working.
Yes. You can clear that directory or point to a new one.
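For reference, HF example scripts typically auto-resume when they find `checkpoint-*` directories in `--output_dir`, which would explain the run picking up where the 16-token run left off. A small sketch of the two options (paths here are hypothetical, not the repo's layout):

```shell
# Hypothetical paths; substitute your own ${DATA_DIR} layout.
OUT=/tmp/inversion_demo
mkdir -p "$OUT/checkpoint-25000"   # simulate a stale checkpoint from the 16-token run

# Option A: clear the stale checkpoints so training starts fresh.
if ls "$OUT"/checkpoint-* >/dev/null 2>&1; then
  rm -rf "$OUT"
fi

# Option B: point --output_dir at a fresh directory instead.
NEW_OUT=/tmp/inversion_demo_single_token
mkdir -p "$NEW_OUT"
echo "run with --output_dir $NEW_OUT"
```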
Are you able to see the results in this link? It appears that the 1-hidden-state input performs slightly worse than the bug-fixed 1-logit-input result you got, but on par with the results before the bug fix.
The link is not accessible. I can see the image, though I'm not clear about the legend. What are these two runs?
Ah, the blue run is a partial run with 16 hidden states; the brown one is 1 hidden state.
I want to run the ablations for our work. These are:
@themurtazanazir can you tell me what command I should run to do the second experiment? (Inversion from 1 hidden state)
Also, please mention any additional ablations we want to do.