themurtazanazir / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Ablation experiments #4

Open mattf1n opened 1 month ago

mattf1n commented 1 month ago

I want to run the ablations for our work. These are:

@themurtazanazir can you tell me what command I should run to do the second experiment? (Inversion from 1 hidden state)

Also, please mention any additional ablations that we want to run.

themurtazanazir commented 1 month ago

Steps to run

cd vec2text
pip install -r requirements.txt
pip install .
cd vec2text
sh run.sh

Make sure to change the environment variables in the .sh file. You should also set --max_new_tokens to 1 and --use_frozen_embeddings_as_input to 1.
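For illustration, the two overrides could also be applied to a command programmatically. This is a minimal sketch, not part of the repository: `set_flag` is a hypothetical helper, and the base command below is a shortened stand-in rather than the actual contents of run.sh.

```python
import shlex


def set_flag(args, flag, value):
    """Return a copy of args with `flag` set to `value` exactly once.

    Existing occurrences are dropped in both the `--flag value` and the
    `--flag=value` forms, then the new pair is appended at the end.
    """
    out = []
    i = 0
    while i < len(args):
        tok = args[i]
        if tok == flag:
            i += 2  # skip the flag and the value that follows it
            continue
        if tok.startswith(flag + "="):
            i += 1  # skip the combined `--flag=value` token
            continue
        out.append(tok)
        i += 1
    return out + [flag, str(value)]


# Hypothetical base command (assumption: a 16-token default).
base = shlex.split("python run.py --max_seq_length 16 --max_new_tokens 16")

cmd = set_flag(base, "--max_new_tokens", 1)
cmd = set_flag(cmd, "--use_frozen_embeddings_as_input", 1)
print(shlex.join(cmd))
```

Removing earlier occurrences before appending avoids passing the same flag twice with different values.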

mattf1n commented 1 month ago

Running the 1-token ablation. It seems to be picking up where a 16-token run left off. https://wandb.ai/dill-lab/emb-inv-logits-1?nw=nwusermfinlays

[Image: Screenshot 2024-10-15 at 11 25 28 AM]

The command I ran is:

python run.py --per_device_train_batch_size 240 --per_device_eval_batch_size 240 --max_seq_length 16 --num_train_epochs 40 --max_eval_samples 500 --eval_steps 25000 --warmup_steps 100000 --learning_rate 0.0002 --dataset_name one_million_instructions --model_name_or_path t5-base --use_wandb=1 --experiment inversion_from_hidden_states --bf16=1 --embedder_torch_dtype bfloat16 --lr_scheduler_type constant_with_warmup --use_frozen_embeddings_as_input 1 --mock_embedder 0 --use_wandb 1 --use_less_data 1000000 --embedder_model_name gpt2 --max_new_tokens 1 --output_dir ${DATA_DIR}/inversion/hidden_saves_with_frozen_fixed/ --exp_group_name 2024-09-22-hidden_states_exp_model_frozen_fixed >> out_single_token_ablation.log

Do I need to change the output_dir argument to prevent this?

Edit: currently trying this.

Edit 2: It seems to be working.

[Image: Screenshot 2024-10-15 at 11 34 42 AM]

themurtazanazir commented 1 month ago

Yes. You can clear that directory or point output_dir to a new one.
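A small check along these lines can catch leftover checkpoints before launching. This is a sketch under the assumption that the training script saves checkpoints as `checkpoint-<step>` subdirectories of output_dir (the Hugging Face Trainer convention); `find_checkpoints` and `fresh_output_dir` are hypothetical helper names, not part of vec2text.

```python
import re
from pathlib import Path


def find_checkpoints(output_dir):
    """Return `checkpoint-<step>` subdirectories, sorted by step number."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    return [path for _, path in sorted(found, key=lambda item: item[0])]


def fresh_output_dir(base_dir, run_name):
    """Pick a directory under base_dir that holds no earlier checkpoints.

    Tries `run_name`, then `run_name-2`, `run_name-3`, and so on.
    """
    candidate = Path(base_dir) / run_name
    suffix = 1
    while find_checkpoints(candidate):
        suffix += 1
        candidate = Path(base_dir) / f"{run_name}-{suffix}"
    return candidate
```

If `find_checkpoints` returns a non-empty list for the directory you are about to pass as output_dir, a new run may resume from the last entry instead of starting from scratch.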

mattf1n commented 1 month ago

Are you able to see the results in this link?

[Image: W&B Chart 10_16_2024, 2_39_56 PM]

It appears that the 1-hidden-state input performs slightly worse than the bug-fixed 1-logit input result you got, but on par with the results from before the bug fix.

themurtazanazir commented 1 month ago

The link is not accessible. I can see the image, but the legend is not clear to me. What are these two runs?

mattf1n commented 1 month ago

Ah, the blue run is a partial run with 16 hidden states. The brown run is 1 hidden state.