likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
MIT License

Cleaned up replication of ITI on Llama3 #38

Closed jujipotle closed 3 months ago

jujipotle commented 3 months ago

A couple of things to note:

  1. I updated README.md with newer instructions for finetuning davinci via the OpenAI API, since the old instructions for curie were deprecated. I also created a new notebook, finetune_gpt.ipynb, that streamlines the finetuning process.
  2. I updated environment.yaml with newer packages so that ITI can run on H100s. I verified that validate_2fold.py works with the new environment, but I haven't extensively tested every script against it.
  3. The model architecture of baffo32/decapoda-research-llama-7B-hf differs from the meta-llama architectures, so I extract attention activations from different locations depending on which model is being run. I'm not 100% sure this is the most effective location to extract from.
  4. I made "instruction_prompt" a configurable hyperparameter, since the default prompt (from Ouyang et al., 2022) reduces the model's informativeness. The llama3_tuning.md results were obtained with a modified instruction prompt.
  5. I created a validation/sweeping directory with bash scripts to make hyperparameter sweeping easier, censoring sensitive information such as API keys and SLURM account info.
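
The architecture-dependent extraction in point 3 can be done with PyTorch forward hooks. Below is a minimal sketch of that approach using a toy module in place of the real attention blocks; the module and attribute names are illustrative stand-ins, not the PR's actual code, and the real attribute paths differ between the decapoda and meta-llama checkpoints:

```python
import torch
import torch.nn as nn

# Toy stand-in for an attention block. On the real models the hook would be
# registered on the per-layer attention submodule, whose attribute path
# depends on the architecture (decapoda vs. meta-llama naming).
class ToyAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

activations = {}

def make_hook(name):
    # Forward hook: record this module's output every time it runs.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

model = nn.Sequential(ToyAttention(8), ToyAttention(8))
for i, layer in enumerate(model):
    layer.register_forward_hook(make_hook(f"layer_{i}"))

x = torch.randn(2, 8)
model(x)  # after this call, activations holds one tensor per hooked layer
```

The same pattern works regardless of where the attention module lives in the model tree: only the attribute path passed to the hook registration changes per architecture, which is what the PR's branching logic selects.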