themurtazanazir / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text
Other
1 stars 0 forks source link

Learning rate tuning #5

Open mattf1n opened 1 month ago

mattf1n commented 1 month ago

Tuning the learning rate for hidden state inputs to reduce grad norm spikes. Current LR: 0.0002 To try: 0.0001, 0.00002 Run for only 100,000 examples, warmup steps divide by 10 as well, eval steps to 2500

Wait for Llama-scale experiments