karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Question about vocab size #421

Open ArtHughes opened 8 months ago

ArtHughes commented 8 months ago

First, thank you for creating nanoGPT. It has been an amazing learning experience! I have a question about vocab size and training. I built nanoGPT and ran the Shakespeare data with a vocab size of 12, and everything works great: I get good training and good results. I am now experimenting with a dataset that has a vocab size of ~100 (a non-trivial density of special characters), and training is almost 50% worse. Any ideas on what is going on and how I could improve the training? Here are my current parameters:

gradient_accumulation_steps = 1
batch_size = 32
block_size = 192
n_layer = 4
n_head = 4
n_embd = 192
dropout = 0.5
learning_rate = 1e-3
max_iters = 1000
lr_decay_iters = 1000
min_lr = 1e-4
beta2 = 0.99
warmup_iters = 100

I have a GTX1080 with 8GB VRAM. Thanks!
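
For reference, these hyperparameters correspond to a nanoGPT-style config file that gets passed to train.py. A minimal sketch, assuming a character-level dataset prepared under data/my_dataset/ (the dataset name and out_dir below are placeholders, not from the original post):

```python
# config/train_my_dataset.py -- hypothetical config mirroring the settings above
out_dir = 'out-my-dataset'    # placeholder output directory
dataset = 'my_dataset'        # placeholder; expects data/my_dataset/train.bin and val.bin

gradient_accumulation_steps = 1
batch_size = 32
block_size = 192              # context length in tokens

n_layer = 4
n_head = 4
n_embd = 192
dropout = 0.5

learning_rate = 1e-3
max_iters = 1000
lr_decay_iters = 1000         # typically set equal to max_iters
min_lr = 1e-4                 # roughly learning_rate / 10
beta2 = 0.99
warmup_iters = 100
```

which would be launched with something like `python train.py config/train_my_dataset.py`.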

VatsaDev commented 8 months ago

Well, as you have more diverse data it gets harder for small models to perform as well, and a 12-token vocabulary is much easier to predict than a ~100-token one.

As you mentioned yourself, there is a non-trivial density of special characters.
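
One way to see why the raw loss numbers aren't directly comparable across vocab sizes: the cross-entropy of a model that guesses uniformly at random is ln(vocab_size), so the "chance" baseline is already much higher for the 100-token vocabulary. A quick sketch:

```python
import math

# Cross-entropy (in nats) of a uniform random predictor over V tokens is ln(V).
# This is roughly where the training loss starts before the model learns anything.
for vocab_size in (12, 100):
    print(f"vocab_size={vocab_size:>3}  uniform-baseline loss = {math.log(vocab_size):.2f}")

# vocab_size= 12  uniform-baseline loss = 2.48
# vocab_size=100  uniform-baseline loss = 4.61
```

So an ~50% higher loss on the new dataset may partly reflect the larger vocabulary rather than worse training; comparing each run against its own dataset's baseline is more informative.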