A major refactoring to clean up tech debt and reduce code redundancy

Still a WIP: pretrain 'should' work, but most other things are broken.

I'm still fighting the interplay between 'ModelCfg' (the architectural specification for the model) and the params/arguments that can be passed through from the command line, which select the config (by name) and possibly override aspects of it. The tokenizer probably needs a similar relationship.

The other big battle not yet won is setting up the tokens for tokenizers. I'd like task configs to keep their tokens in the task-specific config for pretrain + finetune. The tokens need to be loaded and the vocab size adjusted in step with model creation and the loading of pretrained weights... but I don't want that to be a brittle section of cut & paste code. Ho hum.
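A rough sketch of what keeping tokens in the task config could look like, with one shared helper instead of per-task cut & paste. Everything here is hypothetical (`TaskTokenCfg`, `extend_vocab`, the token strings), and it assumes the vocab is a plain token-to-id mapping with contiguous ids starting at 0.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskTokenCfg:
    # Hypothetical per-task token spec living in the task config.
    task_start_token: str
    extra_tokens: List[str] = field(default_factory=list)

    def all_tokens(self) -> List[str]:
        return [self.task_start_token, *self.extra_tokens]


def extend_vocab(vocab: Dict[str, int], task_cfg: TaskTokenCfg) -> Tuple[Dict[str, int], int]:
    """Return a copy of vocab with the task's tokens appended, plus the count added.

    Assumes ids are contiguous (0..len-1), so len(vocab) is the next free id.
    Tokens already present keep their existing id.
    """
    new_vocab = dict(vocab)
    for tok in task_cfg.all_tokens():
        if tok not in new_vocab:
            new_vocab[tok] = len(new_vocab)
    return new_vocab, len(new_vocab) - len(vocab)


base_vocab = {"<pad>": 0, "<s>": 1, "</s>": 2}
pretrain_tokens = TaskTokenCfg("<s_pretrain>", ["<sep/>", "</s>"])
new_vocab, num_added = extend_vocab(base_vocab, pretrain_tokens)
```

The resulting `len(new_vocab)` is the number the model's vocab size would need to match. Whether the embedding resize happens before or after loading pretrained weights is exactly the ordering question raised above, and this sketch does not settle it.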

huggingface / pixparse

A major refactoring to clean up tech debt and reduce code redundancy #24