QasimWani opened 1 year ago
I am finding nanoGPT useful as a playground for modeling refactors to llama.cpp.
I was thinking of adding a third, instrumented script so you can run experiments and get quick feedback on asymptotic behavior. A call graph annotated with memory/CPU usage is definitely one output. Can you think of any others?
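As a rough illustration of that first output, here is a minimal sketch of a hypothetical `instrumented` decorator (not part of picoGPT) for NumPy functions written in picoGPT's style. It records caller-to-callee edges, inclusive wall-clock time, and the bytes of each returned array. Output bytes are only a crude stand-in for memory use; a fuller tool might use `tracemalloc` or `cProfile` instead. The `gelu`/`ffn` functions are toy stand-ins.

```python
import functools
import time
from collections import Counter, defaultdict

import numpy as np

CALL_EDGES = Counter()        # (caller, callee) -> number of calls
TIMES = defaultdict(float)    # function name -> inclusive wall time in seconds (callees included)
OUT_BYTES = defaultdict(int)  # function name -> total bytes of returned ndarrays
_stack = ["<root>"]           # names of instrumented functions currently executing


def instrumented(fn):
    """Hypothetical decorator: log call-graph edges, timing, and output size for fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        CALL_EDGES[(_stack[-1], fn.__name__)] += 1
        _stack.append(fn.__name__)
        start = time.perf_counter()
        try:
            out = fn(*args, **kwargs)
        finally:
            TIMES[fn.__name__] += time.perf_counter() - start
            _stack.pop()
        if isinstance(out, np.ndarray):
            OUT_BYTES[fn.__name__] += out.nbytes
        return out
    return wrapper


# Toy layers standing in for picoGPT-style functions.
@instrumented
def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))


@instrumented
def ffn(x, w1, w2):
    return gelu(x @ w1) @ w2


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1, w2 = rng.standard_normal((8, 32)), rng.standard_normal((32, 8))
ffn(x, w1, w2)

for (caller, callee), n in CALL_EDGES.items():
    print(f"{caller} -> {callee}: {n} call(s), {TIMES[callee] * 1e3:.3f} ms, {OUT_BYTES[callee]} B out")
```

The edge list is already a call graph; dumping it in Graphviz DOT format would give the annotated picture.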
Here is the list of experiments I want:
Can gpt2() be refactored into a prefix sum over its inputs so that it can be evaluated in parallel? A quick-and-dirty check with GPT-3.5 suggests the answer might be yes: https://gist.github.com/chadbrewbaker/ffe95290fc945af63611693688dfe54d
Given a list of sample prompts, can we reorder the underlying token space to speed up inference? The same gist suggests the answer might be yes.
Given a list of sample prompts, can we reorder the memory layout of the model to speed up inference?
Experiments with llama.cpp-style token quantization.
Making it trivial to use ctypes to swap out, say, gelu() for a dynamically loaded binary and measure the performance difference.
Plotting probability density functions of tensor float values to see which precision ranges really matter.
Auto-injection of bigfloat, float64, float32, and float16 types to see their effects.
Auto-injection of random values into the last 1, 2, 3, 4, ... bits during float operations to see the effects.
Still fuzzy in my mind's eye: using tensors whose dimensions are chosen by their prime factorization to see cache and symmetry effects. Also, adding a generic semigroup base type for the tensor entries; you could get freaky in a few lines of code.
Ideally I want to make picoGPT something z3py uses as a reference GPT solver.
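For the llama.cpp-style quantization experiment in the list above, a minimal NumPy sketch of symmetric per-block quantization: each block of 32 values shares one float scale and stores small integers. This only mimics the general idea behind llama.cpp block formats such as Q8_0 and Q4_0; it is not their exact bit layout, and `quantize_blocks`/`dequantize_blocks` are hypothetical helpers applied here to a generic float32 tensor.

```python
import numpy as np


def quantize_blocks(w, bits=8, block=32):
    """Symmetric per-block quantization: each run of `block` values shares one float scale."""
    flat = w.reshape(-1, block)                      # assumes w.size is a multiple of block
    qmax = 2 ** (bits - 1) - 1                       # 127 for 8 bits, 7 for 4 bits
    scale = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                          # avoid division by zero for all-zero blocks
    # stored unpacked in int8 for simplicity, even for 4-bit values
    q = np.clip(np.round(flat / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float32)


def dequantize_blocks(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
for bits in (8, 4):
    q, s = quantize_blocks(w, bits=bits)
    err = np.abs(dequantize_blocks(q, s, w.shape) - w).max()
    print(f"{bits}-bit blocks: max abs error {err:.4f}")
```

Rounding moves each value by at most half a step, so the error within a block is bounded by `scale / 2`; that is why the 4-bit run shows a much larger error than the 8-bit one.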
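For the ctypes idea, a minimal sketch that assumes a C math library can be found via `ctypes.util.find_library("m")` (typical on Linux and macOS). It swaps NumPy's `np.tanh` inside the usual tanh approximation of GELU for the `tanh` loaded from that shared library, then times both versions and checks that they agree. A real experiment would more likely compile its own `gelu` into a shared library and pass whole arrays by pointer (for example via `ndarray.ctypes.data_as`); the per-element calls here mostly measure ctypes call overhead.

```python
import ctypes
import ctypes.util
import time

import numpy as np

# Assumption: the C math library is discoverable as "m" (typical on Linux and macOS).
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.tanh.argtypes = [ctypes.c_double]
libm.tanh.restype = ctypes.c_double

SQRT_2_OVER_PI = np.sqrt(2 / np.pi)


def gelu_numpy(x):
    # tanh approximation of GELU, fully vectorized in NumPy
    return 0.5 * x * (1 + np.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x**3)))


def gelu_ctypes(x):
    # same formula, but tanh is the C library's, called once per element through ctypes
    inner = SQRT_2_OVER_PI * (x + 0.044715 * x**3)
    t = np.fromiter((libm.tanh(float(v)) for v in inner.ravel()), dtype=np.float64, count=inner.size)
    return 0.5 * x * (1 + t.reshape(x.shape))


x = np.random.default_rng(0).standard_normal((16, 64))
for fn in (gelu_numpy, gelu_ctypes):
    start = time.perf_counter()
    fn(x)
    print(f"{fn.__name__}: {(time.perf_counter() - start) * 1e3:.2f} ms")
print("max abs difference:", np.abs(gelu_numpy(x) - gelu_ctypes(x)).max())
```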
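For the density-plot idea, a minimal NumPy-only sketch: `np.histogram(..., density=True)` gives the empirical density of a tensor's values (ready to hand to any plotting library), and `np.frexp` buckets the values by binary exponent, which shows directly which float ranges hold most of the values. The random tensor is a stand-in; in practice you would capture real weights or activations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a real weight matrix or captured activation.
t = (rng.standard_normal((256, 256)) * 0.02).astype(np.float32)

# Empirical probability density of the raw values (pass density/edges to a plotting library).
density, edges = np.histogram(t, bins=50, density=True)

# Bucket nonzero values by binary exponent: x = m * 2**e with 0.5 <= |m| < 1.
nonzero = t[t != 0]
_, exponents = np.frexp(nonzero)
exp_values, exp_counts = np.unique(exponents, return_counts=True)
for e, c in zip(exp_values, exp_counts):
    print(f"2^{e - 1} <= |x| < 2^{e}: {c / nonzero.size:.2%}")
```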
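For auto-injecting float types, one low-effort approach, sketched here under the assumption that the model code is pure NumPy and dtype-preserving, is to cast the inputs, let the dtype propagate, and compare against a float64 run. The `attention` and `softmax` below are simplified stand-ins written in picoGPT's style, not copies of its code. Arbitrary precision (bigfloat) is not shown; it would need something like `mpmath` and object arrays.

```python
import numpy as np


def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


def attention(q, k, v):
    # dividing by a Python float keeps float16/float32 inputs in their own dtype
    # (a NumPy float64 scalar could upcast them under NumPy 2 promotion rules)
    return softmax(q @ k.T / q.shape[-1] ** 0.5) @ v


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 64)) for _ in range(3))
reference = attention(q, k, v)  # float64 run used as the reference

errors = {}
for dtype in (np.float16, np.float32):
    out = attention(*(a.astype(dtype) for a in (q, k, v)))
    errors[dtype.__name__] = float(np.abs(out.astype(np.float64) - reference).max())
    print(f"{dtype.__name__}: max abs error vs float64 = {errors[dtype.__name__]:.2e}")
```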
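For the random-low-bits idea, a minimal sketch assuming float32 tensors: reinterpret the values as `uint32` bit patterns with `ndarray.view`, replace the lowest `n_bits` mantissa bits with random bits, and view the result back as float32. This perturbs the operands of a single matmul rather than every intermediate operation, which is a simplification; `randomize_low_bits` is a hypothetical helper. Keeping `n_bits` at 23 or below ensures that only mantissa bits change.

```python
import numpy as np


def randomize_low_bits(x, n_bits, rng):
    """Return a float32 copy of x whose lowest n_bits mantissa bits are random (hypothetical helper)."""
    assert 0 <= n_bits <= 23, "float32 has 23 explicit mantissa bits"
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)  # reinterpret IEEE-754 bit patterns, no copy
    mask = np.uint32((1 << n_bits) - 1)                      # ones in the low n_bits positions
    noise = rng.integers(0, 1 << n_bits, size=bits.shape, dtype=np.uint32)
    return ((bits & ~mask) | noise).view(np.float32)         # new array; x itself is untouched


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 64)).astype(np.float32)
exact = x @ w
for n in (1, 2, 3, 4, 8, 16):
    noisy = randomize_low_bits(x, n, rng) @ randomize_low_bits(w, n, rng)
    print(f"{n:2d} low bits randomized: max abs error {np.abs(noisy - exact).max():.2e}")
```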
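For the generic-entry-type idea, a minimal pure-Python sketch: a matrix multiply parameterized by its `add` and `mul` operations. Because `functools.reduce` is called without an initial value, `add` only needs to be associative (a semigroup); no identity element is required. Swapping in `min` and `+` gives min-plus (tropical) multiplication, which computes the cheapest two-step path costs for a weight matrix. `generic_matmul` is a hypothetical name.

```python
import operator
from functools import reduce


def generic_matmul(a, b, add=operator.add, mul=operator.mul):
    """Multiply list-of-lists matrices using caller-supplied add/mul operations."""
    inner = len(b)  # shared dimension: columns of a == rows of b
    return [
        [reduce(add, (mul(a[i][k], b[k][j]) for k in range(inner))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(generic_matmul(a, b))                             # ordinary (+, *) -> [[19, 22], [43, 50]]
print(generic_matmul(a, b, add=min, mul=operator.add))  # min-plus        -> [[6, 7], [8, 9]]
```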
The tiny codebase should be kept as-is. Perhaps fork the repo elsewhere, and list notable forks at the bottom of the README?
Wrong link. picoGPT_viz: visualize picoGPT call graphs
Great work! For beginners, here's a graphical representation of your code. Feel free to embed it in your scripts: https://gctpy.com/graph/1ca770a1905176a355836d485ee7c8fc5b97e74ae058fce332ca59fdcf4ac919. It shows how the functions in your code connect to one another. It was generated for gpt.py.