johndpope closed this issue 5 years ago
I updated the model from 117M -> 345M and got:
Input to reshape is a tensor with 3072 values, but the requested shape has 3060
let encoder = Python.import("encoder").get_encoder("345M")
let checkpoint = "models/345M/model.ckpt"
let model = TransformerLM(contentsOfPythonCheckpointFile: checkpoint, scope: "model")
It compiles fine, but:
➜ Transformer git:(master) ✗ swiftc -O -ltensorflow -o main Operators.swift Model.swift PythonCheckpointReader.swift main.swift
➜ Transformer git:(master) ✗ ./main 0.9 "skynet"
INFO: 🐍 conda environment: gymai
skynetpytok: [8135, 2047, 316]
Fatal error: Input to reshape is a tensor with 3072 values, but the requested shape has 3060: file /usr/local/src/swift-build/swift/stdlib/public/TensorFlow/CompilerRuntime.swift, line 2108
The problem is that TransformerLM uses a hard-coded Config to create the language model. Those parameters are for the 117M model; they should instead be loaded from hparams.json. The 345M model uses 16 heads (instead of 12), a 1024-dimensional embedding (instead of 768), and 24 layers (instead of 12).
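A minimal sketch of the mismatch, assuming the hard-coded Config looks roughly like this (field names follow the CodingKeys in the PR; the struct shape itself is an assumption):

```swift
// Hypothetical shape of the hard-coded Config: these are the 117M
// defaults that TransformerLM bakes in. Loading 345M weights against
// these dimensions is what triggers the reshape failure; the values
// should instead come from models/345M/hparams.json.
struct Config {
    var vocabSize = 50257   // n_vocab (same for both models)
    var contextSize = 1024  // n_ctx
    var embeddingSize = 768 // n_embd: 345M needs 1024
    var headCount = 12      // n_head: 345M needs 16
    var layerCount = 12     // n_layer: 345M needs 24
}

var config = Config()
// What the 345M checkpoint actually requires, per this thread:
config.embeddingSize = 1024
config.headCount = 16
config.layerCount = 24
```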
I created a PR to support bigger models (345M) #154
The 1.5-billion-parameter model will drop shortly - https://github.com/ConnorJL/GPT2
https://medium.com/@NPCollapse/replicating-gpt2-1-5b-86454a7f26af
Let's do this - @ConnorJL
case vocabSize = "n_vocab"
case contextSize = "n_ctx"
case embeddingSize = "n_embd"
case headCount = "n_head"
case layerCount = "n_layer"
https://github.com/ConnorJL/GPT2/blob/master/README.md
model: A string that refers to which model to use. This should always just be "GPT2" (no other models are implemented here)
n_ctx: Number of tokens the model looks at (default: 1024)
n_vocab: Size of vocabulary (default: 50257)
n_embd: Dimension of embedding layers
n_layer: Number of layers in the model
n_head: Number of attention heads (default: n_embd / 64)
scale: Factor by which to scale initializations of weights (default: 1/sqrt(n_layer))
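The two derived defaults above can be checked with a quick sketch, plugging in the 345M values from this thread (n_embd = 1024, n_layer = 24):

```swift
import Foundation

// README defaults: n_head = n_embd / 64, scale = 1 / sqrt(n_layer).
let nEmbd = 1024
let nLayer = 24
let nHead = nEmbd / 64                         // 1024 / 64 = 16, matching 345M
let scale = 1.0 / Double(nLayer).squareRoot()  // init scale for 24 layers
```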
Hi John, I think there is a bit of confusion here, I'm sorry. As explained in my follow-up post, I actually don't plan on releasing 1.5B anymore, at least until OpenAI decides to release theirs. Sorry for the confusion.
OpenAI has released the 345M parameter model. https://twitter.com/OpenAI/status/1124440412679233536?s=20
Per https://github.com/openai/gpt-2, it looks like we can simply pass 345M into https://github.com/tensorflow/swift-models/blob/master/Transformer/download_model.sh#L4
Need to test that this loads; will try later today. https://github.com/tensorflow/swift-models/blob/a53966a2352d387ca74434c6a555883570691ba3/Transformer/main.swift#L20