tensorflow / swift-models

Models and examples built with Swift for TensorFlow
Apache License 2.0

transformer - OpenAI GPT-2 - bump 117M -> 345M model #153

Closed johndpope closed 5 years ago

johndpope commented 5 years ago

OpenAI has released the 345M parameter model: https://twitter.com/OpenAI/status/1124440412679233536?s=20

Looking at https://github.com/openai/gpt-2, it looks like we can simply pass 345M into https://github.com/tensorflow/swift-models/blob/master/Transformer/download_model.sh#L4

Need to test that this loads; will try later today. https://github.com/tensorflow/swift-models/blob/a53966a2352d387ca74434c6a555883570691ba3/Transformer/main.swift#L20
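For quick testing, one option is to make the model name a command-line argument instead of editing the source each time. A minimal sketch, assuming main.swift already takes temperature and seed text as its first two arguments; the third argument slot and `modelName` are hypothetical, not the repo's actual interface:

    import Python

    // Hypothetical: read the model name as an optional third CLI argument,
    // defaulting to the original 117M checkpoint.
    let modelName = CommandLine.arguments.count > 3 ? CommandLine.arguments[3] : "117M"
    let encoder = Python.import("encoder").get_encoder(modelName)
    let checkpoint = "models/\(modelName)/model.ckpt"
    let model = TransformerLM(contentsOfPythonCheckpointFile: checkpoint, scope: "model")

Invocation would then look like `./main 0.9 "skynet" 345M`.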

johndpope commented 5 years ago

I updated 117M -> 345M and hit:

Input to reshape is a tensor with 3072 values, but the requested shape has 3060

// Point the Python encoder and the checkpoint path at the 345M model:
let encoder = Python.import("encoder").get_encoder("345M")
let checkpoint = "models/345M/model.ckpt"
let model = TransformerLM(contentsOfPythonCheckpointFile: checkpoint, scope: "model")

It compiles fine, but:

➜  Transformer git:(master) ✗ swiftc -O -ltensorflow -o main Operators.swift Model.swift PythonCheckpointReader.swift main.swift

➜  Transformer git:(master) ✗ ./main 0.9 "skynet"
INFO: 🐍 conda environment: gymai
skynetpytok: [8135, 2047, 316]
Fatal error: Input to reshape is a tensor with 3072 values, but the requested shape has 3060: file /usr/local/src/swift-build/swift/stdlib/public/TensorFlow/CompilerRuntime.swift, line 2108


leoxzhao commented 5 years ago

The problem is that TransformerLM uses a hard-coded Config to create the language model, and those parameters are for the 117M model. They should be loaded from hparams.json instead. The 345M model uses 16 heads (instead of 12), 1024-dimensional embeddings (instead of 768), and 24 layers (instead of 12).
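One way to do that in Swift is to decode the hparams.json that ships alongside each checkpoint rather than hard-coding the values. A minimal sketch, not the repo's actual Config type; the property names here simply mirror OpenAI's JSON keys:

    import Foundation

    // Sketch: read the hyperparameters shipped with each GPT-2 checkpoint.
    struct HParams: Codable {
        let n_vocab: Int
        let n_ctx: Int
        let n_embd: Int
        let n_head: Int
        let n_layer: Int
    }

    let data = try! Data(contentsOf: URL(fileURLWithPath: "models/345M/hparams.json"))
    let hparams = try! JSONDecoder().decode(HParams.self, from: data)
    // For 345M this should yield n_head: 16, n_embd: 1024, n_layer: 24.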

leoxzhao commented 5 years ago

I created a PR to support bigger models (345M): #154

johndpope commented 5 years ago

The 1.5-billion-parameter model will drop shortly: https://github.com/ConnorJL/GPT2

https://medium.com/@NPCollapse/replicating-gpt2-1-5b-86454a7f26af

Let's do this - @ConnorJL

    case vocabSize = "n_vocab"
    case contextSize = "n_ctx"
    case embeddingSize = "n_embd"
    case headCount = "n_head"
    case layerCount = "n_layer"

https://github.com/ConnorJL/GPT2/blob/master/README.md

model: A string that refers to which model to use. This should always just be "GPT2" (no other models are implemented here)
n_ctx: Number of tokens the model looks at (default: 1024)
n_vocab: Size of vocabulary (default: 50257)
n_embd: Dimension of embedding layers
n_layer: Number of layers in the model
n_head: Number of attention heads (default: n_embd / 64)
scale: Factor by which to scale initializations of weights (default: 1/sqrt(n_layer))
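As a quick sanity check, those defaults line up with the 345M numbers quoted earlier in this thread (plain arithmetic, not repo code):

    import Foundation

    // 345M per this thread: n_embd = 1024, n_layer = 24, n_head = 16.
    let nEmbd = 1024
    let nLayer = 24
    let nHead = nEmbd / 64                          // default n_head: 1024 / 64 = 16
    let scale = 1.0 / Double(nLayer).squareRoot()   // default scale: 1/sqrt(24) ≈ 0.204
    print(nHead, scale)
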
ConnorJL commented 5 years ago

Hi John, I think there is a bit of confusion here, sorry. As explained in my follow-up post, I don't actually plan on releasing 1.5B anymore, at least not until OpenAI decides to release theirs.