salesforce / ctrl

Conditional Transformer Language Model for Controllable Generation
https://arxiv.org/abs/1909.05858
BSD 3-Clause "New" or "Revised" License

smaller model #36

Closed leejason closed 5 years ago

leejason commented 5 years ago

If a smaller model is preferred for easier experiments and faster iterations, what model sizes would you recommend? Is the following the only place to adjust? Thank you for the great work and for shedding more light.

class Encoder(torch.nn.Module):
  def __init__(self, num_layers=48, d_model_size=1280, num_heads=16, dff=8192, input_vocab_size=50000, rate=0.1, **kwargs):
keskarnitish commented 5 years ago

Yeah, I think the num_layers is the only thing that needs to change. We also released a 36-layer version of the model that's in the same GCS bucket.
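To get a feel for how much shrinking num_layers buys you, here is a rough, self-contained parameter-count estimate (an assumption on my part, not code from this repo): each transformer layer contributes about 4*d^2 parameters for the attention projections plus 2*d*dff for the feed-forward block, and the embedding table adds vocab*d, using the defaults quoted from Encoder.__init__ above (biases, layer norms, and output head are ignored, so the numbers are approximate).

```python
def approx_params(num_layers=48, d_model=1280, dff=8192, vocab=50000):
    """Rough transformer parameter count: attention + FFN per layer, plus embeddings.

    Sketch only -- ignores biases, layer norms, and tied/untied output heads.
    Defaults mirror the Encoder signature quoted in this issue.
    """
    per_layer = 4 * d_model * d_model + 2 * d_model * dff  # Q/K/V/O projections + FFN
    embedding = vocab * d_model
    return num_layers * per_layer + embedding

if __name__ == "__main__":
    for layers in (48, 36, 24):
        print(f"{layers:2d} layers: ~{approx_params(num_layers=layers) / 1e9:.2f}B params")
```

By this estimate, dropping from 48 to 36 layers removes roughly a quarter of the non-embedding parameters, which is consistent with the 36-layer checkpoint being released as the "smaller" variant.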

Closing for now, reopen as necessary.