Closed: edbeeching closed this 6 months ago
Hello, it appears that the scorer models on the Hub are 7B models rather than the 13B specified in the model card.

```python
from transformers import AutoModelForCausalLM

model_name = "hkust-nlp/deita-quality-scorer"
model = AutoModelForCausalLM.from_pretrained(model_name)

# Count trainable parameters and convert to billions
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
num_params_in_billions = num_params / 1_000_000_000
print(f"Number of parameters in the {model_name} model: {num_params_in_billions:.2f} B")  # 6.73 B
```

Hi, thanks for your interest. We do not have a 13B version, but we have trained some Mistral-based scorers at 7B size. Since we found Mistral-7B to be a better base model than Llama-2-13B on many benchmarks, we will release Mistral-based scorers rather than larger Llama scorers.

@VPeterV Thanks for the info. The confusion came from the description on your model card, which says it is a 13B model.