merplumander / ai-forecasting


Language Models and Knowledge Cut-offs #1

Open merplumander opened 3 weeks ago

merplumander commented 3 weeks ago

Language Model Overview

OpenAI

| | gpt-4o | gpt-4o-mini | gpt-4-turbo | o1-preview | o1-mini |
| --- | --- | --- | --- | --- | --- |
| Description | Our high-intelligence flagship model for complex, multi-step tasks | Our affordable and intelligent small model for fast, lightweight tasks | The previous set of high-intelligence models | Reasoning model designed to solve hard problems across domains | Faster and cheaper reasoning model particularly good at coding, math, and science |
| Training data cut-off | Up to Oct 2023 | Up to Oct 2023 | Up to Dec 2023 | Up to Oct 2023 | Up to Oct 2023 |

Logprobs: yes
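The issue does not say how the logprobs would be used. One plausible use in a forecasting setup, stated here purely as an assumption, is to read the log-probabilities of a "Yes" and a "No" answer token and normalise them into a single probability. A minimal sketch of that arithmetic, with a hypothetical helper name and no API call involved:

```python
import math


def yes_probability(logprob_yes: float, logprob_no: float) -> float:
    """Normalise two token log-probabilities into P(yes).

    Hypothetical helper: assumes the model was asked a binary question and
    that the log-probabilities of the "Yes" and "No" tokens were returned.
    """
    p_yes = math.exp(logprob_yes)  # convert log-probability to probability
    p_no = math.exp(logprob_no)
    # Renormalise so the two answers sum to 1, ignoring all other tokens.
    return p_yes / (p_yes + p_no)


# Example: raw token probabilities of 0.6 and 0.2 normalise to roughly 0.75.
print(yes_probability(math.log(0.6), math.log(0.2)))
```

Providers listed without logprobs would need a different route, for example asking the model to state a probability in its answer.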

Anthropic

| | Claude 3.5 Sonnet | Claude 3.5 Haiku |
| --- | --- | --- |
| Description | Most intelligent model | Fastest model |
| API model name | claude-3-5-sonnet-20241022 | claude-3-5-haiku-20241022 |
| Training data cut-off | Apr 2024 | July 2024 |

Logprobs: no

Gemini

Problem: Knowledge cut-off information not available

| | Gemini 1.5 Flash | Gemini 1.5 Flash-8B | Gemini 1.5 Pro |
| --- | --- | --- | --- |
| Description | Fast and versatile performance across a diverse variety of tasks | High volume and lower intelligence tasks | Complex reasoning tasks requiring more intelligence |
| API model name | gemini-1.5-flash | gemini-1.5-flash-8b | gemini-1.5-pro |
| Versions and release dates | gemini-1.5-flash-001 (2024-05-24), gemini-1.5-flash-002 (2024-09-24) | gemini-1.5-flash-8b-001 (2024-10-24) | gemini-1.5-pro-001 (2024-05-24), gemini-1.5-pro-002 (2024-09-24) |

Logprobs: no

Llama

| | Llama 3.2 1B | Llama 3.2 3B | Llama 3.2 11B | Llama 3.2 90B | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| API name | llama3.2-1b | llama3.2-3b | llama3.2-11b-vision | llama3.2-90b-vision | llama3.1-8b | llama3.1-70b | llama3.1-405b |
| Training data cut-off | Dec 2023 | Dec 2023 | Dec 2023 | Dec 2023 | Dec 2023 | Dec 2023 | Dec 2023 |

Grok

grok-beta

No further information available without X-premium.

Mistral

No logprobs available from the API.

| | Mistral Large 2 | Mistral Small | Ministral 8B | Ministral 3B |
| --- | --- | --- | --- | --- |
| Description | Top-tier reasoning for high-complexity tasks, for your most sophisticated needs. | Cost-efficient, fast, and reliable option for use cases such as translation, summarization, and sentiment analysis. | Powerful model for on-device use cases. | Even smaller |
| Models and release dates | mistral-large-2407 (2024-07-24) | mistral-small-2409 (2024-09-27), mistral-small-2402 (2024-02-26) | ministral-8b-2410 (2024-10-09) | ministral-3b-2410 (2024-10-09) |

Qwen

qwen-max, qwen-plus, qwen-turbo

The open-source, downloadable models are very well documented, but there is almost no documentation for the commercial models available via the API. It would be great to run the open-source versions, but they are too large to run without additional resources.

lreining commented 2 weeks ago

Based on the information above, the latest stated training data cut-off is July 2024 (Claude 3.5 Haiku), so we can run an evaluation with data starting from August 1st:
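As a minimal sketch of how that start date follows from the tables, the snippet below takes the stated cut-offs and returns the first day of the month after the latest one. The dictionary and function names are hypothetical. Providers with no published cut-off in this issue (Gemini, Grok, Mistral, Qwen) are left out, so the result says nothing about them.

```python
from datetime import date

# Training data cut-offs as (year, month), copied from the tables in this issue.
# Providers without a stated cut-off are deliberately omitted.
CUTOFFS = {
    "gpt-4o": (2023, 10),
    "gpt-4o-mini": (2023, 10),
    "gpt-4-turbo": (2023, 12),
    "o1-preview": (2023, 10),
    "o1-mini": (2023, 10),
    "claude-3-5-sonnet-20241022": (2024, 4),
    "claude-3-5-haiku-20241022": (2024, 7),
    "llama3.2-1b": (2023, 12),
    "llama3.2-3b": (2023, 12),
    "llama3.2-11b-vision": (2023, 12),
    "llama3.2-90b-vision": (2023, 12),
    "llama3.1-8b": (2023, 12),
    "llama3.1-70b": (2023, 12),
    "llama3.1-405b": (2023, 12),
}


def first_safe_date(cutoffs: dict[str, tuple[int, int]]) -> date:
    """Return the first day of the month following the latest cut-off month."""
    year, month = max(cutoffs.values())  # tuples compare by year, then month
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)


print(first_safe_date(CUTOFFS).isoformat())  # prints 2024-08-01
```

Because a cut-off is given only to the month, the sketch treats the whole cut-off month as potentially seen by the model.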

lreining commented 1 week ago

After a first test, we decided to exclude gpt-4-turbo (too expensive), claude-3-5-haiku (worse than claude-3-5-sonnet), and llama3.2-11b (as we already have the 90b version).
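The exclusions could be recorded next to the candidate list so the reason for each stays visible. This is only a sketch: the candidate list is an assumption limited to the models whose API names and cut-offs appear in the tables above, and mapping "claude-3-5-haiku" and "llama3.2-11b" to the full API names `claude-3-5-haiku-20241022` and `llama3.2-11b-vision` is likewise assumed.

```python
# Hypothetical candidate list: API names taken from the OpenAI, Anthropic,
# and Llama tables in this issue. Whether other providers are evaluated
# is not stated, so they are not listed here.
CANDIDATES = [
    "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "o1-preview", "o1-mini",
    "claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022",
    "llama3.2-1b", "llama3.2-3b", "llama3.2-11b-vision",
    "llama3.2-90b-vision", "llama3.1-8b", "llama3.1-70b", "llama3.1-405b",
]

# Exclusions and their reasons, as given in the comment above.
EXCLUDED = {
    "gpt-4-turbo": "too expensive",
    "claude-3-5-haiku-20241022": "worse than claude-3-5-sonnet",
    "llama3.2-11b-vision": "the 90b version is already included",
}

# Keep the original order and drop anything with a recorded exclusion.
selected = [name for name in CANDIDATES if name not in EXCLUDED]

for name in selected:
    print(name)
```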