ibm-granite-community / pm

Granite Community Project Management

New code benchmark #108

Open adampingel opened 2 weeks ago

adampingel commented 2 weeks ago

@andyjda

andyjda commented 2 weeks ago

JetBrains-Research has published the benchmark suite Long Code Arena:

The benchmarks are code-related tasks that measure how well models process large context windows. They differ from other popular benchmarks both in how large a context they allow and in how realistic they aim to be: the datasets are based on real-world repos, and the tasks replicate real-world scenarios rather than synthetic, "evaluation-focused" use cases.

It is particularly relevant to our case because: