adampingel opened 2 months ago
JetBrains-Research has published the benchmark suite Long Code Arena:
The benchmarks are code-related tasks focused on measuring how well models can process large context windows. They differ from other popular benchmarks both in how large a context they allow and in how realistic they aim to be: the datasets are built from real-world repositories, and the tasks replicate real-world scenarios rather than synthetic, evaluation-focused use cases.
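For anyone who wants to poke at the data: the datasets appear to be distributed through the Hugging Face Hub under the JetBrains-Research organization. Below is a minimal sketch of pulling one of them with the `datasets` library; the dataset id and split name are assumptions on my part and should be checked against the actual Long Code Arena listing.

```python
# Minimal sketch: loading one Long Code Arena dataset from the Hugging Face Hub.
# NOTE: the dataset id and split below are assumptions -- verify the real ids
# on the JetBrains-Research organization page before relying on this.
from datasets import load_dataset

# Hypothetical id for the library-based code generation task.
DATASET_ID = "JetBrains-Research/lca-library-based-code-generation"

ds = load_dataset(DATASET_ID, split="test")

# Inspect one example to see what fields are available and how large the
# repository-level contexts actually are.
example = ds[0]
print(example.keys())
```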
It is particularly relevant to our case because:
@andyjda