svilupp/Julia-LLM-Leaderboard
Provides a platform for the Julia community to compare AI models' abilities in generating syntactically correct Julia code, featuring structured tests and automated evaluations for easy and collaborative benchmarking.
http://svilupp.github.io/Julia-LLM-Leaderboard/dev
MIT License
65 stars
5 forks
issues
Add ChatGPT 4o
#31
svilupp
closed
2 months ago
0
Fix string inclusion
#30
svilupp
closed
2 months ago
0
pignify and weather_data_analyzer tests were ambiguous. Fixed.
#29
Cvikli
closed
1 month ago
4
Add new gpt4o release
#28
svilupp
closed
3 months ago
0
New Mistral Large 2
#27
svilupp
closed
3 months ago
0
Add GPT-4o-mini benchmark
#26
svilupp
closed
3 months ago
0
Add Claude Sonnet3.5
#25
svilupp
closed
4 months ago
0
Add Mistral Codestral
#24
svilupp
closed
5 months ago
0
Add samples for GPT-4o
#23
svilupp
closed
6 months ago
0
Add Deepseek
#22
svilupp
closed
6 months ago
0
Add llama3 evals + mixtral
#21
svilupp
closed
6 months ago
0
Add Mixtral 8x22b,dbrx,qwen-72b
#20
svilupp
closed
7 months ago
0
Add results for the latest GPT-4 Turbo
#19
svilupp
closed
7 months ago
0
Add Claude-3 + mistral-large
#18
svilupp
closed
7 months ago
0
Fix `create_definition.jl` example
#17
svilupp
closed
7 months ago
0
add new test cases to code_generation_waitlist
#16
ceferisbarov
closed
6 months ago
10
ERROR: LoadError: UndefVarError: `run_code_blocks` not defined
#15
ceferisbarov
closed
7 months ago
1
Plot fix
#14
svilupp
closed
8 months ago
0
Finetune model - Cheater 7b
#13
svilupp
closed
8 months ago
0
replace magic values with `num_samples` variable (solves an undef variable error, too)
#12
ceferisbarov
closed
8 months ago
0
remove an obsolete dependency from Project.toml file, solves #10
#11
ceferisbarov
closed
8 months ago
1
Unregistered package in Project.toml
#10
ceferisbarov
closed
8 months ago
0
Add Google Gemma 7b
#9
svilupp
closed
8 months ago
0
Add Gemini Pro 1.0
#8
svilupp
closed
8 months ago
0
Remove QWEN samples from benchmark
#7
svilupp
closed
8 months ago
0
[FR] Add benchmark for other applications
#6
svilupp
opened
9 months ago
0
[FR] Add more test cases
#5
svilupp
opened
9 months ago
1
Update Qwen-1.5 models
#4
svilupp
closed
9 months ago
0
Revert "Codellama + Quantization benchmarks"
#3
svilupp
closed
9 months ago
0
Codellama + Quantization benchmarks
#2
svilupp
closed
9 months ago
0
Details about the Yi Chat model?
#1
findmyway
closed
9 months ago
3