# TypeEvalPy

Tools currently covered by the benchmark:

Supported :white_check_mark: | In-progress :wrench: | Planned :bulb: |
---|---|---|
HeaderGen | Intellij PSI | MonkeyType |
Jedi | Pyre | Pyannotate |
Pyright | PySonar2 | |
HiTyper | Pytype | |
Scalpel | TypeT5 | |
Type4Py | | |
GPT-4 | | |
Ollama | | |
Below is a comparison showcasing exact matches across different tools, coupled with top_n
predictions for ML-based tools.
Rank | 🛠️ Tool | Top-n | Function Return Type | Function Parameter Type | Local Variable Type | Total |
---|---|---|---|---|---|---|
1 | HeaderGen | 1 | 186 | 56 | 322 | 564 |
2 | Jedi | 1 | 122 | 0 | 293 | 415 |
3 | Pyright | 1 | 100 | 8 | 297 | 405 |
4 | HiTyper | 1<br>3<br>5 | 163<br>173<br>175 | 27<br>37<br>37 | 179<br>225<br>229 | 369<br>435<br>441 |
5 | HiTyper (static) | 1 | 141 | 7 | 102 | 250 |
6 | Scalpel | 1 | 155 | 32 | 6 | 193 |
7 | Type4Py | 1<br>3<br>5 | 39<br>103<br>109 | 19<br>31<br>31 | 99<br>167<br>174 | 157<br>301<br>314 |
(Auto-generated based on the analysis run on 20 Oct 2023)
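To make the metric concrete, here is a minimal sketch of how exact-match and top-n counting work (illustrative only, not the actual TypeEvalPy evaluator; the fact-key layout and function names are assumptions). A prediction counts as an exact match only when the predicted type string equals the ground truth for the same fact; for ML-based tools that return a ranked list of candidates, it counts as a top-n match if the ground-truth type appears among the first n candidates:

```python
def exact_matches(ground_truth, predictions):
    """Count facts whose single predicted type equals the ground truth.

    Both dicts map a fact key, e.g. (file, line, name), to a type string.
    """
    return sum(
        1 for key, expected in ground_truth.items()
        if predictions.get(key) == expected
    )


def top_n_matches(ground_truth, ranked_predictions, n):
    """Count facts whose ground-truth type appears in the top-n candidates."""
    return sum(
        1 for key, expected in ground_truth.items()
        if expected in ranked_predictions.get(key, [])[:n]
    )


gt = {("main.py", 3, "x"): "int", ("main.py", 7, "f"): "str"}
single = {("main.py", 3, "x"): "int", ("main.py", 7, "f"): "bool"}
ranked = {("main.py", 3, "x"): ["float", "int"], ("main.py", 7, "f"): ["str"]}

print(exact_matches(gt, single))     # 1: only "x" matches exactly
print(top_n_matches(gt, ranked, 3))  # 2: both ground truths appear in top-3
```

This is why the HiTyper and Type4Py rows above grow from top-1 to top-5: a looser cutoff can only add matches, never remove them.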
Below is a comparison showcasing exact matches for LLMs.
Rank | 🛠️ Tool | Function Return Type | Function Parameter Type | Local Variable Type | Total |
---|---|---|---|---|---|
1 | GPT-4 | 225 | 85 | 465 | 775 |
2 | Finetuned:GPT 3.5 | 209 | 85 | 436 | 730 |
3 | codellama:13b-instruct | 199 | 75 | 425 | 699 |
4 | GPT 3.5 Turbo | 188 | 73 | 429 | 690 |
5 | codellama:34b-instruct | 190 | 52 | 425 | 667 |
6 | phind-codellama:34b-v2 | 182 | 60 | 399 | 641 |
7 | codellama:7b-instruct | 171 | 72 | 384 | 627 |
8 | dolphin-mistral | 184 | 76 | 356 | 616 |
9 | codebooga | 186 | 56 | 354 | 596 |
10 | llama2:70b | 168 | 55 | 342 | 565 |
11 | HeaderGen | 186 | 56 | 321 | 563 |
12 | wizardcoder:13b-python | 170 | 74 | 317 | 561 |
13 | llama2:13b | 153 | 40 | 283 | 476 |
14 | mistral:instruct | 155 | 45 | 250 | 450 |
15 | mistral:v0.2 | 155 | 45 | 248 | 448 |
16 | vicuna:13b | 153 | 35 | 260 | 448 |
17 | vicuna:33b | 133 | 29 | 267 | 429 |
18 | Jedi | 122 | 0 | 293 | 415 |
19 | Pyright | 100 | 8 | 297 | 405 |
19 | wizardcoder:7b-python | 103 | 48 | 254 | 405 |
20 | llama2:7b | 140 | 34 | 216 | 390 |
21 | HiTyper | 163 | 27 | 179 | 369 |
22 | wizardcoder:34b-python | 140 | 43 | 178 | 361 |
23 | orca2:7b | 117 | 27 | 184 | 328 |
24 | vicuna:7b | 131 | 17 | 172 | 320 |
25 | orca2:13b | 113 | 19 | 166 | 298 |
26 | Scalpel | 155 | 32 | 6 | 193 |
27 | Type4Py | 39 | 19 | 99 | 157 |
28 | tinyllama | 3 | 0 | 23 | 26 |
29 | phind-codellama:34b-python | 5 | 0 | 15 | 20 |
30 | codellama:13b-python | 0 | 0 | 0 | 0 |
31 | codellama:34b-python | 0 | 0 | 0 | 0 |
32 | codellama:7b-python | 0 | 0 | 0 | 0 |
(Auto-generated based on the analysis run on 14 Jan 2024)
```bash
git clone https://github.com/secure-software-engineering/TypeEvalPy.git
docker build -t typeevalpy .
```
🕒 Takes about 30mins on first run to build Docker containers.
📂 Results will be generated in the `results` folder within the root directory of the repository.
Each results folder will have a timestamp, allowing you to easily track and compare different runs.
```bash
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ./results:/app/results \
    typeevalpy
```
🔧 Optionally, run analysis on specific tools:
```bash
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ./results:/app/results \
    typeevalpy --runners headergen scalpel
```
🛠️ Available options: `headergen`, `pyright`, `scalpel`, `jedi`, `hityper`, `type4py`, `hityperdl`
TypeEvalPy integrates with LLMs through Ollama, streamlining their management. Begin by setting up your environment:

1. Copy `config_template.yaml` from the `src` directory and rename it to `config.yaml`.
2. In `config.yaml`, configure the following:
   - `openai_key`: your key for accessing OpenAI's models.
   - `ollama_url`: the URL of your Ollama instance. For simplicity, we recommend deploying Ollama using their Docker container. Get started with Ollama here.
   - `prompt_id`: set this to `questions_based_2` for optimal performance, based on our tests.
   - `ollama_models`: select a list of model tags from the Ollama library. For better operation, ensure each model is pre-downloaded with the `ollama pull` command.

With `config.yaml` configured, run the following command:
```bash
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ./results:/app/results \
    typeevalpy --runners ollama
```
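As an illustration, a filled-in `config.yaml` might look like the following. All values are placeholders except the key names, which are the ones listed above; `http://localhost:11434` is Ollama's default listen address, and the model tags are examples from the Ollama library:

```yaml
openai_key: "sk-..."                  # placeholder; your OpenAI API key
ollama_url: "http://localhost:11434"  # wherever your Ollama instance listens
prompt_id: "questions_based_2"        # recommended prompt, per the docs above
ollama_models:
  - codellama:13b-instruct
  - mistral:instruct
```

Remember to pre-download each listed model first, e.g. `ollama pull codellama:13b-instruct`, so the runner does not stall on a missing model.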
To generate an extended version of the original TypeEvalPy benchmark that includes many more Python types, run the following commands:

1. Navigate to the `autogen` directory:

```bash
cd autogen
```

2. Execute the generation script:

```bash
python generate_typeevalpy_dataset.py
```

This will generate a folder in the repository root containing the autogenerated benchmark, named with the current date.
Thank you for your interest in contributing! To add support for a new tool, please use the Docker templates provided in our repository. After implementing and testing your tool, submit a pull request (PR) with a descriptive message. Our maintainers will review your submission and merge it.
To get started with integrating your tool, please follow the guide here: docs/Tool_Integration_Guide.md
Give a ⭐️ if this project helped you!