Type4Py: Deep Similarity Learning-Based Type Inference for Python
Apache License 2.0

This repository contains the implementation of Type4Py and instructions for reproducing the results of the paper.

Dataset

For Type4Py, we use the ManyTypes4Py dataset. You can download the latest version of the dataset here. Also, note that the dataset is already de-duplicated.

Code De-duplication

If you want to use your own dataset, it is essential to de-duplicate it first with a tool such as CD4Py.
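
As a rough illustration of what de-duplication involves, the sketch below groups Python files whose contents are byte-identical. It is not CD4Py and does not replace it: it only catches exact copies, whereas a dedicated tool may also detect near-duplicates. The function name `find_exact_duplicates` is hypothetical and chosen only for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root):
    """Group .py files under `root` whose contents are byte-identical.

    Simplified stand-in for a real de-duplication tool: files are
    compared by the SHA-256 hash of their raw bytes, so even a
    whitespace change makes two files count as different.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Keep only the hashes that are shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group lists files with identical contents; keeping one file per group removes the exact duplicates from the corpus.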

Installation Guide

Requirements

Here are the recommended system requirements for training Type4Py on the ManyTypes4Py (MT4Py) dataset:

Quick Install

git clone https://github.com/saltudelft/type4py.git && cd type4py
pip install .

Usage Guide

Follow the steps below to train and evaluate the Type4Py model.

1. Extraction

NOTE: Skip this step if you're using the ManyTypes4Py dataset.

$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES

Description:

2. Preprocessing

$ type4py preprocess --o $OUTPUT_DIR --l $LIMIT

Description:

3. Vectorizing

$ type4py vectorize --o $OUTPUT_DIR

Description:

4. Learning

$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE

Description:

5. Testing

$ type4py predict --o $OUTPUT_DIR --c

Description:

6. Evaluating

$ type4py eval --o $OUTPUT_DIR --t c --tp 10

Description:

Use type4py eval -h to see other options.
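
Steps 1 to 6 can also be chained from a small Python script. The sketch below is one possible way to do so, not part of Type4Py itself: it only assembles the commands exactly as they appear above and runs them in order, stopping at the first failure. The helper names `pipeline_commands` and `run_pipeline` are hypothetical, and the evaluation flags (`--t c --tp 10`) are copied verbatim from step 6.

```python
import shlex
import subprocess


def pipeline_commands(output_dir, param_file, limit, extract=None):
    """Return the type4py commands for the steps above as argument lists.

    `extract` is an optional (data_path, dup_files, cores) tuple. Leave it
    as None to skip step 1, e.g. when using the ManyTypes4Py dataset.
    """
    out = str(output_dir)
    commands = []
    if extract is not None:
        data_path, dup_files, cores = extract
        commands.append(["type4py", "extract", "--c", str(data_path),
                         "--o", out, "--d", str(dup_files), "--w", str(cores)])
    commands += [
        ["type4py", "preprocess", "--o", out, "--l", str(limit)],
        ["type4py", "vectorize", "--o", out],
        ["type4py", "learn", "--o", out, "--c", "--p", str(param_file)],
        ["type4py", "predict", "--o", out, "--c"],
        ["type4py", "eval", "--o", out, "--t", "c", "--tp", "10"],
    ]
    return commands


def run_pipeline(commands):
    """Run each command in order; raise CalledProcessError on failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(pipeline_commands(output_dir, param_file, limit))` with your own values for `$OUTPUT_DIR`, `$PARAM_FILE`, and `$LIMIT` runs steps 2 to 6 and requires the `type4py` command to be installed and on the `PATH`.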

Reduce

To reduce the dimension of the created type clusters in step 5, run the following command:

Note: The reduced version of type clusters causes a slight performance loss in type prediction.

$ type4py reduce --o $OUTPUT_DIR --d $DIMENSION

Description:

Converting Type4Py to ONNX

To convert the pre-trained Type4Py model to the ONNX format, use the following command:

$ type4py to_onnx --o $OUTPUT_DIR

Description:

VSCode Extension

Type4Py can be used in VSCode, where it provides ML-based type auto-completion for Python files. The Type4Py VSCode extension can be installed from the VS Marketplace here.

Using Local Pre-trained Model

Type4Py's pre-trained model can be queried locally using the provided Docker images. See here for usage info.

Type4Py Server

The Type4Py server is deployed on our server, where it exposes a public API and powers the VSCode extension. If you would like to deploy the Type4Py server on your own machine, you can adapt the server code here. If you have questions about deployment, using the pre-trained Type4Py model, or training your own model, feel free to reach out by creating an issue.

Citing Type4Py

@inproceedings{mir2022type4py,
  title={Type4Py: practical deep similarity learning-based type inference for python},
  author={Mir, Amir M and Lato{\v{s}}kinas, Evaldas and Proksch, Sebastian and Gousios, Georgios},
  booktitle={Proceedings of the 44th International Conference on Software Engineering},
  pages={2241--2252},
  year={2022}
}