saltudelft / dl-type-python

Deep Learning-based type inference for Python
GNU General Public License v3.0

AttributeError #4

Open Jarvx opened 3 years ago

Jarvx commented 3 years ago

It looks like the code does not load the directory of Python projects. Could you please take a look? The command I used is python TW_extractor.py --o $OUTPUT_FOLDER --d $REPOS --w $THREADS

I replaced the parameters with the actual values. Oddly, the output reports "Found 0 processed projects".

Number of selected Python projects: 0
data_out/funcs
Found 0 processed projects
Traceback (most recent call last):
  File "TW_extractor.py", line 131, in <module>
    df = parse_df(DATA_FILES, batch_size=128)
  File "typewriter/dl-type-python/dltpy/input_preparation/generate_df.py", line 77, in parse_df
    df_merged = df_merged.reset_index(drop=True)
AttributeError: 'NoneType' object has no attribute 'reset_index'

mir-am commented 3 years ago

Thanks for submitting an issue. Indeed, there is a problem with the code, which has not been maintained for a while. As the log above shows, no Python projects were selected, so nothing was processed and the later reset_index call failed on None. For now, the workaround is to use Python projects that appear in this JSON list here. I know this is an odd restriction when you want to use your own dataset. To get around it, you can patch this function here so that it loads repositories without using the JSON list.
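As a rough illustration of what such a patch could look like, here is a minimal sketch that selects projects straight from a local folder instead of the JSON list. The helper name list_local_projects and the assumption that each project is a sub-directory of the folder passed via --d are hypothetical; this is not the function currently in the repository.

```python
from pathlib import Path


def list_local_projects(repos_dir):
    """Return sub-directories of repos_dir that contain at least one .py file.

    Hypothetical helper for illustration; not part of dl-type-python.
    """
    root = Path(repos_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Repository folder not found: {root}")

    # A sub-directory counts as a project if any .py file exists below it.
    projects = sorted(
        entry for entry in root.iterdir()
        if entry.is_dir() and any(entry.rglob("*.py"))
    )
    if not projects:
        # Fail early with a clear message instead of the later
        # "'NoneType' object has no attribute 'reset_index'" error.
        raise ValueError(f"No Python projects found under {root}")
    return projects
```

Raising an error as soon as the selection is empty makes the "Number of selected Python projects: 0" case visible at its source rather than several steps later in parse_df.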

By the way, if you are interested in DL-based type prediction, check out our Type4Py model and its VSCode extension. Type4Py performs better than TypeWriter!

Jarvx commented 3 years ago

Thanks a lot. But I am still a bit confused about how to patch the function. I am looking for a tool that infers types for given Python projects. Do you think Type4Py is suitable? The documentation covers data preprocessing and model training. It says

Skip this step if you're using the ManyTypes4Py dataset.

However, the second step, type4py preprocess --o $OUTPUT_DIR --l $LIMIT, does not say how I can preprocess my own data (a few Python projects).

I believe the VS Code add-on would be a good option, but since I hope to collect type inference results for quite a few Python scripts, it is hard to use VS Code for this purpose.

Can you please let me know how I can use my own data with the pre-trained model to produce types?

mir-am commented 3 years ago

Can you please let me know how I can use my own data with the pre-trained model to produce types?

For a few Python projects, you can use Type4Py's API to get type information. Here is a minimal example of getting type information for one Python file:

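import requests

# Send the source of example.py to the Type4Py prediction endpoint.
with open('example.py') as f:
    r = requests.post("https://type4py.com/api/predict?tc=0", data=f.read())

# Print the predicted type information returned as JSON.
print(r.json())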

This gives you type information for the given Python file in JSON format. Replace example.py with your file(s). For an example of how to retrieve type information for parameters, return types, and variables, see this function here. Other fields of the JSON response are documented here.
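If you have many files, the same request can be wrapped in a loop. The sketch below is only an illustration under a few assumptions: the names send_with_requests and predict_for_directory are made up here, the {"error": ...} record for failed files is my own convention rather than part of the API response, and the source is encoded as UTF-8 before sending. The raw JSON is stored per file without interpreting its fields.

```python
from pathlib import Path

# Same endpoint as in the single-file example.
API_URL = "https://type4py.com/api/predict?tc=0"


def send_with_requests(source):
    """POST one file's source text and return the decoded JSON response."""
    import requests  # third-party; imported lazily so the loop below has no hard dependency

    response = requests.post(API_URL, data=source.encode("utf-8"))
    response.raise_for_status()
    return response.json()


def predict_for_directory(root, send=send_with_requests):
    """Map each .py file under root to the response returned by send()."""
    results = {}
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            results[str(path)] = send(source)
        except Exception as exc:
            # Hypothetical error record so one failing file does not stop the batch.
            results[str(path)] = {"error": str(exc)}
    return results
```

For example, results = predict_for_directory("my_project") collects one response per file, and json.dump(results, open("types.json", "w")) would save them for later inspection. Passing a different send callable makes it possible to try the loop without contacting the server.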

Let me know if there are questions or issues.

Jarvx commented 3 years ago

Hi Mir,

I really appreciate your patience and help. The API worked perfectly. One more question: if I want to process projects in batches, do you recommend this project, saltudelft/dl-type-python, or the other one? I was hoping to learn more about the implementation of your algorithms, but I got stuck at the first step, data preprocessing.

You said I can patch the function that uses the JSON list so that the code can load my own dataset. But I noticed that the JSON file lists GitHub projects.

Once again, I really appreciate your help!

mir-am commented 3 years ago

To process your own projects and/or train a type prediction model, I highly suggest using Type4Py. It is currently under active development and research.

You can start with step 1 of Type4Py, i.e., processing your own dataset.

Let me know if there are issues or questions.