jeniyat / StackOverflowNER

Source Code and Data for Software Domain NER
MIT License

The shared Google Drive doesn't contain 'data_ctc.zip' or 'utils_fine_tune.tar.gz' #15

Open BruceStayHungry opened 2 years ago

BruceStayHungry commented 2 years ago

I'm trying to download the resources needed to run the model.

Although I have checked the existing issues and found the Google Drive link https://drive.google.com/drive/folders/1iEEMr2DYofulK2F5pSErOPf5ggrEqtJt?usp=sharing, the folder does not seem to contain data_ctc.zip or utils_fine_tune.tar.gz. I would be deeply grateful for any help.

LiuWenJia-ops commented 2 years ago

Same problem! Thanks for any update

ghost commented 2 years ago

Hi everyone, hi @jeniyat! From my side as well: thank you for this great work and for publishing your code and resources. 👍 I would also be very grateful if we could get both files. Or am I missing something, and they are somewhere on the drive? 🙇

Ahren09 commented 2 years ago

It seems you just need to replace the following line in softner_ner_predict_from_file.py:

labels = get_labels(args.labels)

with

id2label = {
    "0" : "O",
    "1" : "Data_Structure",
    "2" : "Code_Block",
    "3" : "Application",
    "4" : "Function",
    "5" : "Data_Type",
    "6" : "Language",
    "7" : "Library",
    "8" : "Variable",
    "9" : "Value",
    "10": "Device",
    "11": "User_Name",
    "12": "User_Interface_Element",
    "13": "Output_Block",
    "14": "Error_Name",
    "15": "Class",
    "16": "Website",
    "17": "Version",
    "18": "File_Name",
    "19": "File_Type",
    "20": "Operating_System",
    "21": "Algorithm",
    "22": "Organization",
    "23": "HTML_XML_Tag",
    "24": "Keyboard_IP",
    "25": "Licence"
}
labels = list(id2label.values())
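
The replacement above depends on the dict keeping its insertion order, so that position i in labels matches id i. A minimal sketch that makes the ordering explicit by sorting on the integer value of each key is given below. The helper name labels_from_id2label is hypothetical and not part of the repository, and the mapping in the usage lines is abbreviated for illustration.

```python
def labels_from_id2label(id2label):
    """Return label names ordered by their integer id.

    Hypothetical helper, not part of StackOverflowNER. Sorting on int(key)
    places "10" after "9", which a plain string sort would not do.
    """
    ordered = sorted(id2label.items(), key=lambda kv: int(kv[0]))
    ids = [int(key) for key, _ in ordered]
    # Fail early if an id is missing or duplicated after conversion,
    # because the model output index is used as a position in the list.
    if ids != list(range(len(ids))):
        raise ValueError("label ids must be contiguous and start at 0")
    return [name for _, name in ordered]


# Abbreviated mapping, deliberately out of order, for illustration only.
example = {"2": "Code_Block", "0": "O", "1": "Data_Structure"}
labels = labels_from_id2label(example)
print(labels)
```

With the full 26-entry mapping from the comment above, this should give the same list as list(id2label.values()), with the added check that no id is missing.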

philippeitis commented 1 year ago

Hi @BruceStayHungry @LiuWenJia-ops @cimichanga - I have uploaded a copy of the data to Hugging Face, available here: https://huggingface.co/itisphilippe/StackOverflowNER