zjunlp / OpenUE

[EMNLP 2020] OpenUE: An Open Toolkit of Universal Extraction from Text
http://openue.zjukg.org
MIT License

Missing data #4

Closed ChloeJKim closed 4 years ago

ChloeJKim commented 4 years ago

Hi

Great work! I wanted to try out the model, but I ran into trouble with the first step:

  1. Data Preprocessing. Put the pretrained language model (e.g., BERT) in the pretrained_model folder and put all raw data (run script download_ske.sh in the benchmark folder) in the raw_data folder.

I couldn't find the benchmark folder or the script `download_ske.sh`.

I also could not find the scripts for step 4 (`sh export_seq.sh ske` and `sh serving_cls.sh ske`) in any folder.

can you help me?

zxlzr commented 4 years ago

Sorry for the missing files.

Those files are available now:

- https://github.com/zjunlp/openue/blob/master/pretrained_model/download_bert_cn.sh
- https://github.com/zjunlp/openue/blob/master/raw_data/download_ske_dataset.sh
- https://github.com/zjunlp/openue/blob/master/export_seq.sh
- https://github.com/zjunlp/openue/blob/master/serving.sh

We are also developing a PyTorch version, which will be released soon. Feel free to try it.

ChloeJKim commented 4 years ago

Thank you for the quick reply. Is the model only available for Chinese?

I tried running `sh preprocess.sh ske`, but it seemed to be looking for a Chinese BERT.

Is there a way to run this on English data? (I have also downloaded an English BERT.)

Thank you!

zxlzr commented 4 years ago

You can change lines 9 and 10 in the file config.py to point to the English BERT.
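For reference, a minimal sketch of what those two config.py lines might look like after the change. The variable names and checkpoint path here are assumptions for illustration, not the actual identifiers in the repo:

```python
# Hypothetical excerpt from config.py -- real variable names may differ.
# Point the toolkit at an English BERT checkpoint instead of the Chinese one.
bert_model_dir = "pretrained_model/uncased_L-12_H-768_A-12"  # assumed path
bert_vocab_file = "pretrained_model/uncased_L-12_H-768_A-12/vocab.txt"
```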

ChloeJKim commented 4 years ago

do you by any chance have a sample of prediction for English version?

zxlzr commented 4 years ago

Just use the same input format, one JSON object per line, e.g.: `{"text": "Obama was born in Honolulu", "spo_list": [{"predicate": "born_in", "subject": "Obama", "object": "Honolulu"}]}`
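Concretely, each line of the data file is one JSON object with a `text` field and a `spo_list` of triples. A small sketch of writing and reading back one such record (the filename is arbitrary):

```python
import json

# One training record: raw text plus its (subject, predicate, object) triples.
record = {
    "text": "Obama was born in Honolulu",
    "spo_list": [
        {"predicate": "born_in", "subject": "Obama", "object": "Honolulu"}
    ],
}

# Write one JSON object per line, then read the line back.
with open("train.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

with open("train.json", encoding="utf-8") as f:
    loaded = json.loads(f.readline())

print(loaded["spo_list"][0]["predicate"])  # born_in
```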

ChloeJKim commented 4 years ago

Sorry for all the questions: what dataset was the model trained on? Can I train it on a new dataset, and if so, could you describe how? That would be great :) Thank you

zxlzr commented 4 years ago

We trained the model on NYT, WebNLG, and Wikidata, which can be obtained from https://github.com/zjunlp/IEDatasetZoo. Feel free to train on a new dataset, but note that the hyperparameter tuning is tedious.

ChloeJKim commented 4 years ago

When I use download_ske_dataset.sh I get the Chinese raw data. Can you provide a link to English raw data?

zxlzr commented 4 years ago

You can obtain NYT and WebNLG from https://github.com/weizhepei/CasRel/tree/master/data and Wikidata from https://public.ukp.informatik.tu-darmstadt.de/UKP_Webpage/DATA/WikipediaWikidataDistantSupervisionAnnotations.v1.0.zip
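If it helps, datasets in the CasRel repo store triples as flat lists (roughly `{"text": ..., "triple_list": [[subject, relation, object], ...]}`). A hedged sketch of converting such a record into the one-line `spo_list` format mentioned earlier; the exact field names in the downloaded files may differ, so check them before relying on this:

```python
import json

def casrel_to_openue(record):
    """Convert an assumed CasRel-style record into OpenUE's spo_list format."""
    return {
        "text": record["text"],
        "spo_list": [
            {"predicate": rel, "subject": subj, "object": obj}
            for subj, rel, obj in record["triple_list"]
        ],
    }

# Toy record in the assumed CasRel shape.
sample = {
    "text": "Obama was born in Honolulu",
    "triple_list": [["Obama", "born_in", "Honolulu"]],
}
converted = casrel_to_openue(sample)
print(json.dumps(converted, ensure_ascii=False))
```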