helloeve / mre-in-one-pass

Implementation for Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers
Apache License 2.0
80 stars 20 forks

Data #5

Closed zhijing-jin closed 4 years ago

zhijing-jin commented 4 years ago

Hi Haoyu,

I noticed that you preprocessed the data files into .tsv format. Do you have the preprocessing scripts for SemEval_raw_data -> your .tsv and ACE05_data -> your .tsv? We have both datasets, and if needed I can show you our LDC license so that you can feel free to send me your data directly. Thank you for making it easier to reproduce your paper and to report the results in our work that follows yours.

Best, Zhijing

helloeve commented 4 years ago

@zhijing-jin Unfortunately, the preprocessing script was not uploaded to the repo, and I no longer have access to it since I have left the company. But writing your own preprocessing script should still be doable.

Each row of the output .tsv file has the following fields: text, relation, entity1_start_index, entity1_end_index, dummy_value, entity2_start_index, entity2_end_index, dummy_value. We originally put the text of entity1 and entity2 in the dummy_value fields, but you can put any placeholder there, as it is no longer used in the code. To construct a multi-relation file, append the group (relation, entity1_start_index, entity1_end_index, dummy_value, entity2_start_index, entity2_end_index, dummy_value) to the row once for each additional relation.
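As a rough illustration of the row layout described above, here is a minimal Python sketch that flattens one sentence and its relations into a single tab-separated row and reads it back. This is not the original preprocessing script, and the parser here is not the one in run_classifier.py. The function names, the `_` placeholder, the example relation labels, and the choice of integer indices are all assumptions; the thread does not say whether indices are token or character offsets, or whether the end index is inclusive.

```python
import csv
import io

# One relation occupies 7 fields after the text:
# relation, e1_start, e1_end, dummy, e2_start, e2_end, dummy
FIELDS_PER_RELATION = 7
DUMMY = "_"  # assumed placeholder; the reply says this field is no longer read


def build_row(text, relations):
    """Flatten a sentence and its relations into one TSV row (a list of fields).

    `relations` is a list of (relation, e1_start, e1_end, e2_start, e2_end).
    """
    row = [text]
    for relation, e1_start, e1_end, e2_start, e2_end in relations:
        row += [relation, e1_start, e1_end, DUMMY, e2_start, e2_end, DUMMY]
    return row


def parse_row(fields):
    """Inverse of build_row: recover the text and the list of relations."""
    text, rest = fields[0], fields[1:]
    if len(rest) % FIELDS_PER_RELATION != 0:
        raise ValueError("row does not contain whole 7-field relation groups")
    relations = []
    for i in range(0, len(rest), FIELDS_PER_RELATION):
        relation, e1s, e1e, _, e2s, e2e, _ = rest[i:i + FIELDS_PER_RELATION]
        relations.append((relation, int(e1s), int(e1e), int(e2s), int(e2e)))
    return text, relations


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text, one sentence per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example: one sentence carrying two relations.
example = build_row(
    "Alice works for Acme in Boston",
    [("EMPLOYMENT", 0, 0, 3, 3), ("LOCATED_IN", 3, 3, 5, 5)],
)
tsv_text = rows_to_tsv([example])
```

A single-relation row has 8 fields, and every further relation adds 7 more, so a row's length is always 1 + 7k. Checking that invariant while parsing is a cheap way to catch misaligned rows before training.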

For more details, you can also refer to the code that parses the input data: https://github.com/helloeve/mre-in-one-pass/blob/master/run_classifier.py#L224-L273

zhijing-jin commented 4 years ago

Hi Haoyu,

I ran the same code on my own ACE05 data without much specially tailored preprocessing, but I get a performance gap of 5-10%. The problem may lie in the data preprocessing. For example, do you remember how you dealt with "no-relation" and with bi-directional relations such as friendship in ACE05?