Data - Githubissues

zhijing-jin commented 4 years ago

Hi Haoyu,

I noticed that you preprocessed the data files into TSV format. Do you have the preprocessing scripts for SemEval_raw_data -> your .tsv and ACE05_data -> your .tsv? We have both datasets. If needed, I can show you our LDC license so that you can feel free to just send me your data. Thank you for making it easier to reproduce your paper and to include its results in our follow-up work.

Best, Zhijing

helloeve commented 4 years ago

@zhijing-jin Unfortunately, that preprocessing script was not uploaded to the repo, and I no longer have access to it since I have already left the company. But writing your own preprocessing script should still be doable.

The output is a TSV file with the following columns: text, relation, entity1_start_index, entity1_end_index, dummy_value, entity2_start_index, entity2_end_index, dummy_value. We originally put the text of entity1 and entity2 in the dummy_value columns, but you can put any dummy value there since it is no longer used in the code. To construct a multi-relation file, append the group (relation, entity1_start_index, entity1_end_index, dummy_value, entity2_start_index, entity2_end_index, dummy_value) repeatedly, once per additional relation.

For more details, you could also refer to the code to parse the input data - https://github.com/helloeve/mre-in-one-pass/blob/master/run_classifier.py#L224-L273
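As a rough illustration of the column layout described above, here is a minimal sketch in Python. The function names `make_row` and `parse_row`, the `"-"` placeholder, and the choice to keep all relation groups on the same line as the text are assumptions for illustration only; whether the indices are token or character offsets, and whether the end index is inclusive, should be checked against the linked parsing code.

```python
DUMMY = "-"        # placeholder for the unused dummy_value columns (assumed)
GROUP_SIZE = 7     # relation, e1_start, e1_end, dummy, e2_start, e2_end, dummy


def make_row(text, relations, dummy=DUMMY):
    """Build one TSV line: the text followed by one 7-column group per relation.

    relations is a list of (relation, e1_start, e1_end, e2_start, e2_end).
    """
    if "\t" in text or "\n" in text:
        raise ValueError("text must not contain tabs or newlines")
    fields = [text]
    for rel, e1_start, e1_end, e2_start, e2_end in relations:
        fields += [rel, e1_start, e1_end, dummy, e2_start, e2_end, dummy]
    return "\t".join(str(f) for f in fields)


def parse_row(line):
    """Inverse of make_row: return (text, list of relation tuples)."""
    fields = line.rstrip("\n").split("\t")
    text, rest = fields[0], fields[1:]
    if not rest or len(rest) % GROUP_SIZE != 0:
        raise ValueError("expected one or more groups of 7 columns after the text")
    relations = []
    for i in range(0, len(rest), GROUP_SIZE):
        rel, e1_start, e1_end, _, e2_start, e2_end, _ = rest[i:i + GROUP_SIZE]
        relations.append((rel, int(e1_start), int(e1_end), int(e2_start), int(e2_end)))
    return text, relations
```

A single-relation row then has 8 columns, and each extra relation adds 7 more. Round-tripping a row through `parse_row(make_row(...))` is a cheap sanity check before feeding the file to the classifier.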

zhijing-jin commented 4 years ago

Hi Haoyu,

I ran the same code on my own ACE05 data without much dataset-specific preprocessing, but got a performance gap of 5-10%. The problem may be in the data preprocessing. For example, do you remember how you dealt with "no-relation" and with "bi-directional relations" such as friendship in ACE05?

For the "no-relation" question: In the ACE05 training set, do you only use the no-relation instances provided by the original ACE05 dataset, or do you combine every entity pair to generate additional no-relation instances?
For bi-directional relations: Also in ACE05, for bidirectional relationships such as friendship (A---friendship--->B also implies B---friendship--->A), do you auto-complete these and add the other direction to the training data as well?
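
To make the two options in these questions concrete, here is a small Python sketch of what each would look like. It is not confirmed to be what the authors did; the helper names `complete_symmetric` and `add_no_relation`, the `"no-relation"` label string, and the `SYMMETRIC` label set are all hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical set of relation labels treated as symmetric.
SYMMETRIC = {"friendship"}


def complete_symmetric(gold, symmetric=SYMMETRIC):
    """Add the reverse direction for every symmetric relation.

    gold maps an ordered (head, tail) entity pair to a relation label.
    Existing labels for the reverse pair are left untouched.
    """
    out = dict(gold)
    for (head, tail), rel in gold.items():
        if rel in symmetric:
            out.setdefault((tail, head), rel)
    return out


def add_no_relation(entities, gold, label="no-relation"):
    """Label every ordered entity pair without a gold relation as no-relation."""
    out = dict(gold)
    for head, tail in permutations(entities, 2):
        out.setdefault((head, tail), label)
    return out
```

The order matters if both steps are used: running `complete_symmetric` first keeps the reversed pair from being labeled no-relation.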

helloeve / mre-in-one-pass

Data #5