TACRED Pre-processing - Githubissues

pvcastro commented 4 years ago

Hi there!

I adapted your code to support the JSON format provided by the TACRED dataset, and I also had to make some changes to add the entity indicators. Unfortunately, after doing this, I ended up with results far worse than the ones you reported. Even though I was able to get 85% F1/accuracy with the sklearn metrics, the results from the TACRED scorer were pretty bad, giving me only 0.9% F1.
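A gap like this can arise partly because the two metrics count different things. This is a possible explanation, not something confirmed in this thread. Plain accuracy rewards every correct `no_relation` prediction, while a TACRED-style micro F1 ignores `no_relation` when counting correct predictions. The sketch below is a simplified illustration, not the official scorer. The function names and the toy labels are made up, and returning 0.0 precision when nothing is predicted is a choice made for this sketch.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def tacred_style_micro_f1(gold, pred, negative_label="no_relation"):
    """Micro precision/recall/F1 that ignores the negative class.

    Simplified sketch: only predictions of a real relation can count
    as correct, and only gold real relations count toward recall.
    """
    correct = sum(1 for g, p in zip(gold, pred)
                  if p != negative_label and p == g)
    guessed = sum(1 for p in pred if p != negative_label)
    actual = sum(1 for g in gold if g != negative_label)

    precision = correct / guessed if guessed else 0.0  # sketch's own edge-case choice
    recall = correct / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented): 8 negatives, 2 real relations.
gold = ["no_relation"] * 8 + ["org:founded_by", "per:title"]
# A degenerate model that always predicts the negative class.
pred = ["no_relation"] * 10

acc = accuracy(gold, pred)
prec, rec, f1 = tacred_style_micro_f1(gold, pred)
print(f"accuracy={acc:.2f}  tacred-style F1={f1:.2f}")
```

In this toy case accuracy is high while the F1 that excludes `no_relation` is zero. A mismatch between the label names or label IDs used in training and those expected by the scorer could produce a similar gap, so both are worth checking.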

Since the tsv files you used aren't provided by TACRED, do you mind sharing the preprocessing you did to the dataset in order to add the indicators? That way I can try replicating your exact reported results for this benchmark.

Thanks!

mickeysjm commented 4 years ago

The preprocessing script for converting the original TACRED json data to the tsv format is: https://github.com/mickeystroller/R-BERT/blob/master/generate_tacred_tsv.py
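For readers without access to the script, here is a minimal sketch of what such a conversion might look like. It is not the linked script. It assumes the TACRED json fields `token`, `subj_start`, `subj_end`, `obj_start`, `obj_end` (inclusive indices) and `relation`. The `$` and `#` markers and the `relation<TAB>sentence` column layout are assumptions for illustration, and the example sentence is invented rather than taken from the dataset.

```python
import csv
import io


def add_entity_markers(tokens, subj_span, obj_span,
                       subj_marker="$", obj_marker="#"):
    """Wrap the subject and object spans in marker tokens.

    Spans are (start, end) with inclusive end indices. Marker
    characters are an assumption, not taken from the linked script.
    """
    subj_start, subj_end = subj_span
    obj_start, obj_end = obj_span
    out = []
    for i, tok in enumerate(tokens):
        if i == subj_start:
            out.append(subj_marker)
        if i == obj_start:
            out.append(obj_marker)
        out.append(tok)
        if i == subj_end:
            out.append(subj_marker)
        if i == obj_end:
            out.append(obj_marker)
    return out


def example_to_row(example):
    """Turn one TACRED-style dict into a [relation, sentence] row."""
    marked = add_entity_markers(
        example["token"],
        (example["subj_start"], example["subj_end"]),
        (example["obj_start"], example["obj_end"]),
    )
    return [example["relation"], " ".join(marked)]


# Invented example in the assumed TACRED json shape.
example = {
    "relation": "org:founded_by",
    "token": ["Microsoft", "was", "founded", "by", "Bill", "Gates", "."],
    "subj_start": 0, "subj_end": 0,
    "obj_start": 4, "obj_end": 5,
}

row = example_to_row(example)

# Write to an in-memory buffer; swap in open("train.tsv", "w", newline="")
# to write a real file (file name is hypothetical).
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
print(buf.getvalue(), end="")
```

The real data would be loaded with `json.load` from the LDC release and each example passed through `example_to_row`.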

mickeysjm commented 4 years ago

Hi @pvcastro,

Since you mentioned that you have preprocessed the TACRED dataset, I assume you have already purchased it from LDC? If that is the case, I can just share my preprocessed dataset with you. Give me an email address and I will send you the data link privately.

pvcastro commented 4 years ago

Indeed, I did pay for the dataset. Can you send it to pvcastro@gmail.com please? Thanks!

tyistyler commented 4 years ago

Excuse me, I bought the dataset from LDC. Could you send it to tyleristian@gmail.com? I can show you the order. Thanks! @mickeystroller

mickeysjm commented 4 years ago

@tyistyler, no problem. Can you send a copy of your order (a screenshot will suffice) to mickeysjm@gmail.com? I will then send you my preprocessed dataset. Thanks.

mickeysjm / R-BERT

TACRED Pre-processing #2