xdqkid / S2S-AMR-Parser

Improving AMR parsing with Sequence-to-Sequence Pre-training

Can only get 80.8 on AMR 2.0 #1

headacheboy commented 3 years ago

Hi

When parsing AMR 2.0 with the PTM-MT(WMT14B)-SemPar(WMT14M) model, we can only get 80.8 Smatch instead of the 81.4 reported in your paper.

We're wondering whether there is anything important in the preprocessing and postprocessing for AMR 2.0. Could you provide more details about it?

Thank you!

xdqkid commented 3 years ago

Hello,

We did not do any additional operations in pre-processing or post-processing. Note that:

  1. We use the source sentences from the "amrs" directory of AMR 2.0 rather than those from "alignments"; the tokenization used in the alignment files is more complicated to reproduce. We use AllenNLP to handle tokenization.
  2. Source sentences must be tokenized and BPE-segmented (a sketch of this preprocessing follows the list).
  3. The sentences used in post-processing have not been tokenized or BPE-segmented: as the command below shows, we use "sent" rather than "sent.tok" or "sent.tok.bpe".
python2 postprocess_AMRs.py -f sent.amr -s sent
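
To make items 1 and 2 concrete, here is a minimal sketch of that tokenize-then-BPE preprocessing. It assumes AllenNLP's SpacyTokenizer and the subword-nmt package; the "bpe.codes" file name and the example sentence are placeholders, not taken from this repository.

```python
# Sketch only: tokenize a raw source sentence with AllenNLP, then apply
# BPE segmentation with subword-nmt, as described in items 1-2 above.
from allennlp.data.tokenizers import SpacyTokenizer
from subword_nmt.apply_bpe import BPE

tokenizer = SpacyTokenizer()  # wraps spaCy's tokenizer

# "bpe.codes" is a placeholder for whatever BPE codes file the model uses.
with open("bpe.codes", encoding="utf-8") as codes:
    bpe = BPE(codes)

def preprocess(sentence: str) -> str:
    """Tokenize a raw sentence, then BPE-segment it (sent -> sent.tok.bpe)."""
    tokenized = " ".join(tok.text for tok in tokenizer.tokenize(sentence))
    return bpe.process_line(tokenized)

print(preprocess("Barack Obama was born in Hawaii."))
```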

Good Luck! xdqkid

headacheboy commented 3 years ago

Hi,

When postprocessing, I use the latest version of RikVN/AMR, and the API for getting wiki labels has changed. I'm wondering whether this decreases the performance of the model...

xdqkid commented 3 years ago

Hi, thank you for the reminder! When we do post-processing, we do some extra processing for the wiki links: we build a <name, wiki> dictionary from the training set and prefer this dictionary over the default get_wiki_from_spotlight_by_name lookup. This may have some influence on the result.
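
For illustration, here is a minimal sketch (not the original implementation, which is not in the repository) of that dictionary-first strategy: harvest <name, wiki> pairs from the training AMRs, and fall back to a Spotlight-style lookup only for unseen names. The regex assumes ":wiki" precedes ":name", as in the LDC releases, and the fallback is passed in as a parameter rather than hard-coding get_wiki_from_spotlight_by_name.

```python
# Sketch only: prefer a <name, wiki> dictionary built from the training set
# over the default Spotlight lookup used by the post-processing script.
import re

WIKI_NAME_RE = re.compile(
    r':wiki\s+(?:"([^"]+)"|-)\s*'       # the :wiki value ("-" means no wiki)
    r':name\s*\(\s*\w+\s*/\s*name'      # the attached name node
    r'((?:\s*:op\d+\s*"[^"]*")+)')      # its :op strings

OP_RE = re.compile(r':op\d+\s*"([^"]*)"')

def build_name_wiki_dict(train_amr_text):
    """Map each name string seen in training to its gold :wiki value."""
    table = {}
    for wiki, ops in WIKI_NAME_RE.findall(train_amr_text):
        name = " ".join(OP_RE.findall(ops))
        if wiki:                        # skip entities whose gold wiki is "-"
            table[name] = wiki
    return table

def wikify(name, table, spotlight_lookup):
    """Prefer the training-set dictionary; otherwise fall back to the
    Spotlight lookup (get_wiki_from_spotlight_by_name in the script)."""
    return table.get(name) or spotlight_lookup(name)

demo = '(p / person :wiki "Barack_Obama" :name (n / name :op1 "Barack" :op2 "Obama"))'
table = build_name_wiki_dict(demo)
print(wikify("Barack Obama", table, spotlight_lookup=lambda n: "-"))  # Barack_Obama
```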

I'm sorry, it has been a long time since I modified the wiki step, and I have almost forgotten the details of my modification.

BTW,

  1. We also tried the wiki utilities in amr_2.0_utils of STOG. They improved the Wiki score by 3.2 points but the final Smatch F1 by less than 0.1 points, so we did not adopt this method in the end. I therefore think the wiki step may not be the main reason.
  2. Delete the sent.amr.* intermediate files (e.g. sent.amr.pruned.wiki.coref.all) and run post-processing again (see the sketch after this list). It seems post-processing itself also causes some fluctuation in the score.
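
A small sketch of the clean-and-rerun advice in item 2, reusing the post-processing command given earlier in the thread; the working directory and file names are assumptions.

```python
# Sketch only: delete intermediate post-processing outputs, then re-run
# the post-processing script on the raw model output.
import glob
import os
import subprocess

# "sent.amr.*" matches only derived files (e.g. sent.amr.pruned.wiki.coref.all);
# the model output sent.amr itself is kept.
for path in glob.glob("sent.amr.*"):
    os.remove(path)

subprocess.run(
    ["python2", "postprocess_AMRs.py", "-f", "sent.amr", "-s", "sent"],
    check=True,
)
```
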
headacheboy commented 3 years ago

Hi,

I get 81.0 now, but I am still unable to reach the best result of 81.4...

Could you provide your post-processing modifications and the <name, wiki> dictionary for AMR 2.0?

Thank you!

xdqkid commented 3 years ago

Hi,

Thanks for your advice. I have been very busy recently (looking for a job, preparing my dissertation, building websites for CCMT2020 & AACL2020). I will probably find time to push an update containing the full code later this year or next year.

Cheers