A further question: how can I reproduce the multitask fine-tuning results on XFUN (fine-tuning on all 8 languages, testing on each language X)? I trained LayoutXLM-base for 8000 steps, and the result on Chinese is:
Hi, I have the same issues for both tasks, especially when training with all languages: #440. I've tried various numbers of steps and learning rates, but I still fail to reproduce the results.
@jpWang: did you succeed in reproducing the results?
@DRRV We've been busy with other things recently; we'll follow up next week.
No hurry on my side. Thanks!
@jpWang: did you succeed in reproducing the results?
I have reproduced the results on the SER task, but not on the RE task so far, so I will try more optimization strategies for RE and close this issue.
The reason I couldn't reproduce it before is that there were some bugs in the dataset construction part of my code. In the end, I changed the following lines in xfun.py:
tokenized_inputs = self.tokenizer(
    line["text"],
    add_special_tokens=False,
    return_offsets_mapping=True,
    return_attention_mask=False,
)
into:
if '/en' in filepath[0]:
    # English (FUNSD converted to XFUN format): rebuild the line text from the
    # word-level annotations so the tokens stay aligned with the word boxes.
    tokenized_inputs = self.tokenizer(
        ' '.join([q['text'] for q in line['words']]),
        add_special_tokens=False,
        return_offsets_mapping=True,
        return_attention_mask=False,
    )
else:
    # Original XFUN languages: tokenize the line text as before.
    tokenized_inputs = self.tokenizer(
        line["text"],
        add_special_tokens=False,
        return_offsets_mapping=True,
        return_attention_mask=False,
    )
after converting FUNSD into XFUN format, since some words are missing from line['words'] compared with line["text"].
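For reference, a quick check along these lines can show which converted lines are affected. This is only a minimal sketch: the annotation path and the JSON layout (a top-level "documents" list of documents, each holding a "document" list of lines) are assumptions based on how xfun.py reads its data.

# Minimal sketch of a sanity check on the converted FUNSD data: compare the
# text rebuilt from line["words"] with line["text"]. The JSON layout used
# here is an assumption inferred from xfun.py, not an official spec.
import json

def report_missing_words(ann_path):
    with open(ann_path, encoding="utf-8") as f:
        data = json.load(f)
    for doc in data["documents"]:
        for line in doc["document"]:
            joined = " ".join(w["text"] for w in line["words"])
            if joined != line["text"]:
                # Words were dropped (or reordered) during conversion, so
                # tokenizing line["text"] would no longer align with the
                # word-level boxes.
                print(doc["id"], line["id"], repr(line["text"]), "->", repr(joined))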
I just followed the official optimization strategy for the SER task to reproduce the zero-shot transfer results, and changed the max steps to 8000 to reproduce the multitask fine-tuning results.
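In other words, the only change relative to the SER recipe is the step count. A minimal sketch of that configuration with Hugging Face TrainingArguments is below; the output path is a placeholder, and the remaining values are my assumptions about the example script's settings rather than the exact official ones.

# Minimal sketch: same setup as the official SER fine-tuning, with only
# max_steps raised to 8000 for the multitask (all-languages) run.
# output_dir is a placeholder; the other hyperparameters are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="layoutxlm-xfun-multitask",  # placeholder path
    do_train=True,
    do_eval=True,
    max_steps=8000,       # 8000 steps instead of the zero-shot setting
    warmup_ratio=0.1,
    fp16=True,
)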
@jpWang How did you convert FUNSD into XFUN format? I tried to convert it but didn't succeed.
You can access the data from https://github.com/jpWang/LiLT#datasets.
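If you still want to run the conversion yourself, a rough sketch is below. The target schema (a top-level "documents" list whose entries hold an "img" record and a "document" list of lines with id/text/box/label/words/linking) is inferred from how xfun.py reads its data, so treat the exact field names, and any additional top-level keys the loader may expect, as assumptions.

# Rough sketch: wrap FUNSD annotations ("form" entries with id/text/box/label/
# words/linking) into an XFUN-style JSON. The target schema is inferred from
# xfun.py and is an assumption, not an official spec.
import json
import os
from PIL import Image

def funsd_to_xfun(ann_dir, img_dir, out_path, lang="en", split="val"):
    documents = []
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".json"):
            continue
        with open(os.path.join(ann_dir, fname), encoding="utf-8") as f:
            form = json.load(f)["form"]
        img_name = fname[:-5] + ".png"
        width, height = Image.open(os.path.join(img_dir, img_name)).size
        lines = [
            {
                "id": ent["id"],
                "text": ent["text"],
                "box": ent["box"],
                "label": ent["label"],
                "words": ent["words"],      # may hold fewer words than "text" (see above)
                "linking": ent["linking"],
            }
            for ent in form
        ]
        documents.append({
            "id": f"{lang}_{split}_{fname[:-5]}",
            "img": {"fname": img_name, "width": width, "height": height},
            "document": lines,
        })
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"documents": documents}, f, ensure_ascii=False)

FUNSD and XFUN use the same label set (question/answer/header/other), so the labels can be copied over unchanged.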
@jpWang Same here. I still can't reproduce the RE results so far (I managed it for SER). Did you find any further optimization strategies for RE later? Thanks in advance.
Thanks for your excellent work on LayoutXLM. I would like to ask how to reproduce the zero-shot transfer results of LayoutXLM on XFUN.
I have converted FUNSD into XFUN format and trained LayoutXLM-base following https://github.com/microsoft/unilm/tree/master/layoutxlm#fine-tuning-for-semantic-entity-recognition.
Then I tested the fine-tuned model on Chinese, but the result is:
Thanks again for your time and patience.