Open TrickyyH opened 1 year ago
Hi! According to the following blog, it seems that offset_mapping can be used with LUKE. I have not confirmed that misalignment never occurs, though. Sorry.
I thought the same as @TrickyyH. Apart from offset_mapping, for instance, the behaviour of return_overflowing_tokens differs between slow and fast tokenisers. As a result, it becomes difficult to handle long texts in tasks like NER and QA, which LUKE excels at. I would be pleased if you could accommodate the fast tokeniser.
One possible workaround is to use the fast version of the base tokenizer, such as the fast version of RobertaTokenizer, which LukeTokenizer is based on (they have the same subword vocabulary).
However, this approach may not support entity-related outputs, which would require additional code to be written.
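For anyone who wants to try this, here is a rough sketch of the workaround, not an official LUKE API. It assumes RobertaTokenizerFast loaded from the roberta-large checkpoint matches the subword vocabulary of your LUKE checkpoint, which you should verify yourself. The helper names char_span_to_token_span and encode_with_windows are hypothetical and only for illustration. The first function is the kind of "additional code" mentioned above: it maps a character-level entity or answer span onto token indices using offset_mapping.

```python
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Tuple[int, int]], char_start: int, char_end: int
) -> Optional[Tuple[int, int]]:
    """Map a character span [char_start, char_end) to inclusive token indices.

    `offsets` is one window's offset_mapping. Special tokens are reported
    with an empty (0, 0) offset, so they are skipped by the `e > s` check.
    Returns None when no token in this window overlaps the span.
    """
    hits = [
        i
        for i, (s, e) in enumerate(offsets)
        if e > s and s < char_end and e > char_start
    ]
    if not hits:
        return None
    return hits[0], hits[-1]


def encode_with_windows(text: str, max_length: int = 384, stride: int = 128):
    """Split a long text into overlapping windows with character offsets.

    Assumption: "roberta-large" shares its subword vocabulary with the LUKE
    checkpoint in use. transformers is imported lazily so the helper above
    stays usable without it.
    """
    from transformers import RobertaTokenizerFast

    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")
    return tokenizer(
        text,
        max_length=max_length,
        stride=stride,
        truncation=True,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
    )
```

With return_overflowing_tokens=True the fast tokenizer returns one entry per window, so enc["offset_mapping"][i] can be passed to char_span_to_token_span for window i. The entity-related inputs that LukeTokenizer normally builds (entity ids, entity position ids and so on) are not produced here and would still need to be constructed separately.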
Hello everyone, I am trying to use luke-large for question answering. I ran into several issues when fine-tuning the model on SQuAD-like data, and most of them come from the lack of fast tokenizer support. So I am wondering whether LUKE will support a fast tokenizer in the future, or whether there is any way to work around these issues. Thank you so much!