Closed — elonmusk-01 closed this issue 2 years ago
Hi @elonmusk-01, LUKE is an extension of RoBERTa using entity embeddings, and the subword tokenization adopted in our model is exactly the same as the one used in RoBERTa. In our experiments, we simply used the RoBERTa tokenizer implementation available in the Hugging Face library, so if the original implementation has such an issue, LUKE has the same issue. However, we do not have detailed knowledge of the implementation of the RoBERTa tokenizer.
Hi @ikuyamada, I've had a great experience with LUKE. As I mentioned in #129, I am not asking for a solution to this problem; I just need some clarification.