studio-ousia / luke

LUKE -- Language Understanding with Knowledge-based Embeddings
Apache License 2.0

When running multilingual LUKE an error is reported #110

Closed ScottishFold007 closed 2 years ago

ScottishFold007 commented 2 years ago

Hello, when I run multilingual LUKE, the following error is reported. How can I fix this bug? Here is the code:

```python
from transformers import LukeForEntitySpanClassification, MLukeTokenizer

# Load the model checkpoint
# model_name = "studio-ousia/luke-base"
# model_name = "studio-ousia/luke-large-finetuned-conll-2003"
model_name = "studio-ousia/mluke-base"

model = LukeForEntitySpanClassification.from_pretrained(model_name)
model.eval()
model.to("cuda")

# Load the tokenizer
# tokenizer = LukeTokenizer.from_pretrained(model_name)
tokenizer = MLukeTokenizer.from_pretrained(model_name)
```

```python
text = "Beyoncé lives in Los Angeles"

# List all possible entity spans in the text
word_start_positions = [0, 8, 14, 17, 21]  # character-based start positions of word tokens
word_end_positions = [7, 13, 16, 20, 28]   # character-based end positions of word tokens
entity_spans = []
for i, start_pos in enumerate(word_start_positions):
    for end_pos in word_end_positions[i:]:
        entity_spans.append((start_pos, end_pos))

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt").to("cuda")
outputs = model(**inputs)
logits = outputs.logits
predicted_class_indices = logits.argmax(-1).squeeze().tolist()
for span, predicted_class_idx in zip(entity_spans, predicted_class_indices):
    if predicted_class_idx != 0:
        print(text[span[0] : span[1]], model.config.id2label[predicted_class_idx])
```
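For what it's worth, the span-enumeration part can be checked in isolation with plain Python (no model or GPU needed), using the same character offsets as in the snippet. This is just a sanity-check sketch, not part of the LUKE API:

```python
# Character offsets of the five words in the example sentence.
text = "Beyoncé lives in Los Angeles"
word_start_positions = [0, 8, 14, 17, 21]
word_end_positions = [7, 13, 16, 20, 28]

# Enumerate every contiguous word sequence as a candidate entity span.
entity_spans = []
for i, start_pos in enumerate(word_start_positions):
    for end_pos in word_end_positions[i:]:
        entity_spans.append((start_pos, end_pos))

print(len(entity_spans))  # 5 + 4 + 3 + 2 + 1 = 15 candidate spans
print(text[17:28])        # the multi-word candidate "Los Angeles"
```

Each of these 15 `(start, end)` pairs is then classified by the model, and spans whose predicted class is 0 (no entity) are skipped when printing.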

ryokan0123 commented 2 years ago

Hi, our code does not yet support using mLUKE for downstream tasks. We will release example code for applying mLUKE to downstream tasks, including NER, and will also look into this use case, so please wait a while.