zjunlp / OneGen

[EMNLP 2024 Findings] OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs.
MIT License

How to easily get entity linking results from the EntityLinking model? #5

Closed thealmight1 closed 3 weeks ago

thealmight1 commented 3 weeks ago

I saw the code that uses the model for entity linking. It seems to need many parameters, such as test_file, and test_file in turn seems to require a lot of information, including candidate_list, doc_embedding and doc_embedding_label. Can the model be used with a single input, namely a sentence for which I want to get entity candidates?

MikeDean2367 commented 3 weeks ago

Hi, thank you for your attention!

It sounds like you want to handle the case where you have one prompt and some candidate descriptions. To do this, instantiate the EntityLinkingEvaluator class, passing it the ComponentConfig and a dict as input, then call the run_single method to obtain the result. Here is a pseudocode example:

# 1. import the required classes
from typing import List
...

# 2. set your input
sentence = "your sentence"
candidate_list: List[str] = [
    "Represent the entity based on its description and output it after the '[SE]' token.\nThe description of SK Sturm Graz is as follows:\nSK Sturm Graz is an instance of association football club. It is described as \"association football club in Austria\". Sportklub Sturm Graz is an Austrian professional association football club, based in Graz, Styria, playing in the Austrian Football Bundesliga.[SE]",
    "Represent the entity based on its description and output it after the '[SE]' token.\nThe description of SK Sturm Graz is as follows:\nAnother Description.[SE]",
]
doc_embedding_label = [str(i) for i in range(len(candidate_list))]

# 3. template
prompt:str = "[INST] You are good at mention detection. Identify and extract mentions of entities from the text. Please output the original text with annotations. Here, the annotation for each mention should be formatted as `<MENTION>{{mention in text}}</MENTION>` when displayed. \n\nTEST TEXT:\n{input} [/INST]".format(input=sentence)

# 4. instantiate the ComponentConfig
generator_config: ComponentConfig
...

# 5. instantiate the EntityLinkingEvaluator
evaluator = EntityLinkingEvaluator(generator_config, rules=["</MENTION>[LK]", "[_CONTINUE_]"])

# 6. run a case
results:dict = evaluator.run_single(
    prompt=prompt,
    candidate_list=candidate_list, 
    max_new_tokens=1024,
    input_ids=None,
    embed_batch_size=16,
    skip_repr_token_cnt=1,
    doc_embedding_label=doc_embedding_label
)
"""
results = {
    "output": "",
    "output_qid": []
}
"""

If you have any further questions, feel free to reach out to me :)
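
P.S. If you would rather not write the candidate strings by hand, here is a minimal sketch of how they could be assembled. The template is copied from the candidate strings in the example above. `build_candidates` and `extract_mentions` are hypothetical helpers, not part of OneGen, and I am assuming the labels you pass in are the identifiers reported back in `output_qid`. The mention regex assumes the generated text follows the `<MENTION>...</MENTION>` format requested in the prompt; the real output may contain additional special tokens.

```python
import re
from typing import Dict, List, Tuple

# Instruction and layout copied from the candidate strings in the example above.
CANDIDATE_TEMPLATE = (
    "Represent the entity based on its description and output it after the "
    "'[SE]' token.\nThe description of {name} is as follows:\n{description}[SE]"
)

# Matches the annotation format requested in the mention-detection prompt.
MENTION_RE = re.compile(r"<MENTION>(.*?)</MENTION>", re.DOTALL)


def build_candidates(
    entities: Dict[str, Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: turn {label: (name, description)} into
    (candidate_list, doc_embedding_label), kept in the same order."""
    candidate_list: List[str] = []
    doc_embedding_label: List[str] = []
    for label, (name, description) in entities.items():
        candidate_list.append(
            CANDIDATE_TEMPLATE.format(name=name, description=description)
        )
        doc_embedding_label.append(label)
    return candidate_list, doc_embedding_label


def extract_mentions(output: str) -> List[str]:
    """Hypothetical helper: collect the text of every annotated mention."""
    return MENTION_RE.findall(output)


# Example input; the labels "0" and "1" are placeholders.
entities = {
    "0": ("SK Sturm Graz", "association football club in Austria"),
    "1": ("SK Sturm Graz", "Another Description."),
}
candidate_list, doc_embedding_label = build_candidates(entities)
```

The two returned lists can then be passed to `run_single` as `candidate_list` and `doc_embedding_label`. Keeping them in one dict makes it harder for the labels and descriptions to drift out of order.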

zxlzr commented 3 weeks ago

Hi, do you have any further questions?