[Extraction template] NER

rlancemartin commented 1 year ago

Feature request

Named Entity Recognition (NER) is the process of identifying and classifying named entities (like people, organizations, dates, and locations) in text.

Motivation

NER is useful in various areas:

Information Retrieval: Helps in improving the efficiency of search engines. For example, if someone searches for "Apple," understanding whether they mean the fruit or the tech company can be crucial.
Content Recommendation: For news or media agencies, knowing what entities are mentioned in articles can help in recommending related content to users.
Data Mining: For extracting structured information from unstructured datasets.
Question Answering: Helps chatbots and virtual assistants understand and respond to user queries more accurately.
Knowledge Graph Construction: Crucial for building structured databases of information extracted from text.

Your contribution

Add template

harell commented 10 months ago

Until we get an official solution, here is the prompt I am using:

Identify and extract all named entities from the following text, including:

Person: Names of people, including full names, titles, and roles (e.g., "Mateo Gomez", "Nickolaus Schulz", "caretaker")
Organization: Names of companies, institutions, and groups (e.g., "Contoso General Hospital", "Contoso Restaurant")
Location: Geographical locations, addresses, and landmarks (e.g., "Hollywood Boulevard", "Los Angeles", "Santa Monica Pier")
Date/Time: Specific dates, times, durations, and expressions of time (e.g., "August 17th, 2022", "7:45 PM", "last Thursday")
Quantity: Numerical values, including numbers, ordinals, and percentages (e.g., "28-year-old", "first", "95 percent")
Product: Names of products, objects, and items (e.g., "Stent", "Chicken parmigiana", "Surf and turf platter")
Event: References to specific events or activities (e.g., "car accident", "trip", "anniversary")
Skill: Mentions of skills or abilities (e.g., "supervision")
Other categories: Consider any other relevant entity types that may appear in the text.

For each entity, provide:

Text: The exact text of the entity as it appears in the text.
Category: The category of the entity (e.g., Person, Location, Date/Time).
Subcategory (optional): A more specific subcategory, if applicable (e.g., GPE for geo-political entities, Date for dates, Time for times).
Offset: The starting character index of the entity within the text.
Length: The number of characters in the entity.
Confidence score: A numerical value indicating the model's confidence in the entity extraction (e.g., 0.0 to 1.0).

Format the results as a JSON object that adheres to the following structure:

JSON
{
    "documents": [
        {
            "id": "id__1234",  // Unique document identifier
            "entities": [
                {
                    "text": "Entity text",
                    "category": "Entity category",
                    "subcategory": "Entity subcategory (optional)",
                    "offset": 0,
                    "length": 10,
                    "confidenceScore": 0.95
                },
                // ... other entities
            ]
        }
    ],
    "errors": []  // List of any errors encountered during extraction
}

And here is an example, implemented in ChatGPT:

"Michelina Metelli was born in Farneto, Papal States, to a wealthy Italian family. She married into the noble Malatesta family at the age of 12 and was widowed by age 20. She led a lifestyle of parties and luxury but, after the death of her only son, she experienced a vision of him in heaven, and decided to become a member of the Third Order of Saint Francis.

She proceeded to give away all her belongings and property, and founded, together with Francis Zanferdini, the Confraternity of the Annunciation to care for the poor, nurse the sick and bury the dead. Initially her family believed her to be insane and had her locked up. Upon her release from confinement, she made a pilgrimage to the Holy Land as penance for her sins. It is believed that she received the stigmata in the course of this journey. Michelina died at her home in Pesaro on 19 June 1356 of natural causes"

Result:

{
    "documents": [
        {
            "id": "id__1234",
            "entities": [
                {
                    "text": "Michelina Metelli",
                    "category": "Person",
                    "offset": 0,
                    "length": 16,
                    "confidenceScore": 0.98
                },
                {
                    "text": "Farneto, Papal States",
                    "category": "Location",
                    "offset": 26,
                    "length": 20,
                    "confidenceScore": 0.95
                },
                {
                    "text": "Malatesta",
                    "category": "Organization",
                    "offset": 80,
                    "length": 9,
                    "confidenceScore": 0.92
                },
                {
                    "text": "12",
                    "category": "Quantity",
                    "offset": 114,
                    "length": 2,
                    "confidenceScore": 0.99
                },
                {
                    "text": "20",
                    "category": "Quantity",
                    "offset": 142,
                    "length": 2,
                    "confidenceScore": 0.99
                },
                {
                    "text": "Third Order of Saint Francis",
                    "category": "Organization",
                    "offset": 255,
                    "length": 28,
                    "confidenceScore": 0.94
                },
                {
                    "text": "Francis Zanferdini",
                    "category": "Person",
                    "offset": 330,
                    "length": 18,
                    "confidenceScore": 0.97
                },
                {
                    "text": "Confraternity of the Annunciation",
                    "category": "Organization",
                    "offset": 361,
                    "length": 33,
                    "confidenceScore": 0.93
                },
                {
                    "text": "Holy Land",
                    "category": "Location",
                    "offset": 489,
                    "length": 9,
                    "confidenceScore": 0.96
                },
                {
                    "text": "Pesaro",
                    "category": "Location",
                    "offset": 568,
                    "length": 6,
                    "confidenceScore": 0.97
                },
                {
                    "text": "19 June 1356",
                    "category": "Date/Time",
                    "subcategory": "Date",
                    "offset": 587,
                    "length": 12,
                    "confidenceScore": 0.99
                }
            ]
        }
    ],
    "errors": []
}

eyurtsev commented 8 months ago

Please see https://python.langchain.com/docs/use_cases/extraction which can be adapted for more specific NER extraction.

If you need to locate the extracted entities in the text, there are different ways to do it (just don't ask the LLM to count positions, since it doesn't do that well).
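As a minimal sketch of keeping position counting out of the model's hands: ask the LLM only for the entity text, then compute offset and length in code. The helper name `locate_entities` and the sample sentence are illustrative assumptions, not part of LangChain.

```python
def locate_entities(text, entity_texts):
    """Return (entity, offset, length) tuples computed in code, not by the LLM.

    Entities are searched left to right so that repeated mentions map to
    successive occurrences. The offset is -1 when the string does not
    appear verbatim in the text.
    """
    results = []
    cursor = 0
    for entity in entity_texts:
        offset = text.find(entity, cursor)
        if offset == -1:
            # The model may list entities out of order, so retry from the start.
            offset = text.find(entity)
        if offset != -1:
            cursor = offset + len(entity)
        results.append((entity, offset, len(entity)))
    return results


sample = "Michelina Metelli was born in Farneto, Papal States, to a wealthy Italian family."
spans = locate_entities(sample, ["Michelina Metelli", "Farneto, Papal States"])
for entity, offset, length in spans:
    print(entity, offset, length)
```

Slicing `sample[offset:offset + length]` gives back the entity text whenever the offset is not -1, which is an easy sanity check to run on every extraction.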

One way to do it: add an "evidence" field to the schema that asks the model to repeat, verbatim, the text supporting each extraction, and then use an approximate matching algorithm to align that evidence against the original text.
