quissuiven opened 6 months ago
@quissuiven Is this still a problem? Can you share some sample code to reproduce?
Hi @3coins, yes, it's still a problem. Here's the sample code; I'm running this in SageMaker Studio:
!pip install -q langchain kaleido pypdf pydantic langchain-community langchain-core
!pip install -q langchain_aws
!pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57" \
    "requests" \
    "defusedxml"
import boto3
import json
import time
from io import BytesIO
from datetime import datetime
import dateutil.parser
import os
import pypdf
import re
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.llms import HuggingFacePipeline, Bedrock
from langchain.schema import OutputParserException, BaseOutputParser, StrOutputParser
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser
from typing import List, Dict, Tuple
from langchain.schema.runnable import RunnablePassthrough, RunnableParallel, RunnableLambda
from langchain.pydantic_v1 import BaseModel, Field, validator
from langchain_aws import BedrockLLM, ChatBedrock
from langchain_core.prompts import MessagesPlaceholder
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage,
)
model = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0},
)
def extract_pii_entities_with_reflection(resume_text):
    # EXTRACTION
    system_prompt_pii_masking = """
    You are a specialist focused on extracting personally identifying information from resumes.
    Your job is to extract all personally identifying information from a resume. You respond only in valid JSON format.
    Here is your task:
    1. Read the candidate's resume text.
    2. Extract all personally identifying information matching the following template definition:
       person_name (list all people names in the text)
       physical_address (use your advanced geopolitical knowledge to list all physical addresses in the text; this refers only to full addresses and excludes cities, states and countries)
       phone_number (list all phone numbers in the text)
       email_address (list all email addresses in the text)
       url (list all URLs in the text)
       date_of_birth (list all dates of birth in the text)
       personal_identification_id (list all personal identification IDs in the text)
    Only extract information from the text; do not make up any information.
    Put the output in <response></response> XML tags.
    """
    human_prompt_pii_masking = "Here is the resume text: {TEXT}"

    def clean_response(response_message):
        # Strip the <response> XML tags the prompt asks the model to emit.
        response_str = response_message.content
        final_str = response_str.replace('<response>', '')
        final_str = final_str.replace('</response>', '')
        return final_str

    extractor_messages = ChatPromptTemplate.from_messages([
        ("system", system_prompt_pii_masking),
        MessagesPlaceholder(variable_name="messages"),
    ])
    runnable_extraction = extractor_messages | model | RunnableLambda(clean_response)

    query = human_prompt_pii_masking.format(TEXT=resume_text)
    request = HumanMessage(content=query)
    result_dict_extraction = runnable_extraction.invoke({"messages": [request]})

    # REFLECTION
    reflection_prompt = """
    You are tasked with evaluating personally identifying information extracted from a text. Here are your responsibilities:
    - Check that all relevant personally identifying information has been extracted
    - Check that all extracted information is present in the original text
    Your Feedback Protocol:
    - If suggesting modifications, include the specific segment and your recommendations.
    - If no modifications are necessary, respond with "Output looks correct. Please return the original output in the same format."
    """
    reflector_messages = ChatPromptTemplate.from_messages([
        ("system", reflection_prompt),
        MessagesPlaceholder(variable_name="messages"),
    ])
    runnable_reflection = reflector_messages | model

    human_prompt_reflection = human_prompt_pii_masking.format(TEXT=resume_text)
    result_reflection = runnable_reflection.invoke({
        "messages": [
            HumanMessage(content=human_prompt_reflection),
            AIMessage(content=str(eval(result_dict_extraction))),
        ]
    })

    # REFINED EXTRACTION: replay the conversation (question, first answer, critique)
    # through the extractor so it can revise its output.
    message_1 = HumanMessage(content=human_prompt_reflection)
    message_2 = AIMessage(content=str(eval(result_dict_extraction)))
    message_3 = HumanMessage(content=result_reflection.content)
    return runnable_extraction.invoke({"messages": [message_1, message_2, message_3]})
# resume_extracted_list is the list of resume text strings (loaded earlier, not shown).
results_list_with_reflection = []
for index, resume_text in enumerate(resume_extracted_list):
    print(f"Performing extraction for Resume {index + 1}")
    results_dict_reflection = extract_pii_entities_with_reflection(resume_text)
    results_list_with_reflection.append(results_dict_reflection)
    print(results_dict_reflection)
    print("\n")
Hi, I'm currently using ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0"). I'm implementing a reflection workflow for PII extraction, with one prompt for the extractor and one for the reflector. There are three invocations: the first extracts PII from a resume, the second critiques that output, and the third refines the output based on the critique (condensed below).
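Stripped of the prompt details, the three calls follow this pattern (names taken from the function above; the eval/str handling around the extraction result is omitted for brevity):

# 1) Extract: pull PII from the resume text.
extraction = runnable_extraction.invoke({"messages": [HumanMessage(content=query)]})
# 2) Critique: the reflector sees the question plus the extractor's answer.
critique = runnable_reflection.invoke({"messages": [
    HumanMessage(content=query),
    AIMessage(content=extraction),
]})
# 3) Refine: the extractor sees the full history, including the critique.
refined = runnable_extraction.invoke({"messages": [
    HumanMessage(content=query),
    AIMessage(content=extraction),
    HumanMessage(content=critique.content),
]})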
Currently, I'm seeing an IndexError ("list index out of range") in several cases.
This error did not appear when I was using langchain.llms.Bedrock. I presume it happens only for chat models, when the library tries to package the model output as an AIMessage and fails. Does anyone know how to resolve this issue?
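In the meantime, here is the kind of wrapper I've been using to pin down which of the three invocations fails and what the message history looks like at that point (a debugging sketch only, not a fix; the helper name is my own):

def invoke_with_trace(runnable, messages, label):
    # Hypothetical debugging helper: dump the message history that triggered
    # the IndexError, then re-raise so the original traceback is preserved.
    try:
        return runnable.invoke({"messages": messages})
    except IndexError as err:
        print(f"[{label}] IndexError: {err}")
        for m in messages:
            print(f"  {type(m).__name__}: {str(m.content)[:120]!r}")
        raise

# e.g. result_reflection = invoke_with_trace(runnable_reflection, [message_1, message_2], "reflection")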