Closed vikasr111 closed 7 months ago
Hello @vikasr111! I'm here to help you with your coding issues. As a bot, I'll do my best to assist you while we wait for a human maintainer to join us. I'm currently reviewing your issue and will be back with a full answer shortly.
Thank you for providing detailed information about the issue you're facing. It seems like the problem is related to the output format of the language model, which is causing the StructuredOutputParser to fail.
The StructuredOutputParser expects the output to be a Markdown code snippet formatted in a specific schema. This schema is defined by the response_schemas attribute of the StructuredOutputParser class, which is a list of ResponseSchema objects. Each ResponseSchema object has a name, description, and type. The output should include the leading "```json" and the trailing "```", and the content should be a JSON object where each key-value pair corresponds to a ResponseSchema object.
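As a rough illustration of the format described above, here is a minimal standard-library sketch of a well-formed reply and a strict check against expected keys. It does not use LangChain itself; `check_reply`, `FENCE`, and the two field names are hypothetical and only stand in for what the parser is described as doing.

```python
import json

FENCE = "`" * 3  # a Markdown code fence (three backticks)

# Hypothetical field names, for illustration only
expected_keys = ["shipper", "consignee"]

# A well-formed reply: one opening fence tagged "json", one closing fence
well_formed = (
    FENCE + "json\n"
    + json.dumps({"shipper": "Chemtrade West Limited Partnership",
                  "consignee": "SASK POWER"}, indent=2)
    + "\n" + FENCE
)

def check_reply(text, keys):
    """Strip exactly one opening and one closing fence, parse, and verify keys."""
    body = text.strip()
    opener = FENCE + "json"
    if not (body.startswith(opener) and body.endswith(FENCE)):
        raise ValueError("reply is not wrapped in a json code fence")
    body = body[len(opener):-len(FENCE)]
    data = json.loads(body)  # raises json.JSONDecodeError on leftover fences
    missing = [k for k in keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

parsed = check_reply(well_formed, expected_keys)
```

With this strict check, a reply that carries one extra closing fence fails in `json.loads`, because the leftover backticks end up inside the text handed to the JSON decoder.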
From the output you provided, it seems like there are two sets of triple backticks at the end of the output, which is causing the parser to fail. This is likely due to the language model generating an extra set of backticks.
There are a few similar issues that have been solved in the LangChain repository:
These issues suggest that the problem might be related to the non-greedy matching of the regular expression in the file libs/langchain/langchain/output_parsers/json.py. The proposed solution is to change the regular expression to use greedy matching by replacing .*? with .*. Additionally, it's suggested to add a test case to the test_json.py unit tests to cover the scenario of an answer containing a code block.
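To see what the greedy versus non-greedy choice means for an output shaped like the one in this issue, the following sketch compares both patterns using only the standard library. The two patterns are assumptions modeled on the description above; the expression actually used in json.py depends on the installed version and should be checked there.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence (three backticks)

# Shortened stand-in for the failing reply: a valid block plus one extra fence
reply = FENCE + 'json\n{"consignee": "SASK POWER"}\n' + FENCE + "\n" + FENCE

# Assumed patterns: identical except for lazy (.*?) versus greedy (.*) matching
lazy = re.compile(FENCE + r"(json)?(.*?)" + FENCE, re.DOTALL)
greedy = re.compile(FENCE + r"(json)?(.*)" + FENCE, re.DOTALL)

# The lazy pattern stops at the first closing fence, the greedy one at the last
lazy_body = lazy.search(reply).group(2).strip()
greedy_body = greedy.search(reply).group(2).strip()

def try_load(body):
    """Return the parsed JSON object, or None if the body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None

lazy_result = try_load(lazy_body)
greedy_result = try_load(greedy_body)
```

On this input the lazy match yields a clean JSON body, while the greedy match also captures the first closing fence and no longer parses. Greedy matching helps when the JSON values themselves contain code blocks, so neither choice covers both situations on its own.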
Another potential solution is to modify the parse method to check if the triple backticks are present in the text before attempting to split it. If they are not present, it directly proceeds to the AgentFinish block.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot Thanks for the analysis, but that doesn't solve my problem. Since I am using the standard output parser from LangChain, I am not sure how I can address this issue.
Based on the context provided, it seems like the issue you're facing is related to the output parser in the LangChain framework not being able to handle the case where the output ends with two triple-backticks. This is likely causing the Structured Output Parser to throw an error.
There have been similar issues reported in the LangChain repository, such as "Chat agent does not parse properly when model returns a code block", "Conversational Chat Agent: json.decoder.JSONDecodeError", and "LLM output parsing error for answers including code sections". These issues suggest that the problem might be due to the way the output parser is handling the model's output.
To address this issue, you could modify the parse method in your StructuredOutputParser class to handle the case where the output ends with two triple-backticks. Here's an example of how you can do this:
    def parse(self, text: str) -> Any:
        """Parse the text after removing two trailing triple-backticks, if present."""
        # Check if the text ends with two triple-backticks
        if text.endswith("```"*2):
            # Remove the two triple-backticks from the end of the text
            text = text[:-6]
        expected_keys = [rs.name for rs in self.response_schemas]
        return parse_and_check_json_markdown(text, expected_keys)
This code checks if the text ends with two triple-backticks. If it does, it removes them from the end of the text before parsing it. This should prevent the output parser from failing when it encounters this case.
Please note that this is a simple solution and might not cover all possible edge cases. Depending on the specifics of your application, you might need to implement a more sophisticated solution.
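One way to cover more of those edge cases is to normalize the raw model output before it reaches the parser. The sketch below is a hypothetical helper, not part of LangChain: it collapses two or more closing fences at the end of the text into one, whether they are adjacent or separated by whitespace such as a newline.

```python
import re

FENCE = "`" * 3  # a Markdown code fence (three backticks)

def drop_duplicate_closing_fence(text: str) -> str:
    """Collapse repeated fences at the end of the text into a single fence."""
    # Two or more fences at the very end, each optionally followed by whitespace
    pattern = r"(?:" + re.escape(FENCE) + r"\s*){2,}$"
    return re.sub(pattern, FENCE, text.rstrip())

# Shortened stand-in for a reply that ends with an extra fence on its own line
reply = FENCE + 'json\n{"consignee": "SASK POWER"}\n' + FENCE + "\n" + FENCE
cleaned = drop_duplicate_closing_fence(reply)
```

The cleaned string can then be passed to the parser's parse method in place of the raw output. A well-formed reply is returned unchanged apart from trailing whitespace; a reply that consists only of an opening fence directly followed by a closing fence would be collapsed too, which is assumed not to occur here.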
As for the changes in the latest version (v0.0.332) of LangChain that could potentially affect the output parser, I wasn't able to find any specific information in the repository. It's possible that the changes were made elsewhere or I might have missed them. If you have more information about the changes made in this version, it could help in diagnosing the issue.
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
This is not the right solution at all. If you look closely, the output from the LLM ends with "\n```\n```".
I'd appreciate if I can get some help from a human. I guess AI is still not there after all :D
@baskaryan, I could use your expertise on this one.
Hi, @vikasr111,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you raised an issue regarding errors in the structured output parser of the langchain library, particularly due to the presence of two sets of triple-backticks at the end of the output. This issue has been observed more frequently after the OpenAI dev day. The proposed solutions did not solve the problem, so the issue remains unresolved.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you for your understanding and cooperation.
System Info
langchain==0.0.330 openai==0.28.1 python==3.9.17
Who can help?
@hwchase17 @agola11
Information
Related Components
Reproduction
I have written a simple structured output parser. I am using it to extract useful data from a document's text. Here's my code:
When I run this code on any input data, the output parser raises an error most of the time. Here's a sample input:
\n------------------\nSTRAIGHT BILL OF LADING - SHORT FORM\nTEST\nCHEMTRADE\nFICHE D\'EXPEDITION - FORMULE REGULIERE\nS4D\nSHIPPER/EXPEDITEUR\nChemtrade West Limited Partnership\nTIME IN/ARRIVEE\nGROSS/BRUT\nCONSIGNEE/DESTINATAIRE\nSASK POWER\n TARE\nSHIP TO/EXPEDIEZ A\nCORY \nCOGENERATION STATION\nTIME OUT/DEPART\n8 KM W OF SASKATOON HWY 7\nNET\nVANSCOY SOK 1VO SK CA\nPOINT OF ORIGIN/POINT D\'EXPEDITION\nCUSTOMER ORDER NO./N DE COMMANDE DU CLIENT\nORDER NO./N DE COMM.\n3/L NO./NDE CONN.\nCHEMTRADE (SASKATOON)\nS\n1856\n80001877\n CARRIER NAME/NOM DU TRANSPORTEUR\nREQUIRED SHIP DATE/DATE EXP.DEM.\nDATE SHIPPED/EXPEDIE LE\nCARON TRANSPORT LTD\nNov 06,2023\nTRANSPORTATION MODE/MODE DE TRANSPORT\nVEHICLE T/C NO. - MARQUE DU WAGON\nTruck\n UNIVAR CANADA LTD.\n ROUTING/ITINERAIRE\nCONSIGNEE#/CONSIGNATAIRE\nPAGE\n600929\n1 of\n3\nNO.AND DESCRIPTION OF PACKS\nD.G.\nDESCRIPTION OF ARTICLES AND SPECIAL MARKS\nNET WEIGHT KG\nNBRE ET DESCRIPTION DE COLIS\nDESCRIPTION DES ARTICLES ET INDICATIONS SPECIALS\nPOIDS NET\n1 TT\nX\nUN1830, SULFURIC ACID, 8, PG II\n21.000 Tonne\nSULFURIC ACID 93%\nER GUIDE #137\n4 PLACARDS REQUIRED; CLASS 8, CORROSIVE\nSTCC 4930040\nSulfuric Acid 93%\nCOA W/ SHIPMENT\nDELIVERY HOURS: 8:OOAM-1: OOPM MON-THURS\nATTENDANCE DURING OFFLOAD REQUIRED\nSAFETY GOGGLES, FACE SHIELD, GLOVES, BOOTS, HARD\nHAT, STEEL TOED SHOES, PROTECTIVE SUIT\n3" QUICK CONNECT CAMLOCK; 1 HOSE REQUIRED\nPersonal Protective Equipment: Gloves. Protective clothing. Protective goggles. 
Face shield.\nnsufficient ventilation: wear respiratory protection.\nERP 2-1564 and Chemtrade Logistics 24-Hour Number >>\n1-866-416-4404\nPIU 2-1564 et Chemtrade Logistics Numero de 24 heures >>\n1-866-416-4404\nConsignor / Expediteur:\nLocation / Endroit:\nCHEMTRADE WEST LIMITED PARTNERSHIP\n11TH STREET WEST\nI hereby declare that the contents of this consignment are fully and accurately described above by the proper shipping\nSASKATOON SK CA\nare in all respects in proper condition for transport according to the Transportation of Dangerous Goods Regulations.\nS7K 4C8\nPer/Par:Michael Rumble, EHS Director, Risk Management\nIF CHARGES ARE TO BE PREPAID, WRITE OR STAMP\nJe declare que le contenu de ce chargement est decrit ci-dessus de faconcomplete et exacte par Iappellation reglementaire\nINDIQUER ICI SI L\'ENVOI SE FAIT EN "PORT-PAYE"\negards bien conditionne pouretre transporte conformement au Reglement sur le transport des marchandises dangereuses.\nPrepaid\nFORWARD INVOICE FOR PREPAID FREIGHT\nChemtrade West Limited Partnership\nQUOTING OUR B/L NO.TO:\n155 Gordon\nBaker Rd #300\nWeight Agreement\nFAIRE SUIVRE FACTURE POUR EXPEDITION PORT\nToronto,\nOnt.\nM2H 3N5\nPAYE EN REFERANT A NOTRE NUMERO DE CONN.A:\nSHIPPER\nChemtrade West Limited\nAGENT\nCONSIGNEE.\nEXPEDITEUR\nPartnership\nDESTINATAIRE\nPER\nPERMANENT POST OFFICE ADDRESS OF SHIPPER\nPER\nPER\nPAR\n(ADRESSE POSTALE PERMANENTE DE L\'EXPEDITEUR)\nTHESE PRODUCTS ARE SOLD AND SHIPPED IN ACCORDANCE WITH\nTHE TERMS OF SALES ON THE REVERSE SIDE OF THIS,DOCUMENT.\nResponsible Care\nCES PRODUITS SONT VENDUS ET EXPEDIES CONFORMEMENTAUX\nCONDITIONS DE VENTE APPARAISSANT AU VERSO DE LA PRESENTE\nOur commitment to sustainability.\nS4D PRASRNov 06,2023 1618
Upon further debugging, I found that for some reason the output has two sets of triple-backticks at the end, and because of this the Structured Output Parser raises the error. Here is the output for clarity (notice the end of the output):
content='```json\n{\n\t"document_type": "STRAIGHT BILL OF LADING - SHORT FORM",\n\t"shipper": "Chemtrade West Limited Partnership",\n\t"consignee": "SASK POWER",\n\t"point_of_origin": "VANSCOY SOK 1VO SK CA",\n\t"customer_order_number": "80001877",\n\t"order_number": "1856",\n\t"bill_of_lading": "600929",\n\t"carrier_name": "CARON TRANSPORT LTD",\n\t"required_ship_date": "Nov 06,2023",\n\t"shipped_date": "Nov 06,2023",\n\t"transportation_mode": "Truck",\n\t"vehicle_number": "T/C NO.",\n\t"routing_info": "UNIVAR CANADA LTD.",\n\t"invoice_to_buyer": "Chemtrade West Limited Partnership",\n\t"consignee_number": "600929",\n\t"net_weight": "21.000 Tonne",\n\t"ticket_number": "N/A",\n\t"outbound_date": "N/A"\n}\n```\n```'
I have started to notice this error much more frequently since OpenAI dev day. Any idea what I might be doing wrong?
Expected behavior
The output should have only one set of triple-backticks at the end, and the output parser should parse the output properly.