run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License

GPT-4o Enablement Can Result in Hyperlink Hallucinations #237

Open adreichert opened 2 weeks ago

adreichert commented 2 weeks ago

Summary

When GPT-4o mode is enabled, LlamaParse "transforms the document into an image per page and uses OpenAI GPT-4o to convert it into Markdown." When no custom instruction is used, this results in hyperlinks being handled incorrectly.

We observed the following issues.

Workaround

In testing, adding the instruction "Don't render Markdown links" prevented links from appearing in the parsed Markdown.
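If changing the instruction is not an option, a post-processing pass over the parsed Markdown is one possible fallback. The sketch below is not part of LlamaParse: `find_markdown_links` and `strip_markdown_links` are hypothetical helper names, the sample URL is made up, and the regular expression only covers inline links of the form `[text](url)`, not reference-style links or autolinks.

```python
import re

# Matches inline Markdown links such as [text](https://example.com "title").
# The negative lookbehind skips images, which start with "!".
_LINK_RE = re.compile(r'(?<!!)\[([^\]]+)\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def find_markdown_links(markdown: str) -> list[tuple[str, str]]:
    """Return (text, url) pairs for every inline link in the Markdown."""
    return _LINK_RE.findall(markdown)


def strip_markdown_links(markdown: str) -> str:
    """Replace each inline link with its visible text, dropping the URL."""
    return _LINK_RE.sub(r"\1", markdown)


# Illustrative input: a parsed line where the model invented a link target.
parsed = "Underlined text is turned into [links](https://example.com/made-up)."
print(find_markdown_links(parsed))
print(strip_markdown_links(parsed))
```

Listing the links first makes it possible to review which URLs the model produced before discarding them, since stripping every link also removes any that were genuinely present in the source document.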

Example

I parsed the following PDF using the web UI at https://cloud.llamaindex.ai/parse: demo.pdf

In the original PDF, the last word of this sentence is a link.

Underlined text is turned into links.
