run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License
1.79k stars 157 forks

How to exclude strikethrough text? #256

Open SomebodySysop opened 5 days ago

SomebodySysop commented 5 days ago

I tried the online preview with the default settings and gave it this instruction:

Extract the text in markdown format. Include all tables. Do not include page numbers, page headers or page footers. Also do not include any strikethrough text. Strikethrough text will be any text letters with lines through them.

Uploaded this file: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/2022_Local_161_MOA_09.pdf

The system returned the following, with the strikethrough text still included. Is there a way to get it to recognize strikethrough text in a PDF?

2022_Local_161_MOA_09.txt
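
For context on why a text instruction alone may not be enough: in many PDFs a strikethrough is not a property of the text but a separate thin line (or thin rectangle) drawn over it, so the extracted characters carry no "struck" flag. The following is a minimal sketch of a geometric pre-filter, not anything LlamaParse provides. It assumes you have already obtained word bounding boxes and horizontal line segments from some PDF library (pdfplumber, for example, reports both with `x0`, `x1`, `top`, `bottom` coordinates). The names `Word`, `HLine`, `is_struck`, `drop_struck` and the thresholds `min_cover` and `band` are hypothetical and would need tuning against a real document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word with its bounding box (top-left origin, y grows downward)."""
    text: str
    x0: float
    top: float
    x1: float
    bottom: float


@dataclass(frozen=True)
class HLine:
    """A horizontal line segment drawn on the page."""
    x0: float
    x1: float
    y: float


def is_struck(word, lines, min_cover=0.6, band=0.25):
    """Return True if a horizontal line crosses the middle of the word.

    min_cover: fraction of the word's width the line must span (assumed value).
    band: allowed distance from the vertical midpoint, as a fraction of the
          word's height (assumed value). This is what separates a
          strikethrough from an underline.
    """
    height = word.bottom - word.top
    width = word.x1 - word.x0
    if height <= 0 or width <= 0:
        return False
    mid = (word.top + word.bottom) / 2
    for line in lines:
        if abs(line.y - mid) > band * height:
            continue  # too high or too low: likely an underline or a rule
        overlap = min(word.x1, line.x1) - max(word.x0, line.x0)
        if overlap / width >= min_cover:
            return True
    return False


def drop_struck(words, lines):
    """Keep only the text of words that are not struck through."""
    return [w.text for w in words if not is_struck(w, lines)]


# Made-up coordinates for illustration only.
words = [
    Word("wages", 10, 100, 40, 110),
    Word("increase", 45, 100, 90, 110),
    Word("by", 95, 100, 105, 110),
    Word("2%", 110, 100, 125, 110),
    Word("3%", 130, 100, 145, 110),
]
lines = [
    HLine(109, 126, 105.2),  # runs through the middle of "2%"
    HLine(10, 145, 111.0),   # sits below the text: an underline
]

kept = drop_struck(words, lines)
print(" ".join(kept))
```

With these sample coordinates only "2%" is dropped; the underline is ignored because it falls outside the midpoint band. Scanned PDFs, where the strikethrough is part of the page image, would need a different approach.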