run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License

LlamaParse performs badly on Thai-language PDFs, or is it my mistake? #401

Open FRAMEEE17 opened 2 months ago

FRAMEEE17 commented 2 months ago
image
import logging
import os
import time

from llama_parse import LlamaParse

logger = logging.getLogger(__name__)


def process_pdf_files(index):
    pdf_directory = "./data/documents"
    llama_parse = LlamaParse(result_type="markdown")

    for filename in os.listdir(pdf_directory):
        if filename.endswith(".pdf"):
            file_path = os.path.join(pdf_directory, filename)

            try:
                # Parse the PDF file into markdown documents
                documents = llama_parse.load_data(file_path)

                # Upload the parsed content to LlamaCloud
                for doc in documents:
                    index.insert(doc)

                logger.info(f"Uploaded: {filename}")
            except Exception as e:
                logger.error(f"Error processing {filename}: {e}")

            # Add a small delay between files to avoid rate limiting
            time.sleep(1)

I tried to parse Thai PDF files and it doesn't work! This is my first time using LlamaParse, though.

hexapode commented 2 months ago

Could you share the document you used with us so we can have a look? It's likely a font encoding issue, given the screenshot you shared.
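
In case it helps narrow this down before sharing the file: a rough way to check whether the parsed output is garbled is to measure how much of its text actually falls in the Unicode Thai block (U+0E00 to U+0E7F). This is only a minimal sketch using the standard library. The helper name `thai_ratio` is hypothetical and not part of LlamaParse, and a low ratio only hints at an encoding problem; it does not prove one.

```python
import unicodedata

# Unicode Thai block boundaries
THAI_START, THAI_END = 0x0E00, 0x0E7F


def thai_ratio(text: str) -> float:
    """Return the fraction of letter/mark characters that are in the Thai block.

    Hypothetical diagnostic helper: digits, punctuation and whitespace are
    ignored so that markdown syntax does not skew the result.
    """
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    thai = sum(1 for ch in relevant if THAI_START <= ord(ch) <= THAI_END)
    return thai / len(relevant)
```

Assuming each parsed document exposes its content as `doc.text`, you could log `thai_ratio(doc.text)` inside the loop above. For a PDF that is mostly Thai, a value near zero would suggest the characters were mapped to the wrong code points during extraction.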