run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License
2.67k stars, 260 forks

Python API does not allow setting multiple languages, but the docs say it does. #312

Open: stonesthatwhisper opened this issue 2 months ago

stonesthatwhisper commented 2 months ago

Related: #245

The documentation says:

Set language

LlamaParse uses OCR to extract text from images. Our OCR supports a long list of languages and you can tell LlamaParse which language(s) to parse for by setting this option. You can specify multiple languages by separating them with a comma. This will only affect text extracted from images.

However, I get this error when I pass more than one language from Python:

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for LlamaParse
language
  value is not a valid enumeration member; permitted: 'abq', 'ady', 'af', 'ang', 'ar', 'as', 'ava', 'az', 'be', 'bg', 'bh', 'bho', 'bn', 'bs', 'ch_sim', 'ch_tra', 'che', 'cs', 'cy', 'da', 'dar', 'de', 'en', 'es', 'et', 'fa', 'fr', 'ga', 'gom', 'hi', 'hr', 'hu', 'id', 'inh', 'is', 'it', 'ja', 'kbd', 'kn', 'ko', 'ku', 'la', 'lbe', 'lez', 'lt', 'lv', 'mah', 'mai', 'mi', 'mn', 'mr', 'ms', 'mt', 'ne', 'new', 'nl', 'no', 'oc', 'pi', 'pl', 'pt', 'ro', 'ru', 'rs_cyrillic', 'rs_latin', 'sck', 'sk', 'sl', 'sq', 'sv', 'sw', 'ta', 'tab', 'te', 'th', 'tjk', 'tl', 'tr', 'ug', 'uk', 'ur', 'uz', 'vi' (type=type_error.enum; enum_values=[<Language.BAZA: 'abq'>, <Language.ADYGHE: 'ady'>, <Language.AFRIKAANS: 'af'>, <Language.ANGIKA: 'ang'>,

Client: Please remove untested options:

galvangoh commented 2 months ago

If you are asking about simultaneous language support, it's on their roadmap, but I guess we still have to wait.
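
For anyone reading the traceback above: the error is consistent with `language` being validated as a single enumeration member, so a comma-separated string such as `"en,fr"` is rejected as one unknown value. The sketch below reproduces that behaviour with the standard library's `Enum` rather than pydantic or the actual client. The three-member `Language` class and the `parse_languages` helper are hypothetical and only mirror a few codes from the permitted list in the error. This is not how LlamaParse is implemented, just an illustration of the mismatch between the docs and a single-enum field.

```python
from enum import Enum


class Language(str, Enum):
    # Hypothetical subset of the codes listed in the error message.
    ENGLISH = "en"
    FRENCH = "fr"
    GERMAN = "de"


def parse_languages(value: str) -> list[Language]:
    """Split a comma-separated string and validate each code on its own.

    Hypothetical helper: shows what per-code validation could look like.
    """
    return [Language(code.strip()) for code in value.split(",") if code.strip()]


# A single enum lookup rejects the combined string, much like the traceback.
try:
    Language("en,fr")
except ValueError as exc:
    print(f"rejected: {exc}")

# Validating code by code accepts the same input.
print([lang.value for lang in parse_languages("en, fr")])
```

Until multi-language input is supported by the client, passing one code from the permitted list is the only form the validation shown in the traceback accepts.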