run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: AzStorageBlobReader Integration with LlamaParse #14395

Closed: galvangoh closed this issue 3 days ago

galvangoh commented 3 months ago

Question Validation

Question

I am currently using LlamaParse as a file_extractor together with AzStorageBlobReader to read PDFs from blob storage and parse them in a single step. Currently, this integration only exposes .load_data(), which returns the parsing result as either text or markdown. Is there a way to also get the parsing result as JSON, the same as get_json_result() mentioned in this issue from LlamaParse?

Thank you.
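For context, a workaround that is often suggested in similar threads is to download the blobs yourself and hand the local file paths to the parser's JSON method directly, bypassing .load_data(). The sketch below is illustrative only: StubJsonParser and parse_blobs_as_json are hypothetical names standing in for LlamaParse and your Azure download step, not part of either library's API.

```python
import tempfile
from pathlib import Path


class StubJsonParser:
    """Hypothetical stand-in mirroring the shape of LlamaParse's
    get_json_result(); the real class would call the parsing service."""

    def get_json_result(self, file_paths):
        # Return one dict per input file, as get_json_result() does.
        return [{"file": str(p), "pages": []} for p in file_paths]


def parse_blobs_as_json(blob_contents: dict, parser) -> list:
    """Write already-downloaded blob bytes to a temp dir, then pass the
    local file paths to the parser's JSON method."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, data in blob_contents.items():
            path = Path(tmp) / name
            path.write_bytes(data)
            paths.append(path)
        return parser.get_json_result(paths)


# blob_contents would come from your Azure Blob Storage download step.
results = parse_blobs_as_json({"report.pdf": b"%PDF-1.4 ..."}, StubJsonParser())
```

The point of the temp directory is that get_json_result() operates on local file paths, while AzStorageBlobReader hides the download behind .load_data(); separating the two steps gives you access to the JSON output.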

dosubot[bot] commented 3 months ago

Yes, you can output the parsing result as JSON using the get_json_result() method when using LlamaParse as a file_extractor together with AzStorageBlobReader. The get_json_result() method is defined in the DashScopeParse class, which is part of the LlamaIndex framework.

Here is the relevant code snippet for the get_json_result() method:

def get_json_result(
    self, file_path: Union[List[str], str], extra_info: Optional[dict] = None
) -> List[dict]:
    """Parse the input path(s) and return the result as a list of dicts."""
    extra_info = {"parse_fmt_type": ResultType.DASHSCOPE_DOCMIND.value}
    try:
        return asyncio.run(self.aget_json(file_path, extra_info))
    except RuntimeError as e:
        # Inside a running event loop, point the user at nest_asyncio.
        if nest_asyncio_err in str(e):
            raise RuntimeError(nest_asyncio_msg)
        else:
            raise

This method parses the given file path(s) and returns the result as a list of JSON-like dictionaries [1].

To continue talking to Dosu, mention @dosu.