run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License
3.44k stars, 731 forks

Fixing JsonReader and introducing unit tests #816

Closed: godwin3737 closed this 8 months ago

godwin3737 commented 8 months ago

Description

The JsonReader is currently broken: it captures only the keys and ignores the content of each value. This PR fixes that and introduces unit tests to validate future changes.

Related to: https://github.com/run-llama/llama-hub/pull/485

Issue:

  1. PR 485 changed the reader to accommodate a schema that does not appear to be valid JSON syntax (screenshot: json-syntax-incorrect).

  2. For syntactically valid JSON, the actual values are missing and only the keys appear in the document. See https://github.com/run-llama/llama-hub/issues/606.

  3. The line causing the issue (only dictionary keys are returned): https://github.com/run-llama/llama-hub/blob/c36cdc54b82ced1bffe792293d896ca5681a2e61/llama_hub/file/json/base.py#L79
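For anyone unfamiliar with the symptom, here is a minimal sketch of how "only dictionary keys are returned" can arise in Python. It is an illustration of the general language behavior, not a copy of the line in base.py; the sample payload and variable names are made up for the example.

```python
import json

# Made-up payload for illustration only.
raw = '{"title": "Report", "body": "Quarterly numbers"}'
data = json.loads(raw)

# Iterating over a dict directly yields only its keys, so the values are lost.
keys_only = [str(item) for item in data]

# Serializing the whole object keeps keys and values together.
full_text = json.dumps(data)

print(keys_only)
print(full_text)
```

Any code path that treats a parsed JSON object as a plain iterable will hit this, which matches the missing values reported in issue 606.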

Solution

This PR fixes the problem: syntactically valid JSON retains its original behavior of being passed in its entirety as a single document. If multiple documents are required, a list of JSON objects or JSONL must be passed.
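A rough sketch of the intended behavior, using only the standard library. The helper `json_to_document_texts` and its `is_jsonl` parameter are hypothetical names chosen for this example and are not the JsonReader API; the actual implementation in the PR may differ.

```python
import json
from typing import List


def json_to_document_texts(raw: str, is_jsonl: bool = False) -> List[str]:
    """Hypothetical helper: return one text string per intended document."""
    if is_jsonl:
        # JSONL: every non-empty line is its own JSON value and its own document.
        items = [json.loads(line) for line in raw.splitlines() if line.strip()]
    else:
        parsed = json.loads(raw)
        # A top-level list yields one document per element;
        # any other valid JSON value is kept whole as a single document.
        items = parsed if isinstance(parsed, list) else [parsed]
    # Serialize each item in full so keys and values are both retained.
    return [json.dumps(item) for item in items]


single = json_to_document_texts('{"a": 1, "b": {"c": 2}}')
many = json_to_document_texts('[{"a": 1}, {"a": 2}]')
lines = json_to_document_texts('{"a": 1}\n{"a": 2}\n', is_jsonl=True)
```

Here `single` holds one document containing the whole object, while `many` and `lines` each hold two documents, one per list element or JSONL line.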


nerdai commented 8 months ago

Dope thanks @godwin3737 !!!