promplate / partial-json-parser-js

Parse partial JSON generated by LLM
https://npmjs.com/partial-json
MIT License

Add `OUTERMOST_OBJ` and `OUTERMOST_ARR` allowances #9

Open alanpog opened 2 months ago

alanpog commented 2 months ago

Not sure if this is relevant for the general library, but I'm sending this PR in case you find it useful (it was needed for our specific use case). These changes allow partials for the outermost objects and arrays (i.e., objects that have no other object above them in the JSON hierarchy, and similarly for arrays), but disallow them for any non-outermost objects or arrays. I'm avoiding the words "root" and "non-nested" because their definitions are usually slightly different. The README.md has some explanation and there are quite a few test cases too. I couldn't find a .prettierrc in the repo and ended up unintentionally messing up the formatting :(

Summary by Sourcery

Add new allowances OUTERMOST_OBJ and OUTERMOST_ARR to support partial parsing of outermost JSON objects and arrays. Enhance the JSON parsing logic to track depth and improve handling of partial structures. Update documentation to reflect these changes.

vercel[bot] commented 2 months ago

The latest updates on your projects. Learn more about Vercel for Git.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| partial-json-coverage | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Sep 17, 2024 1:00pm |

sourcery-ai[bot] commented 2 months ago

Reviewer's Guide by Sourcery

This pull request adds new allowances for parsing outermost objects and arrays in partial JSON, while maintaining stricter parsing for nested elements. It introduces OUTERMOST_OBJ and OUTERMOST_ARR flags, updates the parsing logic to handle these new allowances, and improves error handling and partial parsing capabilities.

File-Level Changes

Added new allowances for outermost objects and arrays
  • Introduced OUTERMOST_OBJ and OUTERMOST_ARR flags in options.ts
  • Updated Allow object to include new flags
  • Added explanations for new allowances in README.md
  Files: src/options.ts, README.md

Updated parsing logic to handle new allowances
  • Implemented depth tracking for objects and arrays
  • Modified parseObj and parseArr functions to check for outermost elements
  • Updated error handling to consider new allowances
  Files: src/index.ts

Improved error handling and partial parsing capabilities
  • Enhanced parseStr function to handle more partial string scenarios
  • Updated parseNum function for better partial number parsing
  • Improved overall error messages and partial JSON handling
  Files: src/index.ts

Code refactoring and formatting improvements
  • Restructured parseAny function for better readability
  • Updated variable names and comments for clarity
  • Applied consistent code formatting throughout the files
  Files: src/index.ts, src/options.ts
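
To make the "outermost" rule concrete, here is a minimal sketch of the kind of check that depth tracking enables. It is not the PR's code: the bit values, the `Kind` type, and the `mayBePartial` helper are hypothetical, and it assumes that the general `OBJ` / `ARR` flags permit incompleteness at any depth while the `OUTERMOST_*` flags only apply to a container that has no enclosing container of the same kind.

```typescript
// Hypothetical bit values for illustration; the real ones are defined in src/options.ts.
const OBJ = 1 << 0;
const ARR = 1 << 1;
const OUTERMOST_OBJ = 1 << 2;
const OUTERMOST_ARR = 1 << 3;

type Kind = "OBJ" | "ARR";

// sameKindDepth is the number of enclosing containers of the same kind:
// 0 for an object with no object above it, or an array with no array above it.
function mayBePartial(kind: Kind, sameKindDepth: number, allow: number): boolean {
  const general = kind === "OBJ" ? OBJ : ARR;
  const outermost = kind === "OBJ" ? OUTERMOST_OBJ : OUTERMOST_ARR;
  // Assumption: the general flag allows incompleteness at every depth.
  if ((allow & general) !== 0) return true;
  // Otherwise only an outermost container of this kind may be incomplete.
  return sameKindDepth === 0 && (allow & outermost) !== 0;
}
```

Under these assumptions, an object nested inside an array still counts as an outermost object, because depth is tracked per kind.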

CNSeniorious000 commented 2 months ago

Thanks @alanpog! I think this is useful, at least for me. I once had use cases where I needed to disallow incomplete child containers (objects/arrays) while allowing the top-level container to be partial. At that time I resolved the array-related problems by simply dropping the last item or something similar, and the object-related ones by using zod to validate them.

Your proposal is great and this implementation works well. But due to my perfectionist tendencies, I often find myself wondering if there's an even better way to solve these kinds of problems. For example:

  1. If we support allowing the outermost container to be incomplete, what if we also want to allow the two outermost layers to be incomplete?
  2. What if the user wants the value of one specific key to be allowed to be incomplete, but not the values of other keys?

It seems quite impossible to achieve this through a single allow flag.

So I think this implementation is still not perfect enough.


By the way, the diff is too large to review. That's my fault for not including a Prettier config; I've added one just now.

CNSeniorious000 commented 2 months ago

I have some ideas about supporting these complex allow strategies elegantly, but I'm not sure:

Approach One: Input Predicate Function

The parse function would take an options object to extend the configuration beyond the simple allow flag. We could support passing in a validate predicate function that decides whether to allow a given incomplete object / array. The signature of options might look like this:

interface options {
  allow: number;
  validate(parsed: any, text: string, parents: Parent[]): boolean
}

type Parent = {
  type: "OBJ";
  key: string;
} | {
  type: "ARR";
  index: number;
}

For example, if the user does this:

parse(`[0, {"a": [{`, { allow: ALL, validate: (parsed, text, parents) => { ... } })

The validate function will be called at most 4 times, with these arguments:

validate({}, '{', [{ type: "ARR", index: 0 }, { type: "OBJ", key: "a" }, { type: "ARR", index: 1 }])
validate([{}], '[{', [{ type: "OBJ", key: "a" }, { type: "ARR", index: 1 }])
validate({"a": [{}]}, '{"a": [{', [{ type: "ARR", index: 1 }])
validate([0, {"a": [{}]}], '[0, {"a": [{', [])
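
As a rough illustration of what such a predicate could express, here is a sketch of three policies written against the proposed signature. None of this is an existing API of the library: the `Validate` alias and the policy names are made up for the example. It assumes `parents` is ordered from the nearest container outward, as in the calls above, and that `validate` only ever receives objects or arrays.

```typescript
// The Parent type as proposed above.
type Parent =
  | { type: "OBJ"; key: string }
  | { type: "ARR"; index: number };

// Hypothetical alias for the proposed predicate signature.
type Validate = (parsed: unknown, text: string, parents: Parent[]) => boolean;

// Same intent as OUTERMOST_OBJ / OUTERMOST_ARR: an incomplete container is
// accepted only if no enclosing container is of the same kind.
const outermostOnly: Validate = (parsed, _text, parents) => {
  const kind = Array.isArray(parsed) ? "ARR" : "OBJ";
  return !parents.some((p) => p.type === kind);
};

// Question 1: let the two outermost layers be incomplete, whatever their kind.
const twoOuterLayers: Validate = (_parsed, _text, parents) => parents.length < 2;

// Question 2: accept an incomplete container only when it is the value stored
// directly under `key`. A real policy would also need a rule for the
// ancestors of that value, which are necessarily incomplete too.
const underKey = (key: string): Validate => (_parsed, _text, parents) => {
  const nearest = parents[0];
  return nearest !== undefined && nearest.type === "OBJ" && nearest.key === key;
};
```

Applied to the four calls listed above, `outermostOnly` would reject the first two and accept the last two.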

Approach Two: Return PartialInfo

Inject some information into the return value. Like this:

export const partial = Symbol('__partial__');

And the partial information may be like this:

interface PartialInfo {
  text: string;
}

Users can then filter the result themselves using this information.

For example, when used like this:

> res = parse(`[0, {"a": [{`, { allow: ALL, inject: true }) // [0, {"a": [{}]}]
> res[0][partial] // undefined
> res[1][partial] // { text: '{"a": [{' }
> res[1].a[partial] // { text: '[{' }
> res[1].a[0][partial] // { text: '{' }

They can then drop incomplete values at any depth they wish.
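
A user-side filter on top of this idea might look like the sketch below. It is only an illustration under assumptions: the `inject` option does not exist, so a hypothetical `mark` helper builds by hand the value that `parse` is imagined to return, and `prune` is a made-up name for the filtering step.

```typescript
// Same symbol as proposed above.
const partial = Symbol("__partial__");

// Hypothetical stand-in for what `inject: true` might attach to each
// incomplete container.
function mark<T extends object>(value: T, text: string): T {
  Object.defineProperty(value, partial, { value: { text }, enumerable: false });
  return value;
}

function isPartial(value: unknown): boolean {
  return typeof value === "object" && value !== null && partial in value;
}

// Returns a copy of `value` without any incomplete container located at
// depth `keepDepth` or deeper (the root is depth 0). Returns undefined when
// the root itself is dropped.
function prune(value: unknown, keepDepth: number, depth = 0): unknown {
  if (typeof value !== "object" || value === null) return value;
  if (depth >= keepDepth && isPartial(value)) return undefined;
  if (Array.isArray(value)) {
    return value
      .map((item) => prune(item, keepDepth, depth + 1))
      .filter((item) => item !== undefined);
  }
  const out: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value)) {
    const kept = prune(child, keepDepth, depth + 1);
    if (kept !== undefined) out[key] = kept;
  }
  return out;
}

// Hand-built equivalent of the `res` value from the example above.
const res = mark([0, mark({ a: mark([mark({}, "{")], "[{") }, '{"a": [{')], '[0, {"a": [{');
```

With this `res`, `prune(res, 1)` keeps only `[0]` and `prune(res, 2)` gives `[0, {}]`, so the caller decides how many outer layers may stay incomplete.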