juanjoDiaz / streamparser-json

Streaming JSON parser in Javascript for Node.js and the browser
MIT License

Replace nested objects and arrays #22

Closed kasual1 closed 1 year ago

kasual1 commented 1 year ago

@juanjoDiaz ,

I know this is probably out of scope for this library, but do you think the code could be adjusted to omit nested objects and arrays?

I have a large JSON object that looks like this:

{
  "cards": [
    {
      "id": 1,
      "name": "Some card name"
    },
    {},
    {}
  ],
  "meta": {
    "updated": "2022-12-31"
  }
}

The cards array is so large that it doesn't fit into memory on its own (even when parsing in chunks). I'd like to get every object as a flat object in which nested arrays are replaced with "[...]" and nested objects with "{...}". The result would look like this:

{
  "cards": "[...]",
  "meta": "{...}"
},
{
  "id": 1,
  "name": "Some card name"
},
{},
{},
{
  "updated": "2022-12-31"
}

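To make the requested output shape concrete, here is a minimal sketch of the transformation itself. The helpers `placeholderFor` and `shallowSummary` are hypothetical names, not part of streamparser-json. They work on a value that is already in memory, so on their own they do not address the memory problem; they only show how one emitted object would be flattened.

```javascript
// Hypothetical helpers (not part of streamparser-json) sketching the
// flattening requested above. They operate on a value already in memory,
// so they only illustrate the output shape, not a memory-saving strategy.

// Replaces a nested array with "[...]" and a nested object with "{...}";
// primitives (including null) are returned unchanged.
function placeholderFor(value) {
  if (Array.isArray(value)) return '[...]';
  if (value !== null && typeof value === 'object') return '{...}';
  return value;
}

// Returns a one-level-deep copy of an object or array in which every
// nested container is replaced by its placeholder string.
function shallowSummary(value) {
  if (Array.isArray(value)) return value.map(placeholderFor);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, child]) => [key, placeholderFor(child)]),
    );
  }
  return value;
}
```

Applied to the root object above, `shallowSummary` yields `{ cards: '[...]', meta: '{...}' }`, while a card such as `{ id: 1, name: 'Some card name' }` comes back unchanged because it has no nested containers.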
I'm aware that this is probably out of scope for this repo, but I would like to apply the changes in my fork. Could you point me in the right direction: where should I look, and where would those changes fit best?

Best regards and thanks a lot for the awesome parser :-)

juanjoDiaz commented 1 year ago

Hi @kasual1 ,

I don't fully understand what you are trying to do, but I think it is already supported.

Just take a look at the following options:

{
  paths: <string[]>,
  keepStack: <boolean>, // whether to keep all the properties in the stack
}

paths: Array of paths to emit. Defaults to undefined, which emits everything. The paths are intended to support JSONPath, although at the moment only the root object selector ($) and subproperty selectors, including wildcards ($.a, $.*, $.a.b, $.*.b, etc.), are supported.

keepStack: Whether to keep full objects on the stack even if they won't be emitted. Defaults to true. When set to false, properties are only preserved in the parent object if some ancestor will be emitted. This means that the parent object passed to the onValue function will be empty, which doesn't reflect the truth, but it's more memory-efficient.

The paths option allows you to select which sub-objects will be emitted. The keepStack option can be set to false so that objects are not kept in memory after being emitted. So, if you use { paths: ['$.cards.*'], keepStack: false }, only the objects within the cards array will be emitted and the cards array will not be built in memory. To emit every item nested one level below the top-level properties (e.g. each card and each property of meta), you could use paths: ['$.*.*'], and so on.
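To illustrate which values selectors like '$.cards.*' and '$.*.*' pick out, here is a small sketch. It is a hypothetical illustration, not how streamparser-json works internally: `matchesPath` and `selectPaths` are made-up names, they only handle the simplified selector forms described above (no recursive descent), and `selectPaths` walks an already-parsed value, so it gives none of the memory savings of the streaming parser and says nothing about the order in which the library emits values.

```javascript
// Hypothetical illustration only, NOT the library's implementation.
// Supports the simplified selector forms '$', '$.a', '$.*', '$.a.b', '$.*.b'.

// Returns true when the key path of a value (root = []) matches the selector.
function matchesPath(selector, keys) {
  const parts = selector.split('.');
  if (parts[0] !== '$') throw new Error(`Unsupported selector: ${selector}`);
  const segments = parts.slice(1);
  if (segments.length !== keys.length) return false;
  // '*' matches any key or array index; anything else must match exactly.
  return segments.every((seg, i) => seg === '*' || seg === String(keys[i]));
}

// Depth-first walk over an in-memory value, collecting every { keys, value }
// whose key path matches at least one of the selectors.
function selectPaths(root, selectors) {
  const matches = [];
  const visit = (value, keys) => {
    if (selectors.some((s) => matchesPath(s, keys))) {
      matches.push({ keys, value });
    }
    if (value !== null && typeof value === 'object') {
      for (const [key, child] of Object.entries(value)) {
        // Array indices come back from Object.entries as strings.
        visit(child, [...keys, Array.isArray(value) ? Number(key) : key]);
      }
    }
  };
  visit(root, []);
  return matches;
}
```

On the example document from the question, `selectPaths(doc, ['$.cards.*'])` matches the key paths `['cards', 0]`, `['cards', 1]` and `['cards', 2]`, while `['$.*.*']` additionally matches `['meta', 'updated']`, whose value is the string "2022-12-31".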

juanjoDiaz commented 1 year ago

Closing since there was no response. Feel free to reopen if you feel that there is still something to respond.