jsonata-js / jsonata

JSONata query and transformation language - http://jsonata.org
MIT License
2.06k stars 220 forks source link

1.6.4 is polluting arrays with a 'sequence' property #296

Open meganwalker-ibm opened 5 years ago

meganwalker-ibm commented 5 years ago

PR https://github.com/jsonata-js/jsonata/pull/292 included in JSONata 1.6.4 started polluting data returned to consumers with a 'sequence' property attached onto an array.

I don't think this internal property should be externalised to consumers; especially in a way that means it can't be serialised losslessly into a string.

screenshot 2019-01-14 at 10 59 58

The sequence property here is added by JSONata, not from the data we are working with, and it cannot be serialised correctly into JSON (Nor can the equivalent JSON string be parsed into javascript)

{
  "foo": {
    "bar": [
      {
        "name": "freddy",
        "price": 6
      },
      {
        "name": "jsonata",
        "price": 1000000
      },
      sequence: true
    ]
  }
}

Represents the returned data structure, except it's not valid JSON.

andrew-coleman commented 5 years ago

How are you serializing to JSON? The following Javascript code:

const arr = [1,2,3];
arr.sequence=true;
JSON.stringify(arr);

produces "[1,2,3]"

mattbaileyuk commented 5 years ago

I believe the concern can be represented with the following: If you have

const jsonata = require('jsonata')
let input = {
  "units": [
    {
      "name": "bob",
      "price": 20
    },
    {
      "name": "jessica",
      "price": 500
    },
    {
      "name": "may",
      "price": 40
    }
  ]
}
console.log(jsonata('units[price > 20]').evaluate(input))

then the output to the console (also seen in a Javascript debugger) is

[
  {
    name: 'jessica', price: 500
  },
  {
    name: 'may', price: 40
  },
  sequence: true
]

where sequence: true wasn't present in 1.6.3 and below. This does seem to go with JSON.stringify(), but I guess the discussion is around whether evaluate() should be returning this property in the object.

I guess the wider question is, should evaluate() be returning valid JSON? I would have thought so, which suggests this should be cleaned up.

andrew-coleman commented 5 years ago

The sequence property was always there, but it used to be created using Object.defineProperty with enumerable set to false. However, this had a significant performance hit, and rewriting it this way gave a 3x performance improvement on the expression $sum(Account.Order.Product.(Price*Quantity)). If you can come up with an alternative solution that doesn't degrade performance, I'd welcome it.

mattbaileyuk commented 5 years ago

I was aware that the sequence property was there internally before as processing is taking place, and hence didn't feel concerned by the change in the PR, I didn't realise that this would have an impact on what was returned from an evaluate() call, which is an external change. Having evaluate() return a sequence property is new behaviour in 1.6.4.

mtiller commented 5 years ago

FYI, in my (TypeScript) implementation, I always "wrapped" the values in a wrapper where the sequence and a few other properties were properties of the wrapper, not of the actual value. This allowed me to include any such properties pretty easily. I would unwrap such values before calling functions, for example.

Now, that approach may have had a significant impact on performance, I have no idea. I didn't do any performance benchmarking.

edbrannin commented 5 years ago

Possibly related: I'm seeing keepSingleton: true instead of sequence: true in arrays of Strings.

EDIT: I'm noticing this because it's throwing off ava's t.deepEqual() in my unit tests.

cherukurisurendra2010 commented 4 years ago

Hi Team,

We are facing the same issue, we are getting extra sequence key.

Could you please provide any updated status on this bug.

Thanks.

andrew-coleman commented 4 years ago

The 'status' is as discussed above. Any changes in this area could have significant performance impact. Before touching this, we need a framework for measuring performance so that we can be certain of no regressions.

These properties are only added to the Array object. When you JSON.stringify() the structure, they don't appear in the output.

loganvolkers commented 4 years ago

The other option is to have a final "sanitization" step that makes sure that a clean array is returned.

function cleanOutput(raw) {
  if (Array.isArray(raw)) {
    return [...raw];
  } else {
    return raw;
  }
}

This could be done as a final step in evaluate, but it can also be done externally.

const jsonata = require("jsonata");
let input = {
  units: [
    {
      name: "bob",
      price: 20
    },
    {
      name: "jessica",
      price: 500
    },
    {
      name: "may",
      price: 40
    }
  ]
};
const output = jsonata("units[price > 20]").evaluate(input);
output.keepSingleton = "bar" // For illustration only
console.log(output, Object.keys(output));
const clean = cleanOutput(output);
console.log(clean, Object.keys(clean));

The above code outputs

["0", "1", "sequence", "keepSingleton"]
["0", "1"]

This could provide an immediate workaround for:

I'll leave the performance fix and potential for integration into core up to @andrew-coleman. If performance is a big concern, maybe we can get a jsonata->webassembly compiler written ;)

ameinhardt commented 4 years ago

@loganvolkers : that doens't work for nested arrays, does it? How about:

cleanOutput = JSON.parse(JSON.stringify(raw))
knolleary commented 4 years ago

I have been looking at this issue in related to #473.

I've got a simple performance test that runs the $sum(Account.Order.Product.(Price*Quantity)) expression 100000 times, across 10 separate runs, and then averages the time taken and number of evals/sec.

I have used that to compare some possible approaches to resolving this issue:

  1. Object.defineProperty to make the properties non-enumerable. As @andrew-coleman said, this is known to be slower, but I tried it out to get a sense of how much slower given the current code.
  2. WeakMap - this uses a WeakMap in util.js to track the 'known' sequence arrays, rather than having to flag the array itself
  3. Subclass Array - create a custom class that extends Array, so we can use instanceof to check its type, but still looks just like an Array.

    class Sequence extends Array {}

Code evals/sec
existing 40891
Object.defineProperty 29178
WeakMap 23581
subclass Array 30491

So yeah... no quick win with those options.

Also, having got this far, I realise there's more to it than just being able to identify it as a sequence - there is also the keepSingleton property used in some places that gets added to the object. So I'm going to post this comment as a record of my failure to find a performant fix so far.

andrew-coleman commented 4 years ago

Thanks Nick! :)

agarwalamit662 commented 3 years ago

Any update on this issue ..? I am getting sequence property in my arrays.

pankleks commented 3 years ago

I have a problem when I create in jsonata field called "sequence", when I test it on jsonata playground it's working fine.

But when I do it in my App (JS) the field is not created.

If I rename field to something else like "seq" all is fine.

Can it be somehow realted?

kolban-google commented 3 years ago

+1 on this issue!!!

sban2009 commented 2 years ago

+1

koladilip commented 2 years ago

We are also facing this issue, can this be fixed?

Woodz commented 2 years ago

Please consider adding a sanitisation step before returning the data from JSONATA library, because JS test libraries fail without a clear reason due to this internal field. Currently we need to pass all the responses from JSONATA through JSON.parse(JSON.stringify to workaround this issue

Woodz commented 2 years ago

The sequence property was always there, but it used to be created using Object.defineProperty with enumerable set to false. However, this had a significant performance hit, and rewriting it this way gave a 3x performance improvement on the expression $sum(Account.Order.Product.(Price*Quantity)). If you can come up with an alternative solution that doesn't degrade performance, I'd welcome it.

@andrew-coleman please reconsider prioritising performance ahead of functionality/ease of usage. I sincerely doubt that the majority of your users will have issues with the library being 4x slower, whereas these implementation details seem to be causing a lot of issues based on the number of issues and comments about them

airandfingers commented 1 month ago

+1, this issue has been causing our test assertions (specifically, Vitest/Jest's .toBeCalledWith({ key: expect.any(String) }) to fail with baffling error messages like Compared values have no visual difference, wasting hours of developer time.

We can use lodash.cloneDeep everywhere as a workaround, but we shouldn't have to - jsonata should be fixed to clean up these extraneous values, even at the cost of some microseconds for each call.