neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"
https://neo4j.com/labs/apoc
Apache License 2.0

Exported JSON Fails JSON Validation Check #1634

Closed typinator closed 3 years ago

typinator commented 4 years ago

Expected Behavior (Mandatory)

Simple test model created with two nodes and two relationships. two_nodes_two_relationships

The exported JSON file should validate with no errors. I checked it against https://jsonchecker.com/ after a local JSON viewing/editing app (JSON Editor - Mac) reported an error.

Actual Behavior (Mandatory)

JSON Editor reports - 'Start of object not expected after outer-most array or object'

https://jsonchecker.com reports

Parse error on line 1:
...ric_attribute":42}}{"type":"node","id":
----------------------^
Expecting 'EOF', '}', ',', ']', got '{'

How to Reproduce the Problem

Simple Dataset (where it's possible)

Exported JSON Dataset:

{"type":"node","id":"0","labels":["Class_1"],"properties":{"author":"Fred Bloggs","name":"element1_no_lists","numeric_attribute":42}}
{"type":"node","id":"1","labels":["Class_2"],"properties":{"name":"element2_no_lists","date modified":"2020-08-23","numeric_attribute":23}}
{"id":"0","type":"relationship","label":"HAS_SOME_RELATIONSHIP_WITH","start":{"id":"0","labels":["Class_1"]},"end":{"id":"1","labels":["Class_2"]}}
{"id":"1","type":"relationship","label":"RELATIONSHIP_FROM_N2_TO_N1","start":{"id":"1","labels":["Class_2"]},"end":{"id":"0","labels":["Class_1"]}}

//Insert here a set of Cypher statements that helps us to reproduce the problem

CYPHER dump

CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;  
UNWIND [{_id:0, properties:{author:"Fred Bloggs", name:"element1_no_lists", numeric_attribute:42}}] AS row  
MERGE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Class_1;  
UNWIND [{_id:1, properties:{name:"element2_no_lists", `date modified`:"2020-08-23", numeric_attribute:23}}] AS row  
MERGE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Class_2;  
UNWIND [{start: {_id:0}, end: {_id:1}, properties:{}}] AS row  
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})  
MATCH (end:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.end._id})  
MERGE (start)-[r:HAS_SOME_RELATIONSHIP_WITH]->(end) SET r += row.properties;  
UNWIND [{start: {_id:1}, end: {_id:0}, properties:{}}] AS row  
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})  
MATCH (end:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.end._id})  
MERGE (start)-[r:RELATIONSHIP_FROM_N2_TO_N1]->(end) SET r += row.properties;  
MATCH (n:`UNIQUE IMPORT LABEL`)  WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;  
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;

exported using

//Export Model as JSON
CALL apoc.export.json.all("Export_JSON_test_"+toString(date.transaction())+"_two_nodes_two_relationships"+".json",{useTypes:true})
YIELD file,format,source,nodes,relationships,properties,time
RETURN file,format,source,nodes,relationships,properties,time;

Steps (Mandatory)

  1. Create model / load from CYPHER statements.
  2. Export as JSON using apoc.export.json.all (above)
  3. Load exported JSON file into JSON validator

Screenshots (where it's possible)

JSON Editor validation error:

json_editor_validation_error

jsonchecker validation error:

jsonchecker_validation_error

Specifications (Mandatory)

Currently used versions

Versions

jexp commented 4 years ago

This is meant to be JSON Lines (JSONL) format, where each object is on its own line to allow for streaming storage and consumption, so that a single gigantic JSON list never needs to be materialized in memory.
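To illustrate the streaming point, here is a minimal Python sketch that consumes such a file one object at a time. Python is used purely for illustration; the helper name `iter_json_lines` and the abbreviated sample records are assumptions, not part of APOC.

```python
import io
import json
from collections import Counter
from typing import Iterator, TextIO


def iter_json_lines(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


# Two lines shaped like the export in this issue (abbreviated, hypothetical sample).
sample = io.StringIO(
    '{"type":"node","id":"0","labels":["Class_1"]}\n'
    '{"id":"0","type":"relationship","label":"HAS_SOME_RELATIONSHIP_WITH"}\n'
)

# Only one record is held in memory at a time while counting.
counts = Counter(record["type"] for record in iter_json_lines(sample))
print(dict(counts))
```

With a real export, `sample` would be replaced by an open file handle, and memory use would stay bounded by the longest line rather than by the file size.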

jexp commented 4 years ago

But there should be newlines between the exported objects. Do those show up, @typinator?

typinator commented 4 years ago

@jexp - the apoc-exported file: Export_JSON_test_2020-08-25_two_nodes_two_relationships.json.txt

jexp commented 4 years ago

Looks good to me at least, one object per line.

http://jsonlines.org/

typinator commented 4 years ago

I cannot see a JSON validator on jsonlines.org. Have you run this against any JSON validators?

The exported .json fails validation in many tools including:-

oxygenXMLDeveloperValidationError FirefoxValidationError

which then causes problems working on these exported files.

jexp commented 4 years ago

@conker84 I think we could add an optional config option: format: "jsonarray|jsonobject|jsonlines" which defaults to jsonlines.

The jsonobject variant could either have nodes and links/rels as top-level keys with ids as keys beneath them, or, as in d3, hold an array for each.
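A rough Python sketch of how the three proposed layouts might differ for the same records. The `render` function, the `rels` key name, and the abbreviated records are hypothetical illustrations of the proposal, not an existing APOC API; the jsonobject branch shows only the d3-like variant.

```python
import json

# Abbreviated records in the shape of the export shown in this issue.
records = [
    {"type": "node", "id": "0", "labels": ["Class_1"]},
    {"type": "node", "id": "1", "labels": ["Class_2"]},
    {"type": "relationship", "id": "0", "label": "HAS_SOME_RELATIONSHIP_WITH",
     "start": {"id": "0"}, "end": {"id": "1"}},
]


def render(records: list, fmt: str) -> str:
    """Serialize records in one of three hypothetical layouts."""
    if fmt == "jsonlines":
        # One JSON document per line (the current behaviour).
        return "\n".join(json.dumps(r) for r in records)
    if fmt == "jsonarray":
        # A single top-level array holding every record.
        return json.dumps(records)
    if fmt == "jsonobject":
        # d3-like variant: one top-level array per kind of record.
        return json.dumps({
            "nodes": [r for r in records if r["type"] == "node"],
            "rels": [r for r in records if r["type"] == "relationship"],
        })
    raise ValueError(f"unknown format: {fmt}")


for fmt in ("jsonlines", "jsonarray", "jsonobject"):
    print(fmt, "->", render(records, fmt)[:60])
```

Only the jsonarray and jsonobject outputs are single JSON documents that a file-level validator would accept as a whole.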

jexp commented 4 years ago

@typinator each line is a valid JSON document here, so if you want to validate them you can either validate them individually or replace all line endings "}\n" with "},\n" and insert a '[' before and a ']' after to get a JSON array.
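The conversion to a JSON array can also be done by parsing rather than by text replacement. This sidesteps one pitfall: if the file ends with a newline, rewriting every "}\n" leaves a trailing comma before the closing bracket, which strict parsers reject. A minimal Python sketch, where the function name `json_lines_to_array` and the abbreviated sample are assumptions for illustration:

```python
import json


def json_lines_to_array(text: str) -> str:
    """Parse each non-empty line as JSON and re-serialize them all as one array."""
    objects = [json.loads(line) for line in text.split("\n") if line.strip()]
    return json.dumps(objects)


# Abbreviated, hypothetical sample in the shape of the exported file.
exported = (
    '{"type":"node","id":"0","labels":["Class_1"]}\n'
    '{"type":"node","id":"1","labels":["Class_2"]}\n'
)

as_array = json_lines_to_array(exported)
parsed = json.loads(as_array)  # now a single valid JSON document
print(len(parsed))
```

Note that this materializes every object in memory, which is exactly what the line-based format avoids, so it is best suited to small exports like the one in this issue.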

typinator commented 4 years ago

Is there a tool that validates with JSON lines?

The problem is that most tools work at file level, so when transforming or operating on the file - which is the model, after all - they throw an error, and this prevents the transformation from proceeding. If there were a DTD-like URL against which a JSON Lines file could be validated, this would help. As far as I can see, there's nothing in the JSON spec that deals with a JSON file containing multiple objects like this - it seems to be a de facto standard, which doesn't help when it comes to transferring models between tools.
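In the absence of a dedicated tool, a line-level check is short to write. The following Python sketch validates each line independently and reports the line numbers that fail; the function name `validate_json_lines` and the sample strings are hypothetical, and the check covers JSON syntax only, not any schema.

```python
import json


def validate_json_lines(text: str) -> list:
    """Return (line_number, message) for each non-empty line that is not valid JSON."""
    errors = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # blank lines (such as a trailing newline) are skipped
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
    return errors


# Hypothetical samples: one well-formed, one with a truncated second line.
good = '{"type":"node","id":"0"}\n{"type":"node","id":"1"}\n'
bad = '{"type":"node","id":"0"}\n{"type":"node","id":\n'

print(validate_json_lines(good))
print(validate_json_lines(bad))
```

An empty result means every line parsed as its own JSON document, which is the sense in which the exported file is valid.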

typinator commented 4 years ago

so if you want to validate them you can either validate them individually or replace all line endings "}\n" with "},\n" and insert a '[' before and a ']' after to get a json array.

Having a 'JSON' file that doesn't validate without further processing isn't good. It would make more sense for the export procedure to be able to create a single valid JSON file that represents the entire model, which would then also validate. Hence offering a choice between JSON and JSON Lines seems the way to go. The JSON import procedure needs to reflect this as well.

conker84 commented 3 years ago

I'm closing this as we manage the issue in #1640