jexp / neo4j-shell-tools

A bunch of import/export tools for the neo4j-shell

graphml export of array values #44

Open gravitythread opened 10 years ago

gravitythread commented 10 years ago

Very nice tool. I've come across a small problem with the export of nodes that have array data attached to them. In the GraphML output, the array value looks something like '[Ljava.lang.String;@2fbefd8d' (the default Java toString() of a String[], not its contents).

Thanks

davidhbigelow commented 10 years ago

I ran into the same problem. Please fix it if you have time; it's very important for me.

davidhbigelow commented 10 years ago

NOTE: you can use an UNWIND to create a separate record for each array element so that it exports, but re-importing that into something else needs some fiddling:

    match (n:item) unwind n.myarray as arrayItem return n.name, arrayItem
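To make concrete what the UNWIND workaround does to the exported rows, here is a minimal Python sketch (an illustration, not Cypher and not part of neo4j-shell-tools) that flattens node records with an array property into one row per element. The `name`/`myarray` keys mirror the query above; the sample data is invented.

```python
from typing import Any, Dict, List, Tuple


def unwind_rows(nodes: List[Dict[str, Any]], key: str, array_prop: str) -> List[Tuple[Any, Any]]:
    """Mimic `UNWIND n.<array_prop> AS x RETURN n.<key>, x`: one row per array element."""
    rows = []
    for node in nodes:
        # As with Cypher's UNWIND, a missing (null) or empty array yields no rows for that node.
        for element in node.get(array_prop) or []:
            rows.append((node[key], element))
    return rows


nodes = [
    {"name": "a", "myarray": ["x", "y"]},
    {"name": "b", "myarray": []},
    {"name": "c"},
]
print(unwind_rows(nodes, "name", "myarray"))  # [('a', 'x'), ('a', 'y')]
```

Nodes `b` and `c` produce no rows at all, which is part of why re-importing the flattened output takes extra work.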

gravitythread commented 10 years ago

Cool, I will give that a try.

davidhbigelow commented 10 years ago

I worked around this as follows:

Export using an unwind:

    import-cypher -d";" -o data.txt match (n:part) unwind n.csys as csys return n.item as item, csys

Import using the following:

    LOAD CSV WITH HEADERS FROM 'file:/data.txt' as csvLine FIELDTERMINATOR ';'
    match (node:part) // find existing matches for the update of array data
    where node.name = csvLine.item
    foreach (n in (case when has(node.csys) then [1] else [] end) |
      set node.csys = node.csys + ',' + csvLine.csys) // concat the values during import
    foreach (n in (case when not(has(node.csys)) then [1] else [] end) |
      set node.csys = csvLine.csys) // create if not exists

Clean up using the following:

    match (n:part) // find the parts
    where has(n.csys) // that have the property
    with n as n, (case when length(split(n.csys, ',')) > 1
                       then split(n.csys, ',') // break the string
                       else n.csys end) as newCsys
    set n.csys = newCsys; // reapply the value to the source

--==TOTAL HACK==-- BUT GOT IT THERE...
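The three steps above amount to an unwind/regroup round trip. Below is a minimal Python sketch of the same logic, assuming the `;`-delimited file has a header row with `item` and `csys` columns as in the export query. It regroups rows per item (the import step) and then applies the same split-if-more-than-one rule as the clean-up query. It is an illustration only, not part of neo4j-shell-tools.

```python
import csv
import io
from typing import Dict, List, Union


def regroup(csv_text: str) -> Dict[str, Union[str, List[str]]]:
    """Rebuild array properties from ';'-delimited rows of (item, csys)."""
    joined: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        item, value = row["item"], row["csys"]
        # Import step: append with ',' if a value already exists, otherwise create it.
        joined[item] = joined[item] + "," + value if item in joined else value

    result: Dict[str, Union[str, List[str]]] = {}
    for item, text in joined.items():
        parts = text.split(",")
        # Clean-up step: only strings containing a ',' become lists again.
        result[item] = parts if len(parts) > 1 else text
    return result


exported = "item;csys\np1;A\np1;B\np2;C\n"
print(regroup(exported))  # {'p1': ['A', 'B'], 'p2': 'C'}
```

The sketch also makes the hack's limits visible: a one-element array comes back as a plain string (`p2` above), and any element that itself contains a comma gets split in the wrong place.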

yourpalal commented 8 years ago

Not totally complete, but I have a fork of this repo that adds support for importing & exporting array properties. https://github.com/yourpalal/neo4j-shell-tools/releases/tag/array-support

jexp commented 8 years ago

@yourpalal would you be able to send a PR ?

yourpalal commented 8 years ago

@jexp I could, but it's probably not quite ready for that. There are three reasons you might not want it yet:

  1. It's not optimized at all, so it will probably cause some slowdown, although in my testing it is still pretty fast.
  2. It embeds JSON arrays, which may not be quite right for GraphML, although it works great for my purposes.
  3. It currently only handles string arrays, because that was my immediate need, but it would be easy to extend it to the other supported types.

I guess at this point it's more like a (working) proof of concept. That said, let me know what you think and I may be able to get it up to snuff some time soon.
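For readers wondering what "embedding JSON arrays" looks like, here is a minimal Python sketch of the idea: a string array is written as JSON text inside a GraphML `<data>` element and parsed back with `json.loads`. The key id `p0` and both function names are hypothetical, and this is not the code from the fork.

```python
import json
import xml.etree.ElementTree as ET
from typing import List


def array_to_data(key: str, values: List[str]) -> ET.Element:
    """Embed a string array as JSON text inside a GraphML <data> element."""
    data = ET.Element("data", key=key)
    data.text = json.dumps(values)
    return data


def data_to_array(data: ET.Element) -> List[str]:
    """Parse the embedded JSON text back into a list (empty text -> empty list)."""
    return json.loads(data.text or "[]")


elem = array_to_data("p0", ["first", "second"])
print(ET.tostring(elem, encoding="unicode"))  # <data key="p0">["first", "second"]</data>
print(data_to_array(elem))                    # ['first', 'second']
```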

jexp commented 8 years ago

Ah OK, yes, it is really unfortunate that GraphML doesn't cater for array values. I'm not sure how they are encoded in XML elsewhere.

yourpalal commented 8 years ago

In other xml documents it would generally be something like this:

    <data key="p0">
        <element>first</element>
        <element>second</element>
    </data>

The advantage of embedding JSON arrays over this is that another GraphML parser reading the value as text still gets something at least sort of sensible, whereas changing the XML structure would probably break other parsers. That said, I haven't actually scoured the GraphML spec to see whether array properties might be supported directly somehow; if they are, it would probably look something like the example above.
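To illustrate that trade-off, here is a minimal Python sketch of what a reader that only looks at a `<data>` element's text would get from each encoding. `naive_text` and `all_text` are hypothetical stand-ins for such readers, not the behavior of any specific tool.

```python
import xml.etree.ElementTree as ET


def naive_text(xml: str) -> str:
    """Read a <data> element the way a text-only reader might: its direct text only."""
    return (ET.fromstring(xml).text or "").strip()


def all_text(xml: str) -> str:
    """Concatenate all text, including text inside child elements."""
    return "".join(ET.fromstring(xml).itertext())


nested = '<data key="p0"><element>first</element><element>second</element></data>'
as_json = '<data key="p0">["first", "second"]</data>'

print(repr(naive_text(nested)))   # '' (the values live in child elements)
print(repr(naive_text(as_json)))  # '["first", "second"]'
print(all_text(nested))           # firstsecond
```

The JSON form survives a text-only reader intact, while the nested form either disappears or runs the elements together.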

yourpalal commented 8 years ago

It looks like GraphML is meant to be extended with XML. Search for "4.2 Adding Complex Types" on http://graphml.graphdrawing.org/primer/graphml-primer.html#EXT to see an example with SVG.

The result would probably look like this:

    <data key="p0"><array>
        <element>first</element>
        <element>second</element>
    </array></data>

As a bonus, this would remove the dependency on a JSON library, since the value would be parsed directly by the XML parser.
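As a rough check of that claim, here is a minimal Python sketch that reads the proposed `<array>` wrapper using only the standard-library XML parser, with no JSON dependency. The element names follow the example above; returning `None` for a value without an `<array>` child is an assumption made for the sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def read_array(data_xml: str) -> Optional[List[str]]:
    """Return the <element> texts of <data><array>...</array></data>, or None if there is no <array>."""
    data = ET.fromstring(data_xml)
    array = data.find("array")
    if array is None:
        return None
    return [el.text or "" for el in array.findall("element")]


doc = """<data key="p0"><array>
    <element>first</element>
    <element>second</element>
</array></data>"""
print(read_array(doc))                            # ['first', 'second']
print(read_array('<data key="p1">plain</data>'))  # None
```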

jexp commented 8 years ago

Hmm, can you check how other GraphML readers treat this? This could be a nice option for encoding arrays sensibly. I'm not sure whether you can also specify the array type (string, int, boolean) in the GraphML type section.

yourpalal commented 8 years ago

In my current implementation, the array type is specified via the type name: string_array for an array of strings, int_array for an integer array, etc.

I can probably look into other implementations some time this week.
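A minimal Python sketch of how type names of the form `<base>_array` (like `string_array` and `int_array` above) could drive decoding of the raw element strings. The `CONVERTERS` table, and names such as `boolean_array` or `double_array`, are hypothetical assumptions, not the fork's actual list.

```python
from typing import Any, Callable, Dict, List

# Hypothetical mapping from GraphML-style base type names to converters.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.strip().lower() == "true",
}


def decode_typed_array(type_name: str, raw: List[str]) -> List[Any]:
    """Decode raw element strings for a type name of the form '<base>_array'."""
    if not type_name.endswith("_array"):
        raise ValueError(f"not an array type: {type_name}")
    base = type_name[: -len("_array")]
    convert = CONVERTERS[base]  # raises KeyError for an unknown base type
    return [convert(s) for s in raw]


print(decode_typed_array("int_array", ["1", "2", "3"]))       # [1, 2, 3]
print(decode_typed_array("boolean_array", ["true", "False"]))  # [True, False]
```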

yourpalal commented 8 years ago

I looked at a variety of other tools (Gephi, Mathematica, JGraphT, OrientDB, yEd, the Boost Graph Library, and maybe more), and none of them seem to support array properties. So I guess this is a chance to be a trailblazer: once people start trying to import the resulting files into other tools, those tools might start adding support.

I think the best way to do this is probably to extend the XML schema, as described in the GraphML docs, to allow for documents like this:

declaring an array property:

<key id="p0" type="string" multiplicity="many" for="node" />

declaring a non-array property (two equivalent examples):

<key id="p1" type="string" for="node" />
<key id="p1" type="string" multiplicity="single" for="node" />

supplying array data:

<data key="p0"><array>
   <value>isn't</value>
   <value>xml</value>
   <value>fun?</value>
</array></data>
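A minimal Python sketch of a reader for the proposed extension: it collects `<key>` declarations, treats `multiplicity="many"` as an array (defaulting to single, as in the two equivalent declarations above), and reads `<array><value>` children for array keys. The `multiplicity` attribute is only the proposal from this thread, not part of the GraphML spec; the sketch ignores XML namespaces (real GraphML files declare one) and only handles string values.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

Value = Union[str, List[str]]


def read_node_data(graphml: str) -> Dict[str, Dict[str, Value]]:
    """Map node id -> {key id: value}, honouring the proposed multiplicity attribute."""
    root = ET.fromstring(graphml)
    # A key is an array key only if it declares multiplicity="many".
    is_array = {
        key.get("id"): key.get("multiplicity", "single") == "many"
        for key in root.iter("key")
    }
    nodes: Dict[str, Dict[str, Value]] = {}
    for node in root.iter("node"):
        props: Dict[str, Value] = {}
        for data in node.findall("data"):
            key = data.get("key")
            if is_array.get(key):
                array = data.find("array")
                props[key] = [v.text or "" for v in array.findall("value")] if array is not None else []
            else:
                props[key] = data.text or ""
        nodes[node.get("id")] = props
    return nodes


sample = """<graphml>
  <key id="p0" type="string" multiplicity="many" for="node" />
  <key id="p1" type="string" for="node" />
  <graph edgedefault="directed">
    <node id="n0">
      <data key="p0"><array><value>isn't</value><value>xml</value><value>fun?</value></array></data>
      <data key="p1">plain</data>
    </node>
  </graph>
</graphml>"""
print(read_node_data(sample))
# {'n0': {'p0': ["isn't", 'xml', 'fun?'], 'p1': 'plain'}}
```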

Once there is an XML schema, it will have to be imported into the exported GraphML files and hosted somewhere if these files are to be valid XML. Maybe it could be hosted on neo4j.com?

equaeghe commented 8 years ago

@yourpalal Is there anything special I should do to have export-graphml export string arrays as JSON? I installed your fork (for the 2.3 series), but I still get the old behavior.

yourpalal commented 8 years ago

@equaeghe No, it should just work. I mainly use it for importing GraphML, so there could be a bug in the export, but it worked when I originally tested it. Are you sure that Neo4j is loading the right jar?

equaeghe commented 8 years ago

@yourpalal You wrote:

Are you sure that neo4j is loading the right jar?

How can I test this?
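One generic way to check which jar actually supplies a given class is to scan the jar files in a directory, since jars are ZIP archives. Below is a minimal Python sketch; the directory and class entry are hypothetical placeholders to replace with your installation's jar directory and a class from the tool. If more than one jar lists the class, the old and new versions are probably both installed.

```python
import zipfile
from pathlib import Path
from typing import List


def jars_containing(jar_dir: str, class_path: str) -> List[str]:
    """Return names of the .jar files in jar_dir whose entries include class_path."""
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_path in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits


# Hypothetical placeholders: substitute your real jar directory and class entry.
print(jars_containing("/path/to/neo4j/lib", "com/example/ExportTool.class"))
```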

jexp commented 7 years ago

With neo4j-shell being deprecated, I'm currently focusing my effort on the APOC library, which can be used from both the shell and the browser. APOC also has export-graphml functionality that contains the latest fixes.

Please have a look at the APOC docs.