Property Graph Exchange Format (PG) converters
This package implements parsers and serializers to convert between labeled property graph formats and databases.
A property graph (also known as labeled property graph) is an abstract data structure consisting of nodes and (possibly directed) edges between these nodes. Nodes and edges can each have labels and properties. Property graph formats and databases differ slightly in their data models, for instance in restrictions and in supported data types.
This package implements the Property Graph Exchange Format (PG), aimed to be a superset of common models, with parsers and serializers from and to various formats.
Default installation requires node >= 18.0.0 (or node >= 20.0.0 for development).
npm install -g pgraphs
To connect to Neo4J databases, also install:
npm install -g neo4j-driver-lite
Browser bundles have not been created yet.
Alternatively install or update a Docker image (see Docker usage below):
docker pull ghcr.io/pg-format/pgraphs
Command pgraph is installed with this package:
Usage: pgraph [options] [<source> [<target>]]
Convert between property graph formats and databases.
Options:
-f, --from [format] source format
-t, --to [format] target format
-i, --id [key] copy node id to property
--merge merge nodes/edges in CYPHER
--html generate HTML label
-s, --scale [factor] scale spatial properties x,y,width,height,pos
-e, --errors verbose error messages
-q, --quiet don't warn when graph is reduced
-h, --help show usage information
-V, --version show the version number
Supported conversion formats:
pg from/to PG format (default source format)
json from/to PG-JSON
jsonl from/to PG-JSONL (default target format)
cypher from/to Cypher CREATE statements
cypherl to CYPHERL (one query per line, requires id property)
neo4j from/to Neo4J database (via Cypher query)
dot from/to GraphViz DOT
tgf from/to Trivial Graph Format
canvas from/to JSON Canvas (experimental)
graphology from/to Graphology import/export
ncol from/to NCOL file format
xml to GraphML
yarspg to YARS-PG
csv to OpenCypher/Neo4J CSV files
neptune to Neptune CSV import (aka Gremlin load data format)
mmd to Mermaid Flowchart (experimental)
gexf to Graph Exchange XML Format (GEXF)
When installed as Docker image, command pgraph can be executed this way:
docker run -i --rm ghcr.io/pg-format/pgraphs
The long command can be abbreviated, for instance with an alias:
alias pgraph='docker run -i --rm ghcr.io/pg-format/pgraphs'
Note that by default the command cannot access files outside of the Docker container, so usage is limited to reading from standard input and writing to standard output:
# this won't work
docker run -i ghcr.io/pg-format/pgraphs graph.pg graph.jsonl
# this will
<graph.pg docker run -i ghcr.io/pg-format/pgraphs > graph.jsonl
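The programming API may still change. Try this or look at the sources:
import { pgformat, ParsingError } from "pgraphs"

// a graph in PG-JSON structure (elements elided)
const graph = {
  nodes: [ ... ],
  edges: [ ... ]
}

try {
  const pgstring = pgformat.pg.serialize(graph) // graph -> PG format string
  const parsed = pgformat.pg.parse(pgstring)    // PG format string -> graph
} catch (e) {
  // JavaScript has no typed catch clauses, so check the error class here
  if (e instanceof ParsingError) {
    console.log(`Parsing failed in line ${e.line}`)
  } else {
    throw e
  }
}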
Many formats and conventions exist to store labeled property graphs. Each format comes with a syntax and a limited or extended data model of property graphs: not every feature can be expressed in every format! The following table lists all formats and systems known so far and whether this package can read and/or write them:
read | write | format or database |
---|---|---|
yes | yes | PG format |
yes | yes | PG-JSON |
yes | yes | PG-JSONL |
yes | yes | Cypher CREATE |
no | yes | CYPHERL |
yes | yes | Neo4J or compatible |
yes | yes | Trivial Graph Format (TGF) |
yes | yes | GraphViz DOT |
yes | yes | JSON Canvas |
yes | yes | Graphology |
yes | yes | NCOL |
no | yes | GraphML |
no | yes | GEXF |
no | yes | YARS-PG |
no | yes | OpenCypher/Neo4J CSV |
no | yes | Amazon Neptune CSV |
no | yes | Mermaid |
The pgraphs repository contains a CSV file and an equivalent PG file listing these and more graph formats with their capabilities to store selected graph features.
PG format was first proposed by Hirokazu Chiba, Ryota Yamanaka, and Shota Matsumoto (2019, 2022). A revision toward a final specification is in progress. See the Property Graph Exchange Format Homepage for details.
The following graph in PG format with two nodes and two edges uses features such as multiple labels, repeated property values, numbers, and boolean values:
101 :person name:Alice name:Carol country:"United States"
102 :person :student name:Bob country:Japan
101 -- 102 :same_school :same_class since:2012
101 -> 102 :likes since:2015 engaged:false
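To illustrate the line-based structure of PG format, here is a minimal sketch of a serializer from the PG-JSON structure (the object with nodes and edges also used by the programming API) to PG lines. The function names are hypothetical and not part of the pgraphs API, and the quoting rule is a simplifying assumption that quotes more often than the PG grammar requires.

```javascript
// Hypothetical sketch, not the pgraphs serializer.
// Assumed quoting rule: ids, labels and keys made of letters, digits and "_"
// stay bare; string values stay bare only if they are plain words that
// cannot be mistaken for true/false/null; everything else is JSON-quoted.
const RESERVED = new Set(["true", "false", "null"])

function pgName(name) {
  return /^[A-Za-z0-9_]+$/.test(name) ? name : JSON.stringify(String(name))
}

function pgValue(value) {
  if (typeof value !== "string") return String(value) // number, boolean, null
  const bare = /^[A-Za-z_][A-Za-z0-9_]*$/.test(value) && !RESERVED.has(value)
  return bare ? value : JSON.stringify(value)
}

// labels become ":label", repeated property values become repeated key:value pairs
function pgEntries(labels, properties) {
  const entries = labels.map(label => ":" + pgName(label))
  for (const [key, values] of Object.entries(properties)) {
    for (const value of values) entries.push(`${pgName(key)}:${pgValue(value)}`)
  }
  return entries
}

function serializePG({ nodes, edges }) {
  const lines = nodes.map(node =>
    [pgName(node.id), ...pgEntries(node.labels, node.properties)].join(" "))
  for (const edge of edges) {
    const arrow = edge.undirected ? "--" : "->"
    lines.push([pgName(edge.from), arrow, pgName(edge.to),
      ...pgEntries(edge.labels, edge.properties)].join(" "))
  }
  return lines.join("\n")
}
```

Applied to the example graph in PG-JSON form (with property since listed before engaged), this sketch yields the four lines above.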
The same graph in PG-JSON and in PG-JSONL:
{
"nodes": [{
"id": "101", "labels": [ "person" ],
"properties": { "name": [ "Alice", "Carol" ], "country": [ "United States" ] }
},{
"id": "102", "labels": [ "person", "student" ],
"properties": { "name": [ "Bob" ], "country": [ "Japan" ] }
}],
"edges": [{
"from": "101", "to": "102", "undirected": true,
"labels": [ "same_school", "same_class" ], "properties": { "since": [ 2012 ] }
},{
"from": "101", "to": "102",
"labels": [ "likes" ], "properties": { "engaged": [ false ], "since": [ 2015 ] }
}]
}
{"id":"101","labels":["person"],"properties":{"name":["Alice","Carol"],"country":["United States"]}}
{"id":"102","labels":["person","student"],"properties":{"name":["Bob"],"country":["Japan"]}}
{"from":"101","to":"102","labels":["same_school","same_class"],"properties":{"since":[2012]},"undirected":true}
{"from":"101","to":"102","labels":["likes"],"properties":{"since":[2015],"engaged":[false]}}
There is also a JSON Schema for PG-JSON and a JSON Schema for PG-JSONL.
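The relation between both serializations can be sketched as follows: PG-JSONL writes each node and each edge as one JSON object per line. This sketch assumes, as in the example above, that edge lines are recognized by their from and to fields and that undirected is only written when true; the helper names are hypothetical.

```javascript
// Hypothetical sketch: PG-JSON <-> PG-JSONL, following the example above.
function toJSONL({ nodes, edges }) {
  const lines = nodes.map(node => JSON.stringify(node))
  for (const { from, to, labels, properties, undirected } of edges) {
    const edge = { from, to, labels, properties }
    if (undirected) edge.undirected = true // written last, only when set
    lines.push(JSON.stringify(edge))
  }
  return lines.join("\n") + "\n"
}

function fromJSONL(text) {
  const graph = { nodes: [], edges: [] }
  for (const line of text.split("\n")) {
    if (!line.trim()) continue // skip empty lines
    const element = JSON.parse(line)
    // assumption: lines with "from" and "to" are edges, all others are nodes
    if ("from" in element && "to" in element) graph.edges.push(element)
    else graph.nodes.push(element)
  }
  return graph
}
```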
When exported to GraphViz DOT format, labels are ignored and edges become either all undirected or stay all directed.
graph {
101 [country="United States" name=Alice];
102 [country=Japan name=Bob];
101 -- 102 [since=2012];
101 -- 102 [since=2015];
}
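A minimal sketch of this reduction, assuming that one undirected edge makes the whole graph undirected and that only the first value of each property is kept (as for name above). The helper names are hypothetical and the output may differ in details from the example produced by pgraph.

```javascript
// Hypothetical sketch, not the pgraphs serializer: labels are dropped,
// attributes are sorted by key and only the first property value is kept.
function dotId(value) {
  const s = String(value)
  // bare DOT IDs: plain words or numerals; everything else is double-quoted
  if (/^[A-Za-z_][A-Za-z0-9_]*$/.test(s) || /^-?(\d+(\.\d*)?|\.\d+)$/.test(s)) return s
  return '"' + s.replace(/"/g, '\\"') + '"'
}

function dotAttributes(properties) {
  const attrs = Object.keys(properties).sort()
    .filter(key => properties[key].length > 0)
    .map(key => `${dotId(key)}=${dotId(properties[key][0])}`)
  return attrs.length ? ` [${attrs.join(" ")}]` : ""
}

function toDOT({ nodes, edges }) {
  const undirected = edges.some(edge => edge.undirected) // assumption
  const arrow = undirected ? "--" : "->"
  const lines = [undirected ? "graph {" : "digraph {"]
  for (const node of nodes) {
    lines.push(`  ${dotId(node.id)}${dotAttributes(node.properties)};`)
  }
  for (const edge of edges) {
    lines.push(`  ${dotId(edge.from)} ${arrow} ${dotId(edge.to)}${dotAttributes(edge.properties)};`)
  }
  lines.push("}")
  return lines.join("\n")
}
```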
Graphviz can generate image files from DOT, so pgraph can be used to create diagrams from any other graph source:
pgraph graph.pg -t dot | dot -Tsvg -o graph.svg
With option --html, the full labels and properties of nodes and edges are converted to HTML labels in the resulting diagram.
When exported to GraphML, labels are ignored and all values are converted to strings:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
<graph edgedefault="undirected">
<node id="101">
<data key="country">United States</data>
<data key="name">Alice</data>
<data key="name">Carol</data>
</node>
<node id="102">
<data key="country">Japan</data>
<data key="name">Bob</data>
</node>
<edge source="101" target="102">
<data key="since">2012</data>
</edge>
<edge source="101" target="102">
<data key="engaged">false</data>
<data key="since">2015</data>
</edge>
</graph>
</graphml>
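The mapping can be sketched as follows: each property value becomes the escaped string content of a data element. The helper names are hypothetical, the choice of edgedefault is an assumption, and key declarations are omitted as in the example output, although strict GraphML readers may expect them.

```javascript
// Hypothetical sketch, not the pgraphs serializer: labels are dropped and
// every property value is written as string content of a <data> element.
function xmlEscape(value) {
  return String(value)
    .replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;")
}

function dataElements(properties) {
  return Object.keys(properties).sort().flatMap(key =>
    properties[key].map(value =>
      `    <data key="${xmlEscape(key)}">${xmlEscape(value)}</data>`))
}

function toGraphML({ nodes, edges }) {
  // assumption: edgedefault is "undirected" as soon as one edge is undirected
  const undirected = edges.some(edge => edge.undirected)
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
    `  <graph edgedefault="${undirected ? "undirected" : "directed"}">`,
  ]
  for (const node of nodes) {
    lines.push(`  <node id="${xmlEscape(node.id)}">`, ...dataElements(node.properties), "  </node>")
  }
  for (const edge of edges) {
    lines.push(`  <edge source="${xmlEscape(edge.from)}" target="${xmlEscape(edge.to)}">`,
      ...dataElements(edge.properties), "  </edge>")
  }
  lines.push("  </graph>", "</graphml>")
  return lines.join("\n")
}
```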
When exported to GEXF 1.3, all labels except the first label of each edge are ignored, and so are multiple edges with the same label. Export of properties as GEXF attributes has not been implemented yet, so this export format is experimental.
The example graph in Cypher language with CREATE statements (the undirected edge is ignored because Cypher only supports directed edges):
CREATE (`101`:person {name:["Alice","Carol"], country:"United States"})
CREATE (`102`:person:student {name:"Bob", country:"Japan"})
CREATE (`101`)-[:likes {since:2015, engaged:false}]->(`102`)
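A minimal sketch that reproduces these statements from PG-JSON. It is not the pgraphs serializer: the helper names are hypothetical, JSON string syntax is assumed to be an adequate approximation of Cypher string literals, and null values or mixed types in repeated values are not handled.

```javascript
// Hypothetical sketch mirroring the example above. Single values are
// unwrapped, repeated values become lists, and edges that are undirected
// (or have no label to use as relationship type) are skipped.
const variable = id => "`" + String(id).replace(/`/g, "``") + "`"
const cypherName = name => /^[A-Za-z_][A-Za-z0-9_]*$/.test(name) ? name : variable(name)
// assumption: JSON string syntax is close enough to Cypher string literals
const literal = value => JSON.stringify(value)

function cypherProperties(properties) {
  const entries = Object.entries(properties).map(([key, values]) => {
    const value = values.length === 1 ? literal(values[0]) : `[${values.map(literal).join(",")}]`
    return `${cypherName(key)}:${value}`
  })
  return entries.length ? ` {${entries.join(", ")}}` : ""
}

function toCypher({ nodes, edges }) {
  const lines = nodes.map(node => {
    const labels = node.labels.map(label => ":" + cypherName(label)).join("")
    return `CREATE (${variable(node.id)}${labels}${cypherProperties(node.properties)})`
  })
  for (const edge of edges) {
    if (edge.undirected || !edge.labels.length) continue
    const type = ":" + cypherName(edge.labels[0]) // assumption: first label as type
    lines.push(`CREATE (${variable(edge.from)})-[${type}${cypherProperties(edge.properties)}]->(${variable(edge.to)})`)
  }
  return lines.join("\n")
}
```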
Further differences between the PG data model and Cypher include no support of null in property values and of mixed types in repeated property values.
The CYPHERL format is a list of Cypher statements, each on one line. The format requires nodes to have a unique identifier property (see command line option --id) for reference across statements. Option --merge changes the statements to update existing nodes and edges instead of creating new ones.
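The idea of one self-contained statement per line can be sketched as follows. This is not the exact output of pgraph: the statement shapes are an assumption, idKey stands in for option --id, merge stands in for option --merge, and edges find their end nodes via the copied id property.

```javascript
// Hypothetical sketch of CYPHERL output, not the statements pgraphs emits.
const quote = name => "`" + String(name).replace(/`/g, "``") + "`"

function mapLiteral(properties) {
  const entries = Object.entries(properties).map(([key, values]) =>
    `${quote(key)}:${JSON.stringify(values.length === 1 ? values[0] : values)}`)
  return `{${entries.join(", ")}}`
}

function toCypherL({ nodes, edges }, { idKey = "id", merge = false } = {}) {
  const verb = merge ? "MERGE" : "CREATE"
  const ref = id => `{${quote(idKey)}:${JSON.stringify(id)}}` // node lookup by id property
  const lines = nodes.map(node => {
    const labels = node.labels.map(label => ":" + quote(label)).join("")
    return `${verb} (n${labels} ${ref(node.id)}) SET n += ${mapLiteral(node.properties)}`
  })
  for (const edge of edges) {
    if (edge.undirected || !edge.labels.length) continue // needs direction and type
    lines.push(`MATCH (a ${ref(edge.from)}), (b ${ref(edge.to)}) ` +
      `${verb} (a)-[r:${quote(edge.labels[0])}]->(b) SET r += ${mapLiteral(edge.properties)}`)
  }
  return lines.join("\n") // one statement per line
}
```

Because each line only refers to nodes via their id property, lines can be executed one by one, which is why the format needs a unique identifier property.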
Export to YARS-PG 5.0.0 is limited to nodes and edges without schema, so all property values are mapped to strings:
(node1{"person"}["country":"United States","name":["Alice","Carol"]])
(node2{"person","student"}["country":"Japan","name":"Bob"])
(node1)-["same_school"]["since":"2012"]-(node2)
(node1)-["likes"]["engaged":"false","since":"2015"]-(node2)
Property graphs can be stored in the form of separate CSV files for nodes and edges, respectively. A nearly identical form of these files is supported by Neo4J as CSV header format and by Amazon Neptune as OpenCypher CSV format. pgraph creates four files in csv format using the output as base name (with optional directory): base + .nodes.headers and base + .nodes.csv with node data, and base + .edges.header and base + .edges.csv with edge data. The example graph is serialized as follows, in four files:
:START_ID,:END_ID,:TYPE,since:int,engaged:boolean
101,102,same_school,2012
101,102,likes,2015,false
:ID,:LABEL,name:string[],country:string
101,person,Alice�Carol,United States
102,person;student,Bob,Japan
Repeated labels and property values are separated by a NULL byte (shown as � above) as array delimiter to allow arbitrary characters in these values; NULL bytes are removed from string values (see this Neo4J feature request). Configuring another delimiter character is not supported yet.
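The node file layout can be sketched as follows: a typed header, then one row per node, with repeated values joined by a NULL byte. The type inference (boolean, int, float, otherwise string) and the helper names are assumptions for illustration; the sketch returns header and rows together, whereas pgraph writes them to separate files, and edge files would be built the same way with :START_ID, :END_ID and :TYPE columns.

```javascript
// Hypothetical sketch: PG-JSON nodes -> Neo4J CSV header and rows.
const NUL = "\u0000"

function csvField(value) {
  const s = String(value)
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s
}

function neo4jType(values) {
  if (values.every(v => typeof v === "boolean")) return "boolean"
  if (values.every(v => Number.isInteger(v))) return "int"
  if (values.every(v => typeof v === "number")) return "float"
  return "string" // assumption: mixed or other types fall back to string
}

function nodesToCSV(nodes) {
  // collect all values and array-ness per property key over all nodes
  const columns = new Map()
  for (const { properties } of nodes) {
    for (const [key, values] of Object.entries(properties)) {
      const column = columns.get(key) ?? { values: [], array: false }
      column.values.push(...values)
      column.array ||= values.length > 1
      columns.set(key, column)
    }
  }
  const keys = [...columns.keys()]
  const header = [":ID", ":LABEL", ...keys.map(key => {
    const { values, array } = columns.get(key)
    return `${key}:${neo4jType(values)}${array ? "[]" : ""}`
  })]
  const rows = nodes.map(node => [node.id, node.labels.join(";"), ...keys.map(key =>
    (node.properties[key] ?? [])
      .map(v => typeof v === "string" ? v.replaceAll(NUL, "") : v) // strip NULL bytes
      .join(NUL))])                                                // NULL as array delimiter
  return [header, ...rows].map(row => row.map(csvField).join(",")).join("\n")
}
```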
After importing into a Neo4J database and exporting again, the example graph is serialized in PG as follows. Conversion of property graphs between PG and Neo4J or Neptune should thus be round-trip, apart from identifiers, undirected edges, semicolons, and support of additional data types:
1 :person country:"United States" name:Alice name:Carol
2 :person :student country:Japan name:Bob
1 -> 2 :same_school since:2012
1 -> 2 :likes engaged:false since:2015
Amazon Neptune graph database also supports import of property graph data in a CSV format called Gremlin load data format (supported only by Amazon, not by the Apache TinkerPop community). This CSV format is very similar to the more common CSV format, but it also allows escaping the semicolon used as array delimiter as \;.
The example graph is serialized as follows, in two files:
~id,~label,name:String[],country:String
101,person,Alice;Carol,United States
102,person;student,Bob,Japan
~id,~from,~to,~label,since:Int,engaged:Bool
0,101,102,same_school,2012
1,101,102,likes,2015,false
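The semicolon escaping of array cells can be sketched as a pair of hypothetical helpers. The assumption here is that only the sequence \; is an escape and any other backslash is kept literally.

```javascript
// Hypothetical sketch: array cells in Neptune (Gremlin load data) CSV,
// where ";" separates values and a literal semicolon is written as "\;".
function neptuneArray(values) {
  return values.map(v => String(v).replaceAll(";", "\\;")).join(";")
}

function parseNeptuneArray(cell) {
  const values = []
  let current = ""
  for (let i = 0; i < cell.length; i++) {
    if (cell[i] === "\\" && cell[i + 1] === ";") {
      current += ";" // escaped semicolon belongs to the value
      i++
    } else if (cell[i] === ";") {
      values.push(current) // unescaped semicolon ends a value
      current = ""
    } else {
      current += cell[i]
    }
  }
  values.push(current)
  return values
}
```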
The Trivial Graph Format (TGF) is a text-based format to exchange labeled graphs. It does not support properties, multiple labels, or line breaks in labels. The example graph is serialized as follows:
1 person
2 person
#
1 2 same_school
1 2 likes
Parsed back from TGF and serialized as PG format, this is equivalent to:
1 :person
2 :person
1 -> 2 :same_school
1 -> 2 :likes
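This lossy mapping can be sketched as follows, assuming nodes are renumbered from 1 in input order and only the first label is kept (as in the example above). Replacing line breaks in labels by spaces is an additional assumption of this hypothetical helper.

```javascript
// Hypothetical sketch: PG-JSON -> TGF. Properties are dropped.
function toTGF({ nodes, edges }) {
  const number = new Map(nodes.map((node, i) => [node.id, i + 1]))
  const label = labels => (labels[0] ?? "").replace(/[\r\n]+/g, " ")
  const lines = nodes.map(node => `${number.get(node.id)} ${label(node.labels)}`.trimEnd())
  lines.push("#") // separates the node list from the edge list
  for (const edge of edges) {
    lines.push(`${number.get(edge.from)} ${number.get(edge.to)} ${label(edge.labels)}`.trimEnd())
  }
  return lines.join("\n")
}
```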
The spatial hypertext JSON Canvas format can store a spatial graph with nodes of text (in Markdown), links, or files. Each node requires at least a position and a size. The corresponding properties (width, height, x, y) are not included in the example graph, but GraphViz can be used to generate them. As GraphViz does not use pixels as unit, the numbers should be scaled with pgraph option --scale. This command line pipeline generates a JSON Canvas from the example graph:
pgraph examples/example.pg -t dot | dot | pgraph -f dot -s 4 -t canvas
To transform a DOT file graph.dot into JSON Canvas:
dot graph.dot | pgraph -f dot -s 4 -t canvas > graph.canvas
JSON Canvas can be read as well, but not all features are supported.
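The target structure can be sketched as follows, assuming each node already carries numeric x, y, width and height properties (for instance from GraphViz layout) that are multiplied by a scale factor like option --scale. The helper name, the edge id scheme and the use of the first name as node text are hypothetical.

```javascript
// Hypothetical sketch: PG-JSON with spatial properties -> JSON Canvas object.
function toCanvas({ nodes, edges }, scale = 1) {
  // first value of a numeric property, scaled and rounded to whole pixels
  const num = (node, key) => Math.round((node.properties[key]?.[0] ?? 0) * scale)
  return {
    nodes: nodes.map(node => ({
      id: node.id,
      type: "text",
      text: String(node.properties.name?.[0] ?? node.id),
      x: num(node, "x"),
      y: num(node, "y"),
      width: num(node, "width"),
      height: num(node, "height"),
    })),
    edges: edges.map((edge, i) => ({
      id: `e${i}`, // hypothetical edge id scheme
      fromNode: edge.from,
      toNode: edge.to,
      ...(edge.undirected ? { toEnd: "none" } : {}), // canvas edges default to an arrow
      ...(edge.labels.length ? { label: edge.labels.join(" ") } : {}),
    })),
  }
}
```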
Export to Mermaid is experimental and may lead to syntactically invalid Mermaid files because there is no formal specification and because some characters cannot be escaped. Mermaid supports HTML in names of nodes and edges (property name), but HTML attributes must be single-quoted (<a href='...' instead of <a href="...") and numeric character entities cannot be used.
By default the example graph is exported to this Mermaid diagram source:
flowchart LR
101["Alice"]
102["Bob"]
101 --- 102
101 --> 102
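The default output above can be sketched as follows. Using the first name value (or the id) as node text matches the example; replacing double quotes by single quotes and the fallback ids for nodes with unusual identifiers are assumptions of this hypothetical helper.

```javascript
// Hypothetical sketch: PG-JSON -> Mermaid flowchart source.
function toMermaid({ nodes, edges }) {
  // keep simple ids as-is, otherwise fall back to a generated id (assumption)
  const ids = new Map(nodes.map((node, i) =>
    [node.id, /^[A-Za-z0-9_]+$/.test(node.id) ? node.id : `n${i}`]))
  const text = node => String(node.properties.name?.[0] ?? node.id).replace(/"/g, "'")
  const lines = ["flowchart LR"]
  for (const node of nodes) lines.push(`  ${ids.get(node.id)}["${text(node)}"]`)
  for (const edge of edges) {
    lines.push(`  ${ids.get(edge.from)} ${edge.undirected ? "---" : "-->"} ${ids.get(edge.to)}`)
  }
  return lines.join("\n")
}
```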
mermaid-cli can be used to generate image files from Mermaid diagram files or from any other graph source:
pgraph graph.pg --html -t mmd | mmdc -i - -o graph.svg
With option --html, the full labels and properties of nodes and edges are converted to HTML labels in the resulting diagram.
The NCOL file format is used to visualize very large undirected graphs with Large Graph Layout software. On export, the graph is reduced to simple edges with optional weights; extensions exist for coloring and node labels (not supported by this library).
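The reduction can be sketched as one line per edge with two node names and an optional weight. Taking the weight from a property named weight and replacing whitespace in node names by underscores are assumptions of this hypothetical helper; isolated nodes are lost.

```javascript
// Hypothetical sketch: PG-JSON -> NCOL ("from to [weight]" per line).
// Labels, other properties and edge direction are dropped.
function toNCOL({ edges }, weightKey = "weight") {
  const name = id => String(id).replace(/\s+/g, "_") // names must not contain whitespace
  return edges.map(edge => {
    const weight = edge.properties[weightKey]?.[0]
    const pair = `${name(edge.from)} ${name(edge.to)}`
    return typeof weight === "number" ? `${pair} ${weight}` : pair
  }).join("\n")
}
```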
pgraphs can directly connect to some graph databases for import and/or export.
Format neo4j requires the node package neo4j-driver (installed automatically by npm install unless pgraphs is installed as a dependency of another project) and expects a JSON file with the Neo4J database Bolt-API URI and credentials as source or target. Use the following for a default installation on your local machine:
{
"uri": "neo4j://localhost",
"user": "",
"password": ""
}
Reading from a database uses a Cypher MATCH query. Writing into a database uses the list of Cypher CREATE queries as exported with Cypher target format, so the following should be equivalent:
pgraph graph.pg query.cypher
# ...and manually execute query query.cypher, or directly:
pgraph -t neo4j graph.pg neo4j.json
Reading from and writing to other graph database systems supporting Cypher and Bolt protocol (Memgraph, Kùzu, FalkorDB, TuGraph...) may be possible but has not been tested so far.
For larger graphs, better export in CSV format to multiple files and bulk import the CSV files with neo4j-admin database import and these options:
--delimiter=","
--array-delimiter="\0" (NULL byte)
Cypher command LOAD CSV will not work because it expects an additional MERGE clause and nodes/edges must have uniform labels.
The pgraphs git repository contains shell scripts in directory neo4j to run a local Neo4J instance with Docker and to bulk import CSV files from local directory ./import.
Some graph software and libraries can import and export multiple formats as well.
Licensed under the MIT License.
A first version of the PG model and its serializations PG format and PG-JSON has been proposed by Hirokazu Chiba, Ryota Yamanaka, and Shota Matsumoto (2019, 2022).