va-big-data-genomics / trellisdata

Python package with classes and methods used in implementing a Trellis data management system.
MIT License

Investigate neo4j.Record: can be used to split results? #9

Open pbilling opened 1 year ago

pbilling commented 1 year ago

Each neo4j.Record returned by a neo4j.Result, through a method such as neo4j.Result.fetch() or neo4j.Result.single(), contains one instance of the pattern matched by the Cypher query. In contrast, the graph returned by neo4j.Result.graph() contains all the nodes and relationships that were matched.
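
To make the contrast concrete, here is a minimal sketch. It assumes a `driver` already created with the neo4j Python driver and reuses the Fastq query from the snippets in this thread. The function name `count_records_and_graph_entities` is hypothetical, not part of trellisdata or neo4j.

```python
# Query taken from the snippets in this thread.
QUERY = "MATCH (f1:Fastq)-[r:HAS_MATE_PAIR]->(f2:Fastq) RETURN f1, r, f2"


def count_records_and_graph_entities(driver, query=QUERY):
    """Return (record count, graph node count, graph relationship count).

    Hypothetical helper: `driver` is assumed to be a neo4j driver instance.
    """
    with driver.session() as session:
        # Iterating a Result yields one Record per pattern match.
        records = list(session.run(query))
        # graph() collects the nodes and relationships from the whole result.
        graph = session.run(query).graph()
        return len(records), len(graph.nodes), len(graph.relationships)
```

As far as I can tell, a node that appears in several records should show up only once in the graph, so the node count can be lower than the number of node values summed across the records.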

pbilling commented 1 year ago

Roadmap

pbilling commented 1 year ago

Code snippet:

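def fetch_mate_pair_records(driver):  # illustrative name for the enclosing function
    with driver.session() as session:
        result = session.run("MATCH (f1:Fastq)-[r:HAS_MATE_PAIR]->(f2:Fastq) RETURN f1, r, f2")
        # Fetch the first two records; each one holds a single match of the pattern.
        records = result.fetch(2)
    return records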

Result:

[<Record f1=<Node element_id='227' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 1, 'sample': 0}> r=<Relationship element_id='132' nodes=(<Node element_id='227' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 1, 'sample': 0}>, <Node element_id='228' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 2, 'sample': 0}>) type='HAS_MATE_PAIR' properties={}> f2=<Node element_id='228' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 2, 'sample': 0}>>, <Record f1=<Node element_id='227' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 1, 'sample': 0}> r=<Relationship element_id='133' nodes=(<Node element_id='227' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 1, 'sample': 0}>, <Node element_id='229' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 2, 'sample': 0}>) type='HAS_MATE_PAIR' properties={}> f2=<Node element_id='229' labels=frozenset({'Fastq'}) properties={'readGroup': 0, 'matePair': 2, 'sample': 0}>>]

pbilling commented 1 year ago

How to translate Neo4j records to JSON in Python?

The neo4j.Result class provides a data() method that returns the records as a list of dictionaries, which is close to JSON. This approach has been recommended multiple times on Stack Overflow, including by a Neo4j developer.

Here is what the result of this approach looks like using the same data as above.

Code snippet:

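def fetch_mate_pair_json(driver):  # illustrative name for the enclosing function
    with driver.session() as session:
        result = session.run("MATCH (f1:Fastq)-[r:HAS_MATE_PAIR]->(f2:Fastq) RETURN f1, r, f2")
        # data() converts every remaining record into a plain dictionary.
        json_records = result.data()
    return json_records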

Result:

[{'f1': {'readGroup': 0, 'matePair': 1, 'sample': 0}, 'r': ({'readGroup': 0, 'matePair': 1, 'sample': 0}, 'HAS_MATE_PAIR', {'readGroup': 0, 'matePair': 2, 'sample': 0}), 'f2': {'readGroup': 0, 'matePair': 2, 'sample': 0}}, {'f1': {'readGroup': 0, 'matePair': 1, 'sample': 0}, 'r': ({'readGroup': 0, 'matePair': 1, 'sample': 0}, 'HAS_MATE_PAIR', {'readGroup': 0, 'matePair': 2, 'sample': 0}), 'f2': {'readGroup': 0, 'matePair': 2, 'sample': 0}}]

A couple of issues with this approach:

  1. It doesn't explicitly indicate which entities are nodes and which are relationships, though you can infer from the structure.
  2. It does not include node labels. This is a dealbreaker: labels are essential for Trellis database triggers, and without them the original graph cannot be reconstituted from the JSON data.

So, I think a better approach is to use a custom function to translate records to JSON.
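
As a starting point, here is a minimal sketch of what such a custom function could look like. It addresses the two issues listed above by tagging each entity as a node or relationship and by keeping node labels. The names `entity_to_dict` and `record_to_json` and the output keys (`kind`, `element_id`, and so on) are hypothetical, not an existing trellisdata or neo4j API. The sketch detects nodes and relationships by their attributes (`labels`, `start_node`) so it can be tried without a database; isinstance checks against neo4j.graph.Node and neo4j.graph.Relationship would be stricter.

```python
import json


def entity_to_dict(value):
    """Convert a node or relationship into a plain dict; pass other values through."""
    if hasattr(value, "labels"):
        # Node-like: keep the labels so database triggers can match on them.
        return {
            "kind": "node",
            "element_id": value.element_id,
            "labels": sorted(value.labels),
            "properties": dict(value.items()),
        }
    if hasattr(value, "start_node"):
        # Relationship-like: reference the end points by element ID.
        return {
            "kind": "relationship",
            "element_id": value.element_id,
            "type": value.type,
            "start_node": value.start_node.element_id,
            "end_node": value.end_node.element_id,
            "properties": dict(value.items()),
        }
    return value


def record_to_json(record):
    """Serialize one record (a mapping of return keys to values) to a JSON string."""
    return json.dumps({key: entity_to_dict(value) for key, value in record.items()})
```

This sketch does not handle paths, lists of entities, or property values that json cannot serialize (for example temporal or spatial types); those would need extra cases.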

pbilling commented 1 year ago

The neo4j.Record class definition: https://github.com/neo4j/neo4j-python-driver/blob/5.0/src/neo4j/_data.py#L49.

The RecordExporter class is used by the Record.data() method to transform records into JSON: https://github.com/neo4j/neo4j-python-driver/blob/5.0/src/neo4j/_data.py#L276.

How the Result class initializes records: Record(zip(self._keys, record)). https://github.com/neo4j/neo4j-python-driver/blob/5.0/src/neo4j/_sync/work/result.py#L177.