Open henryliangt opened 2 years ago
Traversal and Path
• The length of a path is the number of relationships along the path
Cypher
• Uses patterns to represent core concepts in the property graph model
• E.g. a pattern may represent that a user node has a transaction containing the item “formula”
• There are basic patterns representing nodes, relationships and paths
• It uses clauses to build queries; certain clauses and keywords are inspired by SQL
• A query may contain multiple clauses
• Functions can be used to perform aggregation and other types of analysis
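As a minimal sketch of a multi-clause query with an aggregation function, assuming the Actor and Movie labels and the ACTS_IN relationship type used in the examples further down in these notes:

```cypher
// Count the actors recorded for each movie.
// Assumes (:Actor)-[:ACTS_IN]->(:Movie) data such as the Matrix example in these notes.
MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)
RETURN m.title AS title, count(a) AS actors
ORDER BY actors DESC
```

MATCH finds the pattern, RETURN projects and aggregates (rows are grouped by the non-aggregated column, m.title), and ORDER BY sorts the result.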
Terminology: vertex = node; edge = arc = link (relationship); undirected graph; directed graph; index-free adjacency; dangling (relationship)
• Node pattern: (variable), e.g. (n)
• A variable's scope is restricted to a single query statement
• A pattern can include one or many labels • (a:User) • (a:User:Admin)
• Specifying properties
• Properties are a list of name-value pairs enclosed in curly brackets
• (a {name: "Andres", sport: "soccer"})
Cypher patterns: relationships
• Basic relationships
• (a)--(b)
• Direction is not important
• Matches any relationship between node a and node b
• (a)-[r]->(b)
• Matches any relationship from node a to node b; variable r is used to refer to the relationship
• Relationship type is also indicated by a prefix colon
• (a)-[r:FRIENDS]->(b)
• Matches any FRIENDS relationship from node a to node b; variable r is used to refer to the relationship
• (a)-[r:FRIENDS|COWORKERS]->(b)
• Matches either a FRIENDS or a COWORKERS relationship from node a to node b; variable r is used to refer to the relationship
Relationships of variable length
• (a)-[*2]->(b)
• Matches any path of length 2 from node a to node b
• This is equivalent to (a)-->()-->(b)
• (a)-[*3..5]->(b)
• Matches any path with a minimum length of 3 and a maximum length of 5 from node a to node b
• (a)-[*3..]->(b)
• Matches any path with a minimum length of 3 from node a to node b
• (a)-[*..5]->(b)
• Matches any path with a maximum length of 5 from node a to node b
• (a)-[*]->(b)
• Matches a path of any length from node a to node b
• (a)-[r:KNOWS*1..2]->(b)
• Matches any path consisting of KNOWS relationships from node a to node b with a minimum length of 1 and a maximum length of 2. The variable r refers to the list of relationships along this path
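A short sketch of a variable-length pattern in a full query. The Person label and the name 'Anders' are assumptions chosen for illustration; length(p) returns the number of relationships on the path, which is the path length defined at the top of these notes.

```cypher
// People reachable from Anders over one or two KNOWS relationships.
// The Person label and the name 'Anders' are illustrative assumptions.
MATCH p = (a:Person {name: 'Anders'})-[:KNOWS*1..2]->(b)
RETURN b.name AS reachable, length(p) AS hops
```

Naming the whole path (p = ...) is what makes the path itself available to functions such as length().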
• Pattern: (n)
• Matches all nodes in the graph
• Pattern: (m:Movie)
• Matches the Movie node in the example graph (in general, every node with the Movie label)
• Pattern: (p:Person {name: 'Tom Hanks'})
• Matches the Person node with the name 'Tom Hanks' in the graph
• Pattern: (p1)-[r:DIRECTED]->(m1)
• Matches the path from person Robert Zemeckis to movie “Forrest Gump” in the example graph
Pattern: (p1 {name: 'Filipa'})<-[r:KNOWS*1..2]-()
• Matches the path from Dilshad to Filipa (length 1)
• Matches the path from Anders to Filipa (length 2)
https://neo4j.com/docs/cypher-manual/4.1/syntax/patterns/
• Create a node matrix1 with the label Movie
CREATE (matrix1:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
• Create a node keanu with the label Actor
CREATE (keanu:Actor {name:'Keanu Reeves', born:1964})
• Create a relationship ACTS_IN
CREATE (keanu)-[:ACTS_IN {roles:'Neo'}]->(matrix1)
The identifiers “keanu” and “matrix1” are used in the last CREATE clause. We did not give the relationship a name/identifier. The three clauses must be written in a single query statement for those variables to be usable.
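If the two nodes were created by earlier, separate statements, their variables are no longer in scope. One possible approach, sketched here on the assumption that the name and title properties identify the nodes, is to bind them again with MATCH before creating the relationship:

```cypher
// Re-bind the existing nodes, then create the relationship between them.
// Assumes exactly one Actor named 'Keanu Reeves' and one Movie titled 'The Matrix' exist.
MATCH (keanu:Actor {name: 'Keanu Reeves'}),
      (matrix1:Movie {title: 'The Matrix'})
CREATE (keanu)-[:ACTS_IN {roles: 'Neo'}]->(matrix1)
```

If MATCH finds no such pair, the CREATE clause simply does not run.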
MATCH (movie:Movie) RETURN movie
MATCH (a:Actor)-[:ACTS_IN]->(:Movie{title:"The Matrix"}) RETURN a.name
MATCH (a {name: 'A'}) RETURN a.age
MATCH (a {name: 'A'}) RETURN a.age AS age
MATCH (n:Actor) SET n.age = 2022 - n.born RETURN n
MATCH (n:Actor) REMOVE n.age RETURN n
MATCH (n:Actor{name:"Keanu Reeves"}) REMOVE n:Actor RETURN n
It is OK to set some properties and remove other properties in the same statement.
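A minimal sketch of combining the two clauses in one statement. The nickname property is hypothetical and only serves as something to remove; removing a property that a node does not have is a no-op.

```cypher
// Set one property and remove another in the same statement.
// n.nickname is a hypothetical property used for illustration.
MATCH (n:Actor {name: 'Keanu Reeves'})
SET n.age = 2022 - n.born
REMOVE n.nickname
RETURN n
```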
• Delete a relationship
MATCH (n{name:"Keanu Reeves"})-[r:ACTS_IN]->() DELETE r
• Delete a node together with all of its relationships
MATCH (m{title:'The Matrix'})-[r]-() DELETE m,r
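Note that the second query above only matches when the node has at least one relationship. As an alternative sketch, DETACH DELETE removes a node and whatever relationships it has, including the case where it has none:

```cypher
// Delete the node and any relationships attached to it.
MATCH (m:Movie {title: 'The Matrix'})
DETACH DELETE m
```

A plain DELETE m on a node that still has relationships fails, since relationships cannot be left dangling.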
• Pattern matching can express equality conditions on properties
• E.g. (m{title:'The Matrix'}) means title = 'The Matrix'
• The WHERE clause can add other conditions to the patterns in the MATCH clause
• The syntax is very similar to the SQL WHERE clause
MATCH (n:Person) WHERE n.age < 30 RETURN n.name, n.age
MATCH (n:Person)-[k:KNOWS]->(f) WHERE k.since < 2000 RETURN f.name, f.age, f.email
MATCH (n:Person)-[k:KNOWS]->(f) WHERE n.age > 30 AND k.since < 2000 RETURN n.name AS name, f.name AS friend
Boolean operators: AND, OR, XOR, NOT
String matching operators: STARTS WITH, ENDS WITH, CONTAINS
MATCH (n:Person) WHERE NOT n.name ENDS WITH 'y' RETURN n.name, n.age
If the graph contains the three names “Andy”, “Peter” and “Timothy”, the query matches only “Peter”, the one name that does not end with 'y'.
Regular expression: =~ 'regexp'
MATCH (n:Person) WHERE n.email =~ '.*\\.com' RETURN n.name, n.age, n.email
MATCH (n:Person) WHERE n.belt IS NOT NULL RETURN n.name, n.belt
• Patterns are expressions in Cypher: expressions that return a list of paths. List expressions are also predicates; an empty list represents false, and a non-empty list represents true.
• So patterns can be part of the WHERE clause
• BUT there is a difference between patterns used in the MATCH clause and in the WHERE clause
• MATCH searches the entire graph for the specified patterns and binds variables to the matched parts
• WHERE only checks whether the pattern exists in the subgraph returned by MATCH
Filter on patterns: example
Find persons that do not have an outgoing relationship to Peter
MATCH (person:Person), (peter:Person {name: 'Peter'}) WHERE NOT (person)-->(peter) RETURN person.name, person.age
This is not an efficient way to implement the query, because MATCH first pairs every person with Peter. We will see a different implementation next week.
Existential subquery
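One way the same question could be written with an existential subquery (EXISTS { ... }, available since Neo4j 4.0). This is a sketch only, and it may differ from the implementation the course presents:

```cypher
// Persons with no outgoing relationship to a Person named 'Peter'.
// The outer variable person is visible inside the subquery.
MATCH (person:Person)
WHERE NOT EXISTS {
  MATCH (person)-->(:Person {name: 'Peter'})
}
RETURN person.name, person.age
```

Here only person is matched by the outer MATCH, so there is no need to list Peter as a second, unconnected pattern.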
• The ORDER BY clause orders the results by a single property or a combination of properties, in the specified direction
MATCH (n) RETURN n.name, n.age ORDER BY n.name
MATCH (n) RETURN n.name, n.age ORDER BY n.name DESC
• Both SKIP and LIMIT take an integer that specifies the number of records to skip or to return
MATCH (n) RETURN n.name, n.age ORDER BY n.name SKIP 1
MATCH (n) RETURN n.name, n.age ORDER BY n.name LIMIT 5
MATCH (n) RETURN n.name, n.age ORDER BY n.name SKIP 1 LIMIT 5
• UNION ALL keeps duplicate rows: in the example data it returns “Hitchcock” twice
MATCH (n:Actor) RETURN n.name AS name UNION ALL MATCH (n:Movie) RETURN n.title AS name
• UNION removes duplicate rows: it returns “Hitchcock” once
MATCH (n:Actor) RETURN n.name AS name UNION MATCH (n:Movie) RETURN n.title AS name
OLAP: OnLine Analytical Processing of graph data
OLTP: OnLine Transaction Processing is possible
Google Pregel, Apache Giraph
RDF (Resource Description Framework) model
• Expresses a node-edge relation as a “subject, predicate, object” triple (an RDF statement)
• SPARQL query language
Why are these two important?
• Native graph storage using the property graph model
• Index-free adjacency
Sharded graph mechanism since 4.0
• Neo4j Fabric
Nodes contain properties • Properties are stored in the form of key-value pairs • A node can have labels (classes)
Relationships connect nodes and can have properties as well
• A relationship always has a type
• It can have properties
• It has a direction, with a source node and a target node
• But traversal can happen in either direction
• No dangling relationships (a node that still has a relationship cannot be deleted)
• The source and the target node can be the same node
Properties
• A property is a pair of a property key and a property value
• The property value can be of a simple type: Number (Integer and Float), String, Boolean, Spatial type (Point), Temporal type
• The property value can also be a homogeneous list of simple types, e.g. a list of integers or a list of strings
• It cannot be a heterogeneous list or another complex type with many levels of embedding
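A sketch of a node carrying each kind of property value listed above. All property names and values are made up for illustration.

```cypher
// One node with number, string, boolean, list, spatial and temporal properties.
// Every property name and value here is an illustrative assumption.
CREATE (p:Person {
  name: 'Andres',                                      // String
  age: 36,                                             // Integer
  height: 1.82,                                        // Float
  active: true,                                        // Boolean
  sports: ['soccer', 'tennis'],                        // homogeneous list of strings
  location: point({latitude: 55.6, longitude: 12.9}),  // Spatial type: Point
  joined: date('2020-03-01')                           // Temporal type
})
RETURN p
```

A mixed list such as ['soccer', 1] or a nested map would be rejected as a property value, in line with the last bullet.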