neo4j-labs / neosemantics

Graph+Semantics: Import/Export RDF from Neo4j. SHACL Validation, Model mapping and more.
https://neo4j.com/labs/neosemantics/
Apache License 2.0

parsing poorly designed namespaces #79

Closed mypetfish closed 5 years ago

mypetfish commented 5 years ago

I can import ttl files from the Refinitiv Knowledge Graph (available at https://permid.org/download), however attributes attached to Person.ttl become badly formed within the neo4j data model.

e.g. namespaced names such as ns3__family-name and ns3__given-name do not parse within neo4j

After loading https://permid.org/sfiles/bulkDownload/OpenPermID-bulk-person-20190512_074319.ttl (WARNING: the file is almost 1 GB), the following query fails:

MATCH (n:ns2__Person) RETURN n.ns3__family-name LIMIT 10

Neo.ClientError.Statement.SyntaxError: Variable name not defined (line 1, column 44 (offset: 43))

This is because _ is the only allowed non-alphanumeric/special character in an unquoted name, so neo only parses n.ns3__family and throws an error at the -. As this is a generic error for any content using special characters, could special characters either be replaced by _ or removed during conversion?

jbarrasa commented 5 years ago

Hi Alex, you have two options here:

Option 1: Use backticks to wrap model names with special characters:

MATCH (n:ns2__Person) RETURN n.`ns3__family-name` LIMIT 10

and your queries will just work.
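To make Option 1 a bit more concrete, here is a minimal sketch of two ways to reach such a property. It assumes the ns2/ns3 prefixes from the original question; the prefixes in your own import may differ. Run each statement separately.

```cypher
// Without backticks, Cypher reads n.ns3__family-name as
// "n.ns3__family minus name", hence the "Variable name not defined" error.
// Backticks can wrap any identifier: labels, property keys, relationship types.
MATCH (n:`ns2__Person`)
RETURN n.`ns3__family-name` AS familyName,
       n.`ns3__given-name`  AS givenName
LIMIT 10

// Alternative: dynamic property lookup, passing the key as a string.
MATCH (n:ns2__Person)
RETURN n['ns3__family-name'] AS familyName
LIMIT 10
```

The dynamic form can be handy when the property key arrives as a query parameter, while the backtick form stays closer to ordinary Cypher.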

Option 2: You can define a model mapping and get neosemantics to use it to create more neo4j-friendly names. This mapping-on-import feature is only available in 3.5 and I'm still working on documenting it, so bear with me. In the meantime, you can look at the unit tests for details, and here's an example of how it goes:

Say you want to load this RDF fragment (excerpt from permid person dataset):

<https://permid.org/1-34419230351> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Person> . 
<https://permid.org/1-34419230351> <http://www.w3.org/2006/vcard/ns#given-name> "Keith"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://permid.org/1-34419198943> <http://www.w3.org/2006/vcard/ns#family-name> "Peltz"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://permid.org/1-34419198943> <http://www.w3.org/2006/vcard/ns#given-name> "Maxwell"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://permid.org/1-34419198943> <http://www.w3.org/2006/vcard/ns#additional-name> "S"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://permid.org/1-34419198943> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Person> .
<https://permid.org/1-34418273443> <http://www.w3.org/2006/vcard/ns#family-name> "Benner"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://permid.org/1-34418273443> <http://www.w3.org/2006/vcard/ns#given-name> "Thomas"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://permid.org/1-34418273443> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Person> .
<https://permid.org/1-34418273443> <http://www.w3.org/2006/vcard/ns#friend-of> <https://permid.org/1-34419230351> .

and you want RDF properties like http://www.w3.org/2006/vcard/ns#family-name to be mapped to a more friendly name like familyName.

You define a set of mappings for a vocabulary as follows:

WITH [{ neoSchemaElem : "givenName", publicSchemaElem:  "given-name" },
{ neoSchemaElem : "familyName", publicSchemaElem: "family-name" },
{ neoSchemaElem : "additionalName", publicSchemaElem: "additional-name" },
{ neoSchemaElem : "FRIEND_OF", publicSchemaElem: "friend-of" }] AS mappings
CALL semantics.mapping.addSchema("http://www.w3.org/2006/vcard/ns#","vcard") YIELD node AS sch
UNWIND mappings as m
CALL semantics.mapping.addMappingToSchema(sch,m.neoSchemaElem,m.publicSchemaElem) YIELD node 
RETURN count(node) AS mappingsDefined

Notice how you first define a vocabulary with addSchema and then add individual mappings for elements in that vocabulary with addMappingToSchema. If you have multiple vocabularies to map, just repeat the process for each of them.
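As a sketch of repeating the process, here is the same pattern applied to a second vocabulary. The FOAF namespace and the two mappings are illustrative assumptions only (the permid person file may not use FOAF at all); the procedure calls are the same ones used above.

```cypher
// Hypothetical second vocabulary: FOAF is used here purely as an illustration.
WITH [{ neoSchemaElem : "KNOWS", publicSchemaElem: "knows" },
      { neoSchemaElem : "nickname", publicSchemaElem: "nick" }] AS mappings
// Register the vocabulary once, then attach each element mapping to it.
CALL semantics.mapping.addSchema("http://xmlns.com/foaf/0.1/","foaf") YIELD node AS sch
UNWIND mappings AS m
CALL semantics.mapping.addMappingToSchema(sch,m.neoSchemaElem,m.publicSchemaElem) YIELD node
RETURN count(node) AS mappingsDefined
```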

You can list the currently defined mappings by running:

CALL semantics.mapping.listMappings()

which produces:

╒════════════════╤═════════════════╤══════════════════════════════════╤══════════════╕
│"elemName"      │"schemaElement"  │"schemaNs"                        │"schemaPrefix"│
╞════════════════╪═════════════════╪══════════════════════════════════╪══════════════╡
│"givenName"     │"given-name"     │"http://www.w3.org/2006/vcard/ns#"│"vcard"       │
├────────────────┼─────────────────┼──────────────────────────────────┼──────────────┤
│"familyName"    │"family-name"    │"http://www.w3.org/2006/vcard/ns#"│"vcard"       │
├────────────────┼─────────────────┼──────────────────────────────────┼──────────────┤
│"additionalName"│"additional-name"│"http://www.w3.org/2006/vcard/ns#"│"vcard"       │
├────────────────┼─────────────────┼──────────────────────────────────┼──────────────┤
│"FRIEND_OF"     │"friend-of"      │"http://www.w3.org/2006/vcard/ns#"│"vcard"       │
└────────────────┴─────────────────┴──────────────────────────────────┴──────────────┘

Note that all non-mapped vocabulary elements will be 'adapted' to a neo4j/cypher-friendly name that does not include the namespace information. This can be a problem if you want to regenerate the imported RDF, because ignoring namespaces can create ambiguities when your RDF dataset has vocabulary elements with the same local name in different namespaces.

Once your mappings are defined, you can run your import with the additional config param handleVocabUris: 'MAP' as follows.

call semantics.importRDF("file:///Users/jesusbarrasa/Downloads/OpenPermID-bulk-person-20180107_070346.ntriples","N-Triples", { handleVocabUris: 'MAP' })

After data load, you can query with a much more friendly cypher:

MATCH (n:Person) RETURN n.uri, n.familyName LIMIT 10

and get:

╒══════════════════════════════════╤══════════════╕
│"n.uri"                           │"n.familyName"│
╞══════════════════════════════════╪══════════════╡
│"https://permid.org/1-34419230351"│null          │
├──────────────────────────────────┼──────────────┤
│"https://permid.org/1-34418273443"│"Benner"      │
├──────────────────────────────────┼──────────────┤
│"https://permid.org/1-34419198943"│"Peltz"       │
└──────────────────────────────────┴──────────────┘
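The FRIEND_OF mapping can be checked in the same way. A minimal sketch, assuming the friend-of triple in the fragment above was imported as a relationship between two Person nodes:

```cypher
// Follow the mapped relationship type instead of the raw vcard friend-of predicate.
MATCH (a:Person)-[:FRIEND_OF]->(b:Person)
RETURN a.uri, a.givenName, b.uri, b.givenName
LIMIT 10
```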

Hope this helps, at least until the documentation is published. Note that this feature (like others I'm documenting now) is brand new, so I would love to hear your feedback on it.

Update: details on defining mappings are now in the manual/readme. Also, here's an example of using mappings for exporting RDF from your Neo4j DB.

JB.

jbarrasa commented 5 years ago

I'll take silence as an ok. I hope the examples clarified the behaviour.