trungdong / prov

A Python library for W3C Provenance Data Model (PROV)
http://prov.readthedocs.io/
MIT License

Prefixes (Namespaces) for alphanumeric identifiers not passed to turtle serialisation #96

Open cmaumet opened 7 years ago

cmaumet commented 7 years ago

Hi @satra, @trungdong,

I am trying to use Namespace to specify prefixes for our alphanumeric identifiers. This works fine with provn, but I can't get it to work with the turtle serialisation. Any help would be greatly appreciated!

Here is a minimal example:

from prov.model import ProvDocument, Namespace, QualifiedName
import urllib.request
import csv

# Get a list of preferred prefixes from an online CSV file
csv_url = "https://raw.githubusercontent.com/incf-nidash/nidm/master/nidm/nidm-results/terms/prefixes.csv"
prefix_file = urllib.request.urlopen(
    csv_url).read().decode('utf-8').splitlines()
prefixes = dict()
reader = csv.reader(prefix_file)
for alphanum_id, prefix, uri in reader:
    prefixes[uri] = Namespace(prefix, uri)['']

## Example of prefixes dictionary
# prefixes = dict(
#     [("http://purl.org/nidash/nidm#NIDM_0000170",
#       Namespace('nidm_groupName',
#                 'http://purl.org/nidash/nidm#NIDM_0000170')['']),
#      ("http://purl.org/nidash/nidm#NIDM_0000165",
#       Namespace('nidm_NIDMResultsExporter',
#                 'http://purl.org/nidash/nidm#NIDM_0000165')[''])],
#     )

g = ProvDocument()

group_name_uri = "http://purl.org/nidash/nidm#NIDM_0000170"
ex = Namespace('ex', 'http://example/')

g.entity(ex['group1'], {prefixes[group_name_uri]: "Group 1"})

print(g.serialize(format='provn'))
print("---")
print(g.serialize(format='rdf', rdf_format="turtle"))

and the output:

document
  prefix ex <http://example/>
  prefix nidm_groupName <http://purl.org/nidash/nidm#NIDM_0000170>

  entity(ex:group1, [nidm_groupName:="Group 1"])
endDocument

---
@prefix ex: <http://example/> .
@prefix nidm_groupName: <http://purl.org/nidash/nidm#NIDM_0000170> .
@prefix ns1: <http://purl.org/nidash/nidm#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:group1 a prov:Entity ;
    ns1:NIDM_0000170 "Group 1"^^xsd:string .

I would like ns1:NIDM_0000170 to be replaced by nidm_groupName: in the turtle document.
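
The turtle output above is consistent with a serialiser that splits each URI after its last `#` or `/` and only then looks for a bound prefix, so a prefix bound to the whole URI (with an empty local part) is never matched. The following is a minimal, stdlib-only sketch of that behaviour and of the lookup I would like instead. The function names are hypothetical and the splitting rule is a simplifying assumption; this is not rdflib's actual implementation.

```python
def split_uri(uri):
    """Split a URI after its last '#' or '/' (simplified assumption)."""
    idx = max(uri.rfind('#'), uri.rfind('/'))
    return uri[:idx + 1], uri[idx + 1:]


def abbreviate_by_split(uri, bindings):
    """Abbreviate using only the namespace part left of the split point."""
    namespace, local = split_uri(uri)
    for prefix, bound_uri in bindings.items():
        if bound_uri == namespace:
            return f"{prefix}:{local}"
    # No bound prefix matches the split namespace: fall back to a generated one
    return f"ns1:{local}"


def abbreviate_whole_uri_first(uri, bindings):
    """Check for a prefix bound to the whole URI before splitting."""
    for prefix, bound_uri in bindings.items():
        if bound_uri == uri:
            return f"{prefix}:"
    return abbreviate_by_split(uri, bindings)


# Prefix bindings taken from the provn output above
bindings = {
    "ex": "http://example/",
    "nidm_groupName": "http://purl.org/nidash/nidm#NIDM_0000170",
}
uri = "http://purl.org/nidash/nidm#NIDM_0000170"

print(abbreviate_by_split(uri, bindings))         # ns1:NIDM_0000170
print(abbreviate_whole_uri_first(uri, bindings))  # nidm_groupName:
```

The first call reproduces the `ns1:NIDM_0000170` form from the turtle output; the second gives the `nidm_groupName:` form I am after.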

satra commented 7 years ago

PR submitted: https://github.com/RDFLib/rdflib/pull/660

satra commented 7 years ago

@cmaumet - i tested with this.

from prov.model import ProvDocument, Namespace, QualifiedName
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/incf-nidash/nidm/master/nidm/nidm-results/terms/prefixes.csv')
df['Namespace'] = df[['Preferred prefix', 'URI']].apply(lambda x: Namespace(x['Preferred prefix'], x['URI']), axis=1)
prefixes = dict(df[['URI', 'Namespace']].values)
g = ProvDocument()

group_name_uri = "http://purl.org/nidash/nidm#NIDM_0000170"

for prefix in prefixes:
    if prefix == group_name_uri:
        g.add_namespace(prefixes[prefix])

ex = Namespace('ex', 'http://example/')
g.entity(ex['group1'], {group_name_uri: "Group 1"})

print(g.serialize(format='provn'))
print("---")
print(g.serialize(format='rdf', rdf_format="turtle"))

cmaumet commented 7 years ago

Thank you @satra!

satra commented 7 years ago

this has been superseded by this PR, which takes performance into account: https://github.com/RDFLib/rdflib/pull/649