walidazizi / rdflib

Automatically exported from code.google.com/p/rdflib

Namespaces beginning with _ are invalid #152

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The NamespaceManager invents namespace prefixes when its
compute_qname method is called, and the prefixes it invents start with
an underscore. The legality of this might be debated, but several
parsers, redland among them, cannot read the N3/Turtle generated by
rdflib because of it, unless great care is taken to define every
possible prefix beforehand.

The fix below copies the behaviour of Virtuoso which invents prefixes
starting with "ns" and a number.

Patch below

---

diff --git a/rdflib/namespace.py b/rdflib/namespace.py
index 935f1a1..127a854 100644
--- a/rdflib/namespace.py
+++ b/rdflib/namespace.py
@@ -191,7 +191,12 @@ class NamespaceManager(object):
             namespace = URIRef(namespace)
             prefix = self.store.prefix(namespace)
             if prefix is None:
-                prefix = "_%s" % len(list(self.store.namespaces()))
+                num = 1
+                while 1:
+                    prefix = "ns%s" % num
+                    if not self.store.namespace(prefix):
+                        break
+                    num += 1
                 self.bind(prefix, namespace)
             self.__cache[uri] = (prefix, namespace, name)
         return self.__cache[uri]
diff --git a/test/test_namespace.py b/test/test_namespace.py
new file mode 100644
index 0000000..4aba036
--- /dev/null
+++ b/test/test_namespace.py
@@ -0,0 +1,17 @@
+from rdflib.graph import Graph
+from rdflib.term import URIRef
+
+def test_qname():
+    """Test sequential assignment of unknown prefixes"""
+    g = Graph()
+    assert g.compute_qname(URIRef("http://foo/bar/baz")) == \
+        ("ns1", URIRef("http://foo/bar/"), "baz")
+
+    assert g.compute_qname(URIRef("http://foo/bar#baz")) == \
+        ("ns2", URIRef("http://foo/bar#"), "baz")
+    
+    # should skip this because we already assign it
+    g.bind("ns3", URIRef("http://example.org/"))
+
+    assert g.compute_qname(URIRef("http://blip/blop")) == \
+        ("ns4", URIRef("http://blip/"), "blop")

Original issue reported on code.google.com by wwai...@gmail.com on 26 Nov 2010 at 10:53
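
For readers who want to try the prefix-selection loop from the patch outside rdflib, here is a minimal standalone sketch. It assumes a plain dict stands in for the store's prefix table; `invent_prefix` and `bindings` are hypothetical names for illustration, not part of rdflib.

```python
def invent_prefix(bound_prefixes):
    """Return the first 'nsN' prefix (N = 1, 2, ...) that is not already bound."""
    num = 1
    while f"ns{num}" in bound_prefixes:
        num += 1
    return f"ns{num}"


# Hypothetical prefix table standing in for the store (prefix -> namespace).
bindings = {}
for namespace in ("http://foo/bar/", "http://foo/bar#"):
    bindings[invent_prefix(bindings)] = namespace

# A prefix bound explicitly by the user must be skipped by later inventions.
bindings["ns3"] = "http://example.org/"
bindings[invent_prefix(bindings)] = "http://blip/"

print(sorted(bindings))  # ['ns1', 'ns2', 'ns3', 'ns4']
```

This follows the same sequence as the test in the patch: two invented prefixes, one explicit binding, then an invented prefix that skips over the explicit one.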

GoogleCodeExporter commented 8 years ago
Can you attach the output of `svn diff` to this ticket please? It will make 
applying the patch a lot easier. 

Original comment by ed.summers on 27 Nov 2010 at 2:30

GoogleCodeExporter commented 8 years ago
As an attachment... It should apply with 'patch -p1 < qname_prefix.diff'

Original comment by wwai...@gmail.com on 27 Nov 2010 at 11:00

GoogleCodeExporter commented 8 years ago

Original comment by ed.summers on 28 Nov 2010 at 10:48

GoogleCodeExporter commented 8 years ago
I guess we can chalk this up to a difference between Turtle and N3. Turtle is a 
subset of N3, so what is valid N3 isn't necessarily valid Turtle. In this case 
N3 allows prefixes to start with an underscore and Turtle does not. And redland 
understands Turtle, whereas rdflib understands N3 :-)

That being said I think it makes sense to encourage a bit more interoperability 
between redland and rdflib by making the change you suggest:

<pre>
>>> g = rdflib.Graph()
>>> g.add((rdflib.URIRef('http://example.com/foo'), 
rdflib.URIRef('http://example.com/bar'), rdflib.Literal(1)))
>>> print g.serialize(format='n3')
@prefix _3: <http://example.com/> .

<http://example.com/foo> _3:bar 1 
</pre>

will become:

<pre>
>>> g = rdflib.Graph()
>>> g.add((rdflib.URIRef('http://example.com/foo'), 
rdflib.URIRef('http://example.com/bar'), rdflib.Literal(1)))
>>> g.serialize(format='n3')
'@prefix ns1: <http://example.com/> .\n@prefix ns2: 
<http://www.w3.org/2001/XMLSchema#> .\n\nns1:foo ns1:bar 1 .\n\n'
>>> print g.serialize(format='n3')
@prefix ns1: <http://example.com/> .
@prefix ns2: <http://www.w3.org/2001/XMLSchema#> .

ns1:foo ns1:bar 1 .
</pre>

It's interesting that the subject and object URIs become prefixed after this 
change. I wonder if that could potentially end up blowing up the number of 
prefixes used when serializing as N3?

Original comment by ed.summers on 28 Nov 2010 at 11:48

GoogleCodeExporter commented 8 years ago
I'm not sure this "blowing up" of prefixes demonstrated in test.py (attached) 
is good for large graphs:

before patch:

% ./test.py

<http://example.com/person1#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person2#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person3#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person4#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person5#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person6#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person7#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person8#i> a <http://xmlns.com/foaf/0.1/Person> .

<http://example.com/person9#i> a <http://xmlns.com/foaf/0.1/Person> .

after patch:

% ./test.py 
@prefix ns1: <http://example.com/person5#> .
@prefix ns10: <http://example.com/person9#> .
@prefix ns2: <http://xmlns.com/foaf/0.1/> .
@prefix ns3: <http://example.com/person4#> .
@prefix ns4: <http://example.com/person3#> .
@prefix ns5: <http://example.com/person6#> .
@prefix ns6: <http://example.com/person1#> .
@prefix ns7: <http://example.com/person7#> .
@prefix ns8: <http://example.com/person2#> .
@prefix ns9: <http://example.com/person8#> .

ns6:i a ns2:Person .

ns8:i a ns2:Person .

ns4:i a ns2:Person .

ns3:i a ns2:Person .

ns1:i a ns2:Person .

ns5:i a ns2:Person .

ns7:i a ns2:Person .

ns9:i a ns2:Person .

ns10:i a ns2:Person .

Original comment by ed.summers on 28 Nov 2010 at 12:19
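
The count of ten prefixes in the "after patch" output can be reproduced with a small standalone sketch. It assumes a simplified URI split at the last `#` or `/`; the `split_uri` helper below is a hypothetical stand-in and does not reproduce rdflib's own splitting rules.

```python
def split_uri(uri):
    """Split at the last '#' or '/', keeping the separator in the namespace."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


# The nine subjects and the one class URI from the output above.
subjects = [f"http://example.com/person{n}#i" for n in range(1, 10)]
uris = subjects + ["http://xmlns.com/foaf/0.1/Person"]

# Every subject has its own namespace, so each one needs its own prefix.
namespaces = {split_uri(uri)[0] for uri in uris}
print(len(namespaces))  # 10
```

Because each `personN#` document is a distinct namespace, the number of invented prefixes grows linearly with the number of such subjects.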

GoogleCodeExporter commented 8 years ago
The attached patch modifies the Turtle serialiser to prevent prefix explosion.

Original comment by wwai...@gmail.com on 28 Nov 2010 at 1:26
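
The attached patch is not reproduced in this thread, so the following is only a sketch of one plausible way to limit prefix growth, not necessarily what the patch does. It assumes prefixes are invented only for predicates and for the objects of `rdf:type`, while subjects and other objects are written as full URIs unless a prefix is already bound. `invent_prefixes` and `split_uri` are hypothetical helpers, and triples are plain tuples of URI strings.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def split_uri(uri):
    """Split at the last '#' or '/', keeping the separator in the namespace."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def invent_prefixes(triples):
    """Map namespace -> 'nsN', looking only at predicates and rdf:type objects."""
    prefixes = {}
    for _subject, predicate, obj in triples:
        candidates = [predicate]
        if predicate == RDF_TYPE:
            candidates.append(obj)  # class URIs are worth abbreviating too
        for uri in candidates:
            namespace, _local = split_uri(uri)
            if namespace not in prefixes:
                prefixes[namespace] = f"ns{len(prefixes) + 1}"
    return prefixes


triples = [
    (f"http://example.com/person{n}#i", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person")
    for n in range(1, 10)
]
print(len(invent_prefixes(triples)))  # 2
```

With this policy the nine per-person namespaces get no invented prefix; only the namespaces of `rdf:type` and `foaf:Person` do.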

GoogleCodeExporter commented 8 years ago
I'm just realizing now that the Turtle serializer emits the prefix with a 
leading underscore...so this really is a defect.

Original comment by ed.summers on 28 Nov 2010 at 2:37
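
A quick way to see why `_3` is rejected by Turtle parsers is to check prefixes against a simplified form of the rule. The sketch below is an ASCII-only approximation: the real Turtle grammar also admits many non-ASCII characters, and `is_turtle_safe_prefix` is a hypothetical helper, not an rdflib or redland function.

```python
import re

# ASCII-only approximation: a letter first, then letters, digits, '_' or '-'.
# The real Turtle grammar is broader, so this check is deliberately conservative.
_TURTLE_SAFE_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9_-]*\Z")


def is_turtle_safe_prefix(prefix):
    """Return True if the prefix is empty or matches the simplified pattern."""
    return prefix == "" or _TURTLE_SAFE_PREFIX.match(prefix) is not None


print(is_turtle_safe_prefix("_3"))   # False
print(is_turtle_safe_prefix("ns1"))  # True
```

The old invented prefixes (`_1`, `_2`, ...) fail this check, while the `nsN` prefixes from the patch pass it.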

GoogleCodeExporter commented 8 years ago
This issue was closed by revision r1899.

Original comment by ed.summers on 28 Nov 2010 at 3:47

GoogleCodeExporter commented 8 years ago
This issue was closed by revision dcea99e96edc.

Original comment by ed.summers on 30 Mar 2011 at 9:07