walidazizi / rdflib

Automatically exported from code.google.com/p/rdflib

serialize(format="pretty-xml") fails on cyclic links #180

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
When cyclically related nodes are serialized with format="pretty-xml", the
nodeID for the outermost node is omitted. The inner node is left holding a
dangling (unresolved) reference. Details below.

--------------
What steps will reproduce the problem?

  $ [Fresh Ubuntu 11.04 install]
  $ sudo easy_install -U "rdflib>=3.0.0"

  $ python

from rdflib import BNode, Graph, RDF, URIRef

g = Graph()
g.bind("ex", URIRef("http://example.com#"))

# Two blank nodes that refer to each other
a = BNode()
b = BNode()

g.add((a, RDF.type, URIRef("http://example.com#a")))
g.add((b, RDF.type, URIRef("http://example.com#b")))

g.add((a, URIRef("http://example.com#linkedto"), b))
g.add((b, URIRef("http://example.com#linkedto"), a))

print(g.serialize(format="pretty-xml"))

--------------
What is the expected output? 

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ex="http://example.com#"
>
  <ex:b rdf:nodeID="GmmQoLHT3">
    <ex:linkedto>
      <ex:a>
        <ex:linkedto rdf:nodeID="GmmQoLHT3"/>
      </ex:a>
    </ex:linkedto>
  </ex:b>
</rdf:RDF>

--------------
What do you see instead?

The node of type <ex:b> does *not* have a nodeID assigned, meaning that the 
reference is broken!  Exact output below:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ex="http://example.com#"
>
  <ex:b>
    <ex:linkedto>
      <ex:a>
        <ex:linkedto rdf:nodeID="$GmmQoLHT3"/>
      </ex:a>
    </ex:linkedto>
  </ex:b>
</rdf:RDF>
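
One rough way to spot the broken reference without reading the XML by eye is to count how often each rdf:nodeID label occurs. The sketch below uses only the standard library and is a heuristic, not RDF/XML validation: it assumes that a label appearing exactly once cannot tie two places in the document together. The helper name `unpaired_node_ids` and the `TEMPLATE` string are made up for illustration; the two documents are the expected and observed outputs from this report.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NODE_ID = "{%s}nodeID" % RDF_NS  # ElementTree's qualified name for rdf:nodeID


def unpaired_node_ids(rdf_xml):
    """Return the rdf:nodeID labels that occur exactly once in the document."""
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    counts = Counter(
        elem.attrib[NODE_ID] for elem in root.iter() if NODE_ID in elem.attrib
    )
    return sorted(label for label, n in counts.items() if n == 1)


# Shared skeleton of the two outputs; the slots are the attribute on <ex:b>
# and the label used by the inner reference.
TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ex="http://example.com#"
>
  <ex:b%s>
    <ex:linkedto>
      <ex:a>
        <ex:linkedto rdf:nodeID="%s"/>
      </ex:a>
    </ex:linkedto>
  </ex:b>
</rdf:RDF>
"""

expected = TEMPLATE % (' rdf:nodeID="GmmQoLHT3"', "GmmQoLHT3")
observed = TEMPLATE % ("", "$GmmQoLHT3")

print(unpaired_node_ids(expected))  # []
print(unpaired_node_ids(observed))  # ['$GmmQoLHT3']
```

In the observed output the only label in the document is the one on the inner reference, so nothing identifies the outer <ex:b> node as its target.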

--------------
What version of the product are you using? On what operating system?

  $ [Fresh Ubuntu 11.04 install]
  $ sudo easy_install -U "rdflib>=3.0.0"

Original issue reported on code.google.com by jman...@gmail.com on 13 Aug 2011 at 12:28

GoogleCodeExporter commented 8 years ago
Good catch. The PrettyXMLSerializer code is explicit about this (mis)behaviour:

if isinstance(subject, BNode):
    def subj_as_obj_more_than(ceil):
        return more_than(store.triples((None, None, subject)), ceil)
    #here we only include BNode labels if they are referenced
    #more than once (this reduces the use of redundant BNode identifiers)
    if subj_as_obj_more_than(1):
        writer.attribute(RDF.nodeID, fix(subject))

and this produces exactly the disconnect that you are seeing.
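
To see why the check quoted above drops the label in the reporter's case, here is a small standard-library model of it. This is a sketch, not rdflib code: the graph is a list of plain tuples, `more_than` is a stand-in written for this example rather than the library's own helper, and `"_:a"` / `"_:b"` are illustrative names for the two blank nodes.

```python
from itertools import islice


def more_than(iterable, number):
    """Stand-in helper: True if `iterable` yields more than `number` items."""
    return len(list(islice(iterable, number + 1))) > number


# The reporter's graph as (subject, predicate, object) tuples.
TRIPLES = [
    ("_:a", "rdf:type", "ex:a"),
    ("_:b", "rdf:type", "ex:b"),
    ("_:a", "ex:linkedto", "_:b"),
    ("_:b", "ex:linkedto", "_:a"),
]


def subj_as_obj_more_than(subject, ceil):
    """Model of the quoted check: is `subject` an object more than `ceil` times?"""
    return more_than((t for t in TRIPLES if t[2] == subject), ceil)


for node in ("_:a", "_:b"):
    # Second column: the "> 1" test from the serializer.
    # Third column: whether the node is referenced at all.
    print(node, subj_as_obj_more_than(node, 1), subj_as_obj_more_than(node, 0))
```

Each blank node in the two-node cycle is referenced as an object exactly once, so the "more than once" test is false for both and no label is written on the subject. The nested node still has to point back at the outer one by label, which is where the unresolved reference comes from.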

Just to confirm: this issue is restricted to the output of PrettyXMLSerializer 
- choose any serialization other than pretty-xml and you will be able to parse 
it back into a properly-connected graph.

IMO, suppressing the BNode labels is a false economy, and the check could usefully be dispensed with.

Original comment by gjhigg...@gmail.com on 24 Oct 2011 at 5:55

GoogleCodeExporter commented 8 years ago
Fixed in changeset 3ddb26ab8f4f - check for BNode references > 1 commented out.

Original comment by gjhigg...@gmail.com on 26 Oct 2011 at 2:34