linkeddata / rdflib.js

Linked Data API for JavaScript
http://linkeddata.github.io/rdflib.js/doc/

RDF/XML Literals not being parsed properly #75

Closed jamsden closed 6 months ago

jamsden commented 9 years ago

Given some RDF/XML that contains:

<oslc:serviceProvider>
    <oslc:ServiceProvider rdf:about="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml">
      <dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>
      <oslc:details rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w"/>
      <jfs_proc:supportLinkDiscoveryViaLinkIndexProvider rdf:parseType="Literal">false</jfs_proc:supportLinkDiscoveryViaLinkIndexProvider>
      <jfs_proc:supportContributionsToLinkIndexProvider rdf:parseType="Literal">true</jfs_proc:supportContributionsToLinkIndexProvider>
      <jfs_proc:globalConfigurationAware rdf:parseType="Literal">compatible</jfs_proc:globalConfigurationAware>
      <jfs_proc:consumerRegistry rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w/links"/>
    </oslc:ServiceProvider>
  </oslc:serviceProvider>

And a query such as: someKb.the(aServiceProvider, DCTERMS('title'));

returns:

<dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>    

instead of the text. Am I missing something, or is the dcterms:title being parsed incorrectly?

jamsden commented 9 years ago

In the this.parseDOM() function, changing:

                        var nv = parsetype.nodeValue;
                        if (nv === "Literal"){
                            frame.datatype = RDFParser.ns.RDF + "XMLLiteral";// (this.buildFrame(frame)).addLiteral(dom)
                               // should work but doesn't
                            frame = this.buildFrame(frame);
                            frame.addLiteral(dom);
                            dig = false;
                        }

to:

                        var nv = parsetype.nodeValue;
                        if (nv === "Literal"){
                            frame.datatype = RDFParser.ns.RDF + "XMLLiteral";// (this.buildFrame(frame)).addLiteral(dom)
                               // should work but doesn't
                            frame = this.buildFrame(frame);
                            frame.addLiteral(dom.lastChild.nodeValue);
                            dig = false;
                        }

to get the actual content of the literal node seems to work. Might this break something else?

jamsden commented 9 years ago

I didn't mean to close the issue.

jamsden commented 9 years ago

It appears the dataType is incorrect:

  { subject: 
     { uri: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml',
       value: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml' },
    predicate: 
     { uri: 'http://purl.org/dc/terms/title',
       value: 'http://purl.org/dc/terms/title' },
    object: 
     { value: 'JKE Banking (Change Management)',
       lang: '',
       datatype: [Object] },
    why: 
     { uri: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/workitems/catalog',
       value: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/workitems/catalog' } },

Should it be:

{ value: 'JKE Banking (Change Management)',
  lang: undefined,
  datatype: undefined }

or somehow a string? Or am I doing this query incorrectly:

    var sp = this.catalog.statementsMatching(undefined, DCTERMS('title'), 'JKE Banking (Change Management)');

Does the string literal object need to be wrapped in this.catalog.literal? I tried that too, still didn't match, and I noticed that wrapping the string as a literal leaves the datatype undefined as shown above.

jamsden commented 9 years ago

I'm making some progress. The ‘addLiteral’ function of the RDFParser frameFactory adds the datatype sym('http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral') for literal nodes, while kb.literal('JKE Banking (Change Management)') leaves the datatype undefined, so they never match. If I force the datatype to XMLLiteral, then the match works:

var sp = this.catalog.statementsMatching(undefined, DCTERMS('title'), this.catalog.literal('JKE Banking (Change Management)', undefined, this.catalog.sym('http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral')));

This doesn't seem to match the documentation which says you should be able to just use a JavaScript string. Is this a bug or does it work as intended, and I have to create these literals with the symbol datatype?
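
To make the mismatch concrete, here is a minimal sketch of literal comparison. It is a simplified model written for illustration, not rdflib.js's actual classes: the `literal` and `sameLiteral` helpers are hypothetical, and the only assumption is that two literals are the same term when value, language and datatype all agree.

```javascript
// Simplified model of RDF literal equality; NOT rdflib.js's real implementation.
const XML_LITERAL = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral';

// Hypothetical helper: build a plain-object literal term.
function literal(value, lang, datatype) {
  return { termType: 'Literal', value, lang: lang || '', datatype };
}

// Two literals match only if value, language and datatype all agree.
function sameLiteral(a, b) {
  return a.value === b.value && a.lang === b.lang && a.datatype === b.datatype;
}

// What the parser stores for a parseType="Literal" property.
const parsed = literal('JKE Banking (Change Management)', '', XML_LITERAL);
// What a bare string turns into: datatype is undefined.
const plain = literal('JKE Banking (Change Management)');
// The same string with the datatype forced to XMLLiteral.
const typed = literal('JKE Banking (Change Management)', undefined, XML_LITERAL);

console.log(sameLiteral(parsed, plain)); // false: datatypes differ
console.log(sameLiteral(parsed, typed)); // true
```

Under that model the text is identical in all three terms, and only the datatype decides whether the pattern matches.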

timbl commented 9 years ago

The parseType="Literal" syntax in RDF/XML is for quoting pieces of embedded XML literally. I think you probably just want strings. If you just leave out parseType="Literal" then you will have the strings you want, I suspect.

jamsden commented 9 years ago

Unfortunately I don't control the RDF/XML source; it's from the Rational Team Concert OSLC Service Provider Catalog. So I may have to just deal with RTC's quirk in how it expresses dcterms:title. That's no problem.

However, isn't there still an issue? The RDF/XML source is:

<oslc:serviceProvider>
    <oslc:ServiceProvider rdf:about="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml">
      <dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>
      <oslc:details rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w"/>
      <jfs_proc:supportLinkDiscoveryViaLinkIndexProvider rdf:parseType="Literal">false</jfs_proc:supportLinkDiscoveryViaLinkIndexProvider>
      <jfs_proc:supportContributionsToLinkIndexProvider rdf:parseType="Literal">true</jfs_proc:supportContributionsToLinkIndexProvider>
      <jfs_proc:globalConfigurationAware rdf:parseType="Literal">compatible</jfs_proc:globalConfigurationAware>
      <jfs_proc:consumerRegistry rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w/links"/>
    </oslc:ServiceProvider>
  </oslc:serviceProvider>

Seems like the value of this property should be an XMLLiteral, but it shouldn't include the property element itself, just the value:

JKE Banking (Change Management)  

(is this even valid XML?) not

<dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>  

jamsden commented 9 years ago

I think my patch above is incorrect. The this.parseDOM() function for Literal nodes:

                        var nv = parsetype.nodeValue;
                        if (nv === "Literal"){
                            frame.datatype = RDFParser.ns.RDF + "XMLLiteral";// (this.buildFrame(frame)).addLiteral(dom)
                               // should work but doesn't
                            frame = this.buildFrame(frame);
                            frame.addLiteral(dom);
                            dig = false;
                        }

should normalize the children of the Literal property (so that === on embedded XML works consistently regardless of ordering), and use an XML serializer to create the value of the node which should be XML source, not parsed DOM. I see similar code in the RDFa parser. If this is correct, I can submit a fix.
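
A rough sketch of the serialization idea, under stated assumptions: nodes are plain objects shaped like DOM nodes (a stand-in, since real DOM attributes are a NamedNodeMap), all function names are hypothetical, and sorting attributes by name is just one possible reading of "normalize". It is not the RDFa parser's code.

```javascript
// Sketch only: turn the children of a DOM-like node back into XML source.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function escapeAttr(s) {
  return escapeText(s).replace(/"/g, '&quot;');
}

function serializeNode(node) {
  if (node.nodeType === TEXT_NODE) return escapeText(node.nodeValue);
  if (node.nodeType !== ELEMENT_NODE) return ''; // comments etc. are ignored in this sketch
  // Sort attributes by name so equal XML yields equal strings whatever the source order.
  const attrs = Object.keys(node.attributes || {}).sort()
    .map((name) => ` ${name}="${escapeAttr(node.attributes[name])}"`)
    .join('');
  const inner = (node.childNodes || []).map(serializeNode).join('');
  return `<${node.nodeName}${attrs}>${inner}</${node.nodeName}>`;
}

// The literal value is the source of the property's children, not the property element itself.
function literalSource(propertyNode) {
  return (propertyNode.childNodes || []).map(serializeNode).join('');
}

const title = {
  nodeType: ELEMENT_NODE,
  nodeName: 'dcterms:title',
  attributes: { 'rdf:parseType': 'Literal' },
  childNodes: [{ nodeType: TEXT_NODE, nodeValue: 'JKE Banking (Change Management)' }],
};

console.log(literalSource(title)); // JKE Banking (Change Management)
```

With this shape the stored value is a string, so comparing two literals with === does not depend on DOM object identity.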

lonniev commented 8 years ago

Interesting, I have a problem here in May 2016 with Jim's oslc-client being unable to find Service Providers because the statementsMatching method is not finding XMLLiterals that contain the sought CCM Project Name (name only). I wonder if rdflib.js evolved while Jim's OSLC4JS example has not.

jamsden commented 8 years ago

My patch for XMLLiterals has not been merged into rdflib.js yet.

lonniev commented 8 years ago

Because the find-the-service-provider-by-name method is only looking for a string in what is likely to be a fairly small set of titles, we could refactor the method to retrieve all ?-title-? statements and then use a simple JS or Lodash collection filter to pick out the pattern "(.*)${serviceProviderTitle}(.*)". That may be good enough versus trying to get the rdflib.catalog to recognize our particular literal value string. What do you think?

lonniev commented 8 years ago

The following, together with the addition of lodash and escapeStringRegexp, allows the method to find the statement that relates the subject URI to the literal title for the sought serviceProviderTitle.

       var haveTitle = this.catalog.statementsMatching(
            undefined, 
            DCTERMS('title'),
            undefined );

        const regex = new RegExp( ".*?" + escapeStringRegexp( serviceProviderTitle ) + ".*?" );

        var sp = _.filter( haveTitle,
            (s) =>
            {                
                return s.object.value.match( regex );
            }
        );
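
If the extra dependencies are unwanted, the same filter can probably be written with built-ins only. This is a sketch under assumptions: `haveTitle` below is mock data standing in for the result of statementsMatching, the example.org URIs are made up, and `findByTitle` is a hypothetical helper. A substring test with `includes` avoids regex escaping altogether.

```javascript
// Mock stand-in for catalog.statementsMatching(undefined, DCTERMS('title'), undefined).
const haveTitle = [
  { subject: { uri: 'https://example.org/sp/1' }, object: { value: 'JKE Banking (Change Management)' } },
  { subject: { uri: 'https://example.org/sp/2' }, object: { value: 'JKE Banking (Quality Management)' } },
];

// Keep the statements whose title contains the sought text.
// includes() does a plain substring test, so parentheses in the title need no escaping.
function findByTitle(statements, serviceProviderTitle) {
  return statements.filter((s) => String(s.object.value).includes(serviceProviderTitle));
}

const sp = findByTitle(haveTitle, 'JKE Banking (Change Management)');
console.log(sp.length);         // 1
console.log(sp[0].subject.uri); // https://example.org/sp/1
```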

akoptelov commented 6 years ago

@jamsden probably an even easier fix, without introducing a new dependency: frame.addLiteral(dom.childNodes)

jamsden commented 6 years ago

frame.addLiteral(dom.childNodes) does indeed work.

DOM such as:

JKE Banking (Change Management)

Another paragraph

And another paragraph

would parse as the following literal string of XML source:

JKE Banking (Change Management)

Another paragraph

And another paragraph

So this becomes a one-line code change. I'll implement it in my fork, test it, and create a pull request. There is about to be a lot of use of rdflib.js in developing OSLC integrations. This defect is a show stopper, however, since OSLC makes a lot of use of parseType="Literal".

JeffCave commented 6 years ago

This change does not behave nicely in-browser.

The browser's DOMParser handles serialization of NodeLists differently from the library used for Node.js. In the browser, objects get serialized as "[object NameOfDataType]" rather than as the contents of the list.

I would propose that the line

frame.addLiteral(dom.childNodes)

Would be better as

//frame.addLiteral(dom.innerHTML);
frame.addLiteral(dom.innerHTML || dom.childNodes);

This both serializes the inner content and preserves its XML content, as required by parseType='Literal'. By checking innerHTML first we use that by default; otherwise we assume we are in Node and serialize with the default childNodes handler.

I'm a little fuzzy on how nodejs handles this. I assume xmldom does not have an innerHTML property.
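
To illustrate the fallback, here is a small sketch using mock nodes only. The `<p>` markup, the mock `toString` behaviours and the `literalValue` helper are assumptions made for the example; they mimic what is described above rather than the real browser or xmldom objects.

```javascript
// Browser-like mock: innerHTML is available; childNodes coerces to "[object NodeList]".
const browserNode = {
  innerHTML: '<p>JKE Banking (Change Management)</p>',
  childNodes: { toString: () => '[object NodeList]' },
};

// Node-like mock (assumed): no innerHTML; childNodes coerces to the XML source.
const nodeLikeNode = {
  childNodes: { toString: () => '<p>JKE Banking (Change Management)</p>' },
};

// Same selection logic as the proposed line: prefer innerHTML, else fall back to childNodes.
function literalValue(dom) {
  return String(dom.innerHTML || dom.childNodes);
}

console.log(literalValue(browserNode));  // <p>JKE Banking (Change Management)</p>
console.log(literalValue(nodeLikeNode)); // <p>JKE Banking (Change Management)</p>

// Caveat: an empty literal has innerHTML === '', which is falsy,
// so a browser-like node falls through to childNodes in that case.
const emptyBrowserNode = { innerHTML: '', childNodes: { toString: () => '[object NodeList]' } };
console.log(literalValue(emptyBrowserNode)); // [object NodeList]
```

One thing the sketch suggests is that empty parseType="Literal" properties may still need separate handling in the browser, since `||` treats an empty string as missing.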


Issue verified in:

https://forum.solidproject.org/t/errors-parsing-xml-with-rdflib-js-in-the-browser/448

AndreyBespamyatnov commented 9 months ago

We are facing the same issue. Is it possible to get that fixed or do you have any workarounds? Thanks

bourgeoa commented 9 months ago

@AndreyBespamyatnov

//frame.addLiteral(dom.innerHTML);
frame.addLiteral(dom.innerHTML || dom.childNodes);

Does this solve your issue, or are there other issues? I published rdflib@2.2.34-1 on npm with this patch. Does it work for you? Can you test it?

AndreyBespamyatnov commented 9 months ago

Hi @bourgeoa, let me try the new version. If it does not work, I will come back with more information about the issue and some test data. Thank you.

paulslauenwhite commented 9 months ago

@bourgeoa, we had the same issue as this bug in an implementation of the OSLC AM V3 specification using rdflib@2.2.31 and moving to rdflib@2.2.34-1 resolved the issue with no side effects. Thanks for the fix.

paulslauenwhite commented 7 months ago

@bourgeoa, this fix is not in rdflib@2.2.34-beta or rdflib@2.2.34. When will the next rdflib release containing this fix be published to https://www.npmjs.com/package/rdflib?

bourgeoa commented 6 months ago

@paulslauenwhite

Merged in rdflib@2.2.35.

paulslauenwhite commented 6 months ago

Thanks @bourgeoa! Confirmed rdflib@2.2.35 contains this fix. Will https://github.com/linkeddata/rdflib.js/releases be updated with the 2.2.35 release?

bourgeoa commented 6 months ago

https://github.com/linkeddata/rdflib.js/releases/tag/v2.2.35