linkeddata / rdflib.js

Linked Data API for JavaScript
http://linkeddata.github.io/rdflib.js/doc/
Other
561 stars 142 forks source link

301 redirect : Wrong containment triples when fetching Container from NSS without trailing slash #617

Open angelo-v opened 1 year ago

angelo-v commented 1 year ago

Steps to reproduce

  1. fetch a container from NSS without using a trailing slash in it's URI
  2. query the store for the same URI as subject and ldp:contains as predicate

Expected result

The canonical container URI is with a slash, so the query should not give a result

Actual result

A statement with a wrong relative URI as the object

Example

  1. Fetch https://bourgeoa.solidcommunity.net/chats
  2. Store contains the following triple:
<https://bourgeoa.solidcommunity.net/chats>
  <http://www.w3.org/ns/ldp#contains>
    <https://bourgeoa.solidcommunity.net/10b089b1-cec0-475f-9be3-7078d950c04c/> .

Correct containment statement would be

<https://bourgeoa.solidcommunity.net/chats/>
  <http://www.w3.org/ns/ldp#contains>
    <https://bourgeoa.solidcommunity.net/chats/10b089b1-cec0-475f-9be3-7078d950c04c/> .

Note the difference in the subject URI (trailing slash) and the object URI (relative to /chats/)

This statement is not in the store, besides following the redirect from NSS. Only when fetching directly with a trailing shlash everything works as expected.

Test to reproduce:

describe("container URI without trailing slashes", () => {
  it("should contain nothing", async () => {
    const store = graph();
    const fetcher = new Fetcher(store);
    await fetcher.load("https://bourgeoa.solidcommunity.net/chats");

    const contains = store.statementsMatching(
      sym("https://bourgeoa.solidcommunity.net/chats"),
      sym("http://www.w3.org/ns/ldp#contains")
    );
    expect(contains).toEqual([]);
  });
});
bourgeoa commented 1 year ago

When a container URL is without a /, NSS is using a 301 redirect to a new URL ending with a / The issue is with 301 return code. When using 200 the result is correct.

@timbl I am wondering if rdflib does check the header for location or only content-location ?

csarven commented 1 year ago

When a server receives a GET request to /chat, it should do the required:

It may do the optional:

TallTed commented 1 year ago

@csarven — In https://github.com/linkeddata/rdflib.js/issues/617#issuecomment-1512453461, it seems that the several chats should become chat, or the single chat should become chats...

timbl commented 1 year ago

Here are all the headers:

$ curl -I https://bourgeoa.solidcommunity.net/chats
HTTP/1.1 301 Moved Permanently
X-Powered-By: solid-server/5.7.7
Vary: Accept, Authorization, Origin
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Authorization, User, Location, Link, Vary, Last-Modified, ETag, Accept-Patch, Accept-Post, Updates-Via, Allow, WAC-Allow, Content-Length, WWW-Authenticate, MS-Author-Via, X-Powered-By
Allow: OPTIONS, HEAD, GET, PATCH, POST, PUT, DELETE
Link: <chats.acl>; rel="acl", <chats.meta>; rel="describedBy", <http://www.w3.org/ns/ldp#Resource>; rel="type"
Location: /chats/
Content-Type: text/plain; charset=utf-8
Content-Length: 41
Date: Tue, 18 Apr 2023 21:39:09 GMT
Connection: keep-alive
Keep-Alive: timeout=5

$ curl -I https://bourgeoa.solidcommunity.net/chats/
HTTP/1.1 200 OK
X-Powered-By: solid-server/5.7.7
Vary: Accept, Authorization, Origin
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Authorization, User, Location, Link, Vary, Last-Modified, ETag, Accept-Patch, Accept-Post, Updates-Via, Allow, WAC-Allow, Content-Length, WWW-Authenticate, MS-Author-Via, X-Powered-By
Allow: OPTIONS, HEAD, GET, PATCH, POST, PUT, DELETE
Link: <.acl>; rel="acl", <.meta>; rel="describedBy", <http://www.w3.org/ns/ldp#Container>; rel="type", <http://www.w3.org/ns/ldp#BasicContainer>; rel="type"
WAC-Allow: user="read",public="read"
MS-Author-Via: SPARQL
Updates-Via: wss://bourgeoa.solidcommunity.net
Content-Type: application/octet-stream; charset=utf-8
Content-Length: 2
ETag: W/"2-nOO9QiTIwXgNtWtBJezz8kv3SLc"
Date: Tue, 18 Apr 2023 21:40:15 GMT
Connection: keep-alive
Keep-Alive: timeout=5

$ 
timbl commented 1 year ago

So you are saying it looks as though rdflib uses the location of the initial resource not the one after the redirect, as a base address for parsing the data.

csarven commented 1 year ago

@TallTed Good catch. I missed the slashes in the last item. Updated.

bourgeoa commented 1 year ago

So you are saying it looks as though rdflib uses the location of the initial resource not the one after the redirect, as a base address for parsing the data.

Yes. When I looked at rdflib code I did not find any use of location in relation with 301, but only of content-location in fetcher

bourgeoa commented 1 year ago

In the Handle fetch() response content-location is checked has if it was a single location But location is nowhere checked Is the rdflib webOperation meant to do :

angelo-v commented 1 year ago

Yes, I suppose in case of a 301 redirect rdflib should use the URI of the location header as the base URI, not the URI it requested initially and got redirected.

timbl commented 1 year ago

In https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L604 maybe original is the problem. I suspect it goes back to the original URI to avoid using the URI sent to a proxy when using a proxy. So we also want to avoid using the URI from the last fetch operation.

NoelDeMartin commented 1 year ago

Well I just found out the same @timbl already said :sweat_smile:. But yes, I can confirm that's the problem. I've tried to reproduce the issue, and changing the following in https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L1161 the test passes:

.then(response => {
    if (options.baseURI !== response.url) {
        options.baseURI = response.url;
    }

    if (options.original.value !== response.url) {
        options.original.value = response.url;
    }

    return this.handleResponse(response, response.url, options);
},

Not that this should be the solution, but it proves that the issue was indeed that the redirect was happening silently.

bourgeoa commented 1 year ago

I found the same and checking for the Link Container in response of latest fetch does resolve the issue.

Replace https://github.com/linkeddata/rdflib.js/blob/main/src/fetcher.ts#L604 with

  let baseUrl = options.original.value
    const isContainer = kb.any(options.original, null, ns.ldp('Container'))
    console.log('@@ isContainer ' + isContainer)
    if (isContainer && !baseUrl.endsWith('/')) baseUrl = baseUrl + '/' 
    let p = N3Parser(kb, kb, options.original.value, baseUrl,
      null, null, '', null)
bourgeoa commented 1 year ago

@angelo-v https://github.com/linkeddata/rdflib.js/pull/621 passes your test here is the npm version rdflib@2.2.32-a6c694ca to test with PodOS

angelo-v commented 1 year ago

It fixes the issue with wrong containment triples.

But still rdflib states that the resource without a trailing slash is of type ldp:Container which i think is wrong.

bourgeoa commented 1 year ago

But still rdflib states that the resource without a trailing slash is of type ldp:Container which i think is wrong.

@angelo-v I suppose this patch will resolve your point https://github.com/linkeddata/rdflib.js/pull/621#discussion_r1186346645 With this patch a slash is also added to the resource without a trailing slash of type ldp:Container.

bourgeoa commented 1 year ago

@angelo-v here is the npm version rdflib@2.2.32-2b635b59 to test with PodOS