solid / solid-spec

Solid specification draft 0.7.0
Creative Commons Zero v1.0 Universal
1.13k stars 103 forks source link

Clarify and detail the definition of globbing to only apply to folders #148

Open RubenVerborgh opened 5 years ago

RubenVerborgh commented 5 years ago

There is a suspicion that globbing is currently defined too loosely, and that this loose behavior is not depended upon (https://github.com/solid/solid-spec/issues/147).

This PR restricts globbing to concatenating RDF files in a single folder.

acoburn commented 5 years ago

Another consideration w/r/t globbing is the issue of blank nodes. Consider the case of two documents in a container:

/container/resource1.ttl
<> a ldp:RDFSource ;
      skos:prefLabel "resource 1" ;
      prov:wasGeneratedBy [
           a prov:Activity, as:Create ;
           prov:wasAssocatedWith <https://example.com/my-webid> ;
           prov:atTime             "2019-03-27T19:23:00.512Z"^^xsd:dateTime ] .

and also:

/container/resource2.ttl
<> a ldp:RDFSource ;
      skos:prefLabel "resource 2" ;
      prov:wasGeneratedBy [
           a prov:Activity, as:Create ;
           prov:wasAssocatedWith <https://example.com/another-webid> ;
           prov:atTime             "2019-03-27T20:13:00.000Z"^^xsd:dateTime ] .

If the underlying RDF serialization library converts each resource's blank node to _:b0 (anything, really, but the issue is for any blank node label conflicts), then the resulting graph would be semantically different than requesting each resource individually.

I am no fan of blank nodes in the context of linked data, but if this is specified behavior, this general case would need to be considered.

As an alternative to globbing, I wonder whether LDPath would satisfy the existing use cases.

RubenVerborgh commented 5 years ago

Not just editorial, we exclude filename globs. Can you recheck? Agreed with rest.

Blank nodes are not an issue (will be parsed to different nodes). Note that text says union of datasets (not concat of files).

kjetilk commented 5 years ago

The formal specification for merging RDF graphs are here: https://www.w3.org/TR/rdf-mt/#graphdefs I think that should be the basis of this feature, if it is implemented. E.g. INSERT DATA is specified in terms of RDF Merge.

acoburn commented 5 years ago

The Web Annotation specification makes use of Prefer headers, specifically oa:PreferContainedDescriptions and a resulting ActivityStream-based container to fetch an entire container of resources. This interaction fits naturally within the existing LDP specification without defining a new semantic for /*.

michielbdejong commented 5 years ago

Not just editorial, we exclude filename globs.

Yes sorry, I agree, and I also applaud that simplifying change.

My only issue then with this PR is that it talks about 'concatenating', not 'merging' Turtle docs. I think we should prescribe merging instead of concatenating, and refer to the document @kjetilk mentioned.

Just to make sure I understand blank nodes correctly, should we assume that the graphs have no blank nodes in common? For instance, when merging two graphs that both state that Alice drank a glass of wine, both using a blank node, then we have no ground to assume that these two docs describe the same event, right? So the result should be that we just keep both blank nodes (not mark them as identical), and so we in the resulting merge we say Alice drank two glasses of wine, right?

RubenVerborgh commented 5 years ago

@michielbdejong

I would change the example, adding file3.ttl to which the user has no access.

Also maybe we can specify that if the user has no read access to the container, then the request would fail, but if there are no Turtle documents to which the have read access, it would return an empty RDF doc.

What would the server do if one of the documents has a syntax error?

Agree with all of the above; however, a note here that the spec currently is not at this level of detail at all. It should be, don't get me wrong, but I did not omit any details that were previously specified.

Will do it for this section, but a more rigorous version of this spec is needed indeed.

Also, how would the prefixes be done?

Up to the implementation.

Is it just a text-concatenation, or should it understand Turtle?

From the current version:

Then a request to /data/* will return the union of the datasets in file1.ttl and file2.ttl

My only issue then with this PR is that it talks about 'concatenating', not 'merging' Turtle docs.

Not really as per the above, but will add spec reference as per @kjetilk.

RubenVerborgh commented 5 years ago

@acoburn Interesting, created https://github.com/solid/solid-spec/issues/149 to track that idea.

RubenVerborgh commented 5 years ago

Applied the suggested changes. This part of the spec is now very precise, giving clear expectations for a server (and also for a client library that would assume this functionality). Contrasts a bit with the rest of the spec, but good to start somewhere.

@michielbdejong Could you re-review?

@melvincarvalho Given that you are the main stakeholder for globbing, could you please review?

melvincarvalho commented 5 years ago

I'd take a look at gold as node solid server was based on gold for globbing. The write up in the spec should be considered a really good try based on someone writing up a documentation having not created that spec. So while the write up might not be perfect, to an extent gold should lead the text.

What I recall is

I think globbing should be used sparingly, but is a great tool to take solid apps to the new level.

It's anticipated that new techniques will emerge over time -- things like http2 and websockets, proxies, caching, databases, and out of band streaming were expected to emerge over time and therefore the need for globbing would grow less and less. These techniques did not emerge yet, because we've focused resource on gettng a stable node solid server.

So tl;dr let's support what work in gold which the /* pattern. And take the triples. If you mix bnodes and globbing you're asking for trouble. Globbing is very useful but other web servers such as apache dont use it, so it's slightly bespoke. We should avoid feature creep, and keep it basic but useful.

melvincarvalho commented 5 years ago

Also, how would the prefixes be done? What if they clash?

Prefixes are just a per file short hand to save you typing out characters over and over.

They are designed not to clash.

For example if you expand the prefixes, you get consistent URIs which are also UUIDs, if that makes sense.

melvincarvalho commented 5 years ago

@michielbdejong as background consider a chat app that may have 50 friends, with 1000s of messages. What do you do at startup time?

It's a reasonable, if not good design decision to put each chat item in its own file.

Now at startup you'll need to get 1000 files. Just watch stuff crash until you queue or throttle it. Then think about async type JS frameworks which will update a UI when a new file is received. You can get incredible lag if you are pulling in a lot of files (think about thats also a problem with http2).

Globbing makes some of these problems go away. And make more than the simplest apps more feasible. Trust me you are hitting so many walls with solid at that stage, that things like globbing a god send. The current generation of app builders have not hit these walls yet, but I look forward to people doing so, appreciating how useful this feature can be to get you to prototype (where otherwise people would just give up) as app data grows.

I think as new techniques become available we'll only then be able to evaluate apps which have 100s of files or even 1000s. Also remember globbing is not just about http, you can get files from a cache, from indexeddb, from a file system and sometimes these operations are much slower one a time etc.

RubenVerborgh commented 5 years ago

Thanks @melvincarvalho!

So tl;dr let's support what work in gold which the /* pattern.

To the best of my knowledge, this is what the current proposal contains. Do you see any issues/inconsistencies? Any reason not to merge?

If you mix bnodes and globbing you're asking for trouble.

First, I disagree (no trouble in the way I currently specified it, which is a main reason for specifying). Second and more importantly: what do you propose alternatively? That we a) skip files with blank nodes b) throw an error on blank nodes c) not include triples with blank nodes d) canonicalize them e) not spec it and leave up to implementers f) … ?

melvincarvalho commented 5 years ago

@RubenVerborgh well I dont use bnodes, so dont have a problem with any merge strategy.

RubenVerborgh commented 5 years ago

Thanks, I will consider the above as no objection, unless I hear otherwise.

RubenVerborgh commented 5 years ago

Globbing makes some of these problems go away.

But creates new ones. Do we really want a single stream with an unknown number of messages, that might be very long and cause server and client pressure? As opposed to fetching resources individually (which, under HTTP/2, comes with virtually zero overhead).

This is not a problem we can solve with theoretical arguments, but which requires experimental measurements. (Take it from a researcher who happens to specialize in the partitioning of Linked Data documents to tackle that specific situation 😉 http://linkeddatafragments.org/publications/jws2016.pdf)

But let's discuss removal in https://github.com/solid/solid-spec/issues/145.

melvincarvalho commented 5 years ago

Any reason not to merge?

Why is the BasicContainer stuff in there with contains? Does gold / databox do that?

Not that I think it's a terrible idea, just dont remember seeing that before, so wonder where it came from. If we're cleaning up the spec, then gold should be the reference imho.

RubenVerborgh commented 5 years ago

Why is the BasicContainer stuff in there with contains?

Was there in the existing spec. (Which begs the question whether the spec or a reference implementation is the ground truth at the moment. If the latter, arguments such as "but it's been in the spec for x years" should be revisited.)

Does gold / databox do that?

Can you test if you have an account?

melvincarvalho commented 5 years ago

Can you test if you have an account?

Will add this to my todo list, but dont have bandwidth in the next 2 weeks.

I'd suggest following up with Dmitri who wrote the doc, with Andrei who designed globbing, and Tim who oversaw the process.

I've been dragged into this issue and thrown a ton more time at it that I intended, and have urgent other things to look at, aside from being sick. Im trying to be generous with my time here to help out, please dont pile more things on for me to do. Im not paid to do this, I am paid to do other things, helping costs me income and more.

Please just have a look at gold and the comments made, if you'd like to update the spec, and consult with others. Cheers!

RubenVerborgh commented 5 years ago

@melvincarvalho Get well soon. No one is asking you to contribute more time than you have (and a reminder that I am also a volunteer). However, you are the main—and, to my knowledge, only—person in favor of globbing, so all of this work is currently done to accommodate you. So help us help you.

I have reached out to @timbl indeed; @dmitrizagidulin has replied that he has never used it (https://github.com/solid/solid-spec/issues/147#issuecomment-477146270).

melvincarvalho commented 5 years ago

all of this work is currently done to accommodate you

@RubenVerborgh seems slightly inverted. I didnt commission this work, and have asked to wait till I have a bit more time to review. I myself am trying to be accommodating.

As I was referenced I tried to made time to reply because the tone gave me the impression that this may be something urgent. It's entirely likely I got the wrong impression, as on reflection this is a spec change. And specs tend to change over weeks and months, rather than hours and days. Cleaning up spec bugs can be faster tho if we can achieve high confidence.

If there is a desire to write a server then what we can do is implement everything except globbing, then add globbing at the end. It will likely take years to do that anyway. At that point then there would be a good need to clean or clarify globbing a bit more.

If you are highly motivated clean up the text of the globbing spec, then the most diligent way is to speak to those that created that text, which is not me. Try Dimtri, Andrei, Tim, or the community groups (which have archives). May take a bit of time to do fully, but then specs should take a while to change.

I hope I've given a bunch of "common sense" pointers that were requested (code references, answers to specific questions, background, history, motivation, specific extraneous additions such as the BasicCointainer and contains, use cases, pain points, partial review, server reference implementations) in which you might be glean with dmitri some bug fixes. e.g. to remove some of the text that seem not to have been intended such as double stars /*/* and foo*. But, check to see that if the data browser uses globbing too, I would say.

I still think of gold as reference, and it's a good exercise for anyone interested in server work to run gold. Gold was the first, solid server, and node was designed to work like gold, it looks like there's even a TestGlob function.

Get well soon

Thanks, will try!

dmitrizagidulin commented 5 years ago

@melvincarvalho I second what @RubenVerborgh said, though. You are the only current advocate of globbing, so this whole discussion is just being done to accommodate you...

RubenVerborgh commented 5 years ago

@melvincarvalho This PR was created because you pointed out that globbing as you want and use it, is different from what is in the spec: https://github.com/solid/solid/issues/253#issuecomment-476838647 Hence, your opinion in this seems crucial.

You can see @dmitrizagidulin's opinion above; haven't heard yet from @deui, but in a discussion with @timbl, he expressed his preference for removing globbing altogether (and he is not using it).

Nonetheless, I think it is good to accept this PR, so we have a documented version of exactly how globbing was supposed to work.

Implementing the removal of globbing (https://github.com/solid/solid-spec/issues/145) could consist of first moving it to a "deprecated" section, which also benefits from a clear definition of what exactly globbing is.

melvincarvalho commented 5 years ago

of first moving it to a "deprecated" sectio

You have my formal objection to this. My comment is "not now".

Tim can weigh in if he'd like to prioritize this. Very much against.

melvincarvalho commented 5 years ago

in a discussion with @timbl, he expressed his preference for removing globbing altogether

Citation required. I'd like to see the context of this. And also the time line.

elf-pavlik commented 5 years ago

I would like to see feedback and from Tim and Andrei before a substantive spec change.

I haven't seen activities around solid repos from @deiu for quite some time. If you wan't his feedback you might need to make effort to reach out to him via some more direct channel.

melvincarvalho commented 5 years ago

@elf-pavlik he may be busy. Let's give him a bit of time to respond, should he wish to. Or possibly Tim could weigh in.

The main concern is regarding the timing. Once the timing is clarified, I could get behind it.

melvincarvalho commented 5 years ago

@RubenVerborgh perhaps the easiest thing would be, could you state your preferred timeline, or preferred set of time lines for changes to the spec.

I think that may alleviate potential misunderstandings. I'm feeling quite pressured to take time out from other urgent matters, which is not an optimal way to make specification changes.

RubenVerborgh commented 5 years ago

@melvincarvalho

of first moving it to a "deprecated" section

You have my formal objection to this.

I didn't even formally propose it, so no need to object. My point is that even if we decide to start the removal process, it would be good to document the current behavior.

There are currently four described behaviors, and at least two of them are incompatible:

  1. the spec
  2. @melvincarvalho's interpretation of what the spec should be (https://github.com/solid/solid/issues/253#issuecomment-476838647)
  3. the GOLD implementation
  4. the NSS implementation

The goal of this PR was to ensure that 2. is reflected in 1.

in a discussion with @timbl, he expressed his preference for removing globbing altogether

Citation required.

That's it right there. No need to doubt my word.

I'd like to see the context of this. And also the time line.

It's a private conversation that I hence cannot share. Timeline: today, pointing to these issues and asking for input.

Not now. The bar for spec changes is extremely high, and by default should be unanimous.

For all we know, this is a spec correction.

A bug fix that aligns the spec with gold test indicated I think would be fine.

That's exactly what this PR does. The TestGlob you pointed to tests only one star at the end of a container path: https://github.com/linkeddata/gold/blob/b000d003f9e2aa40e4977839ca063f09435f80c8/server_test.go#L1193

I will aim to give a more specific review to this work. But not this week. More likely in the month of April.

That's fine, but we might not necessarily wait for that.

All I'm doing here is aligning the spec with behavior that you want, GOLD has, and NSS has. Instead of all the tangential discussions, a simple “thank you” could also have worked.

@RubenVerborgh perhaps the easiest thing would be, could you state your preferred timeline

  1. We fix the incorrect description of globbing (this PR).
  2. We implement client-side globbing.
  3. [optional] We deprecate globbing.
  4. We remove globbing.

However, given that we only have one stakeholder and two apps that use globbing, I (personally) am inclined to drop 3, possibly 2, possibly 1.

Timeline: ASAP. If there were more stakeholders, my answer would be different.

melvincarvalho commented 5 years ago

Timeline: today, pointing to these issues and asking for input.

Solid lives through its specifications.

Contentious changes to the spec cannot be carried out unilaterally on such a short time line. Globbing came about iteratively through many years of work on solid, and produced critical apps in its evolution. It's an important part of solid.

Your proposal remains problematic, not least because there are some factual errors that could be expanded upon, and I will take some time go to through the reasons for that. As such, I've readded the "on-hold" tag. It would be good not to have a "tag war", which sort of defeats the purpose of the on-hold principle too! :)

We also could use some time for Tim or Andrei to weigh in if they choose to.

I appreciate the review from @michielbdejong however I cant help but reflect that we had almost this exact same conversation c. 6 years ago in unhost on the unhosted spec. When basti and I wanted to add HEAD to the GET part of the REST spec of remote storage, and michiel in this case wanted a freeze due to "slippery slope". Somewhat amusing that we are now in reverse roles, but I do fondly recall our discussion, and hope that you can empathize with that meta principle :)

Instead of all the tangential discussions, a simple “thank you” could also have worked.

@RubenVerborgh I do thank you for this and other contributions to the topic. I simply ask that contentious spec changes have a longer time line on which to discuss. Spec changes should have a high bar for changes. Hope this is making at least some sense.

RubenVerborgh commented 5 years ago

s such, I've readded the "on-hold" tag. It would be good not to have a "tag war", which sort of defeats the purpose of the on-hold principle too! :)

on-hold is for technical blockers; I have been made aware of none. Could you please remove? (or link to another issue that blocks this one?)

Solid lives through its specifications.

Or the GOLD implementation apparently. There is a bug in the spec, this fixes it.

It would be very different if globbing were actually used by more than one stakeholder. Given that we have identified all stakeholders, and that this issue correctly captures globbing as defined by them, I don't understand the resistance.

RubenVerborgh commented 5 years ago

Discussed out of band with @melvincarvalho; this should be on hold until he and @timbl can discuss.

RubenVerborgh commented 5 years ago

Resolved conflict; mergeable again.

kjetilk commented 5 years ago

OK by me, and my assumption is that it is also OK by @timbl , but perhaps we should have a go from him?