solid / solid-spec

Solid specification draft 0.7.0
Creative Commons Zero v1.0 Universal

Remove globbing #151

Open: RubenVerborgh opened this pull request 5 years ago

RubenVerborgh commented 5 years ago

This PR implements the (pending!) proposal to remove globbing at https://github.com/solid/solid-spec/issues/145

It bypasses clarifying the definition of globbing (https://github.com/solid/solid-spec/pull/148) by simply removing it altogether, given that this currently seems to be what the majority wants.

Just putting it out here as a possible option, no rush.

RubenVerborgh commented 5 years ago

A client-side alternative to globbing is implemented in https://github.com/solid/ldp-glob; live demo at https://solid.github.io/ldp-glob/demo.html?https://drive.verborgh.org/public/
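Roughly, a client-side replacement for the server-side /* glob lists a container's members and then fetches each direct member itself. The sketch below illustrates that idea under stated assumptions; it is not the ldp-glob code. It assumes the member URLs have already been extracted from the container's `ldp:contains` triples, and `FetchText`, `directMembers` and `clientGlob` are hypothetical names. Bodies are kept per URL because each Turtle document has to be parsed against its own base URL before the graphs are merged.

```typescript
// A function that resolves to the text body of the resource at `url`
// (for example, a thin wrapper around fetch); injected to keep the sketch testable.
type FetchText = (url: string) => Promise<string>;

// Keep only direct, non-container members of `containerUrl`.
// Assumption: this is the set a server-side `/*` glob covers ("all files in a directory").
function directMembers(containerUrl: string, members: readonly string[]): string[] {
  const base = containerUrl.endsWith("/") ? containerUrl : containerUrl + "/";
  return members.filter((url) => {
    if (!url.startsWith(base)) return false;
    const rest = url.slice(base.length);
    // Reject the container itself, nested paths, and subcontainers ("sub/").
    return rest.length > 0 && !rest.includes("/");
  });
}

// Fetch every direct member in parallel and return the bodies keyed by URL.
async function clientGlob(
  containerUrl: string,
  members: readonly string[],
  fetchText: FetchText,
): Promise<Map<string, string>> {
  const urls = directMembers(containerUrl, members);
  const bodies = await Promise.all(urls.map((url) => fetchText(url)));
  const result = new Map<string, string>();
  urls.forEach((url, i) => result.set(url, bodies[i]));
  return result;
}
```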

michielbdejong commented 5 years ago

This would be for version 0.8 of the spec, then. We need to discuss the timeline for that. I agree with Ruben about removing globbing in the next spec version, but I also agree with Melvin about moving slowly and not breaking things every few weeks. As for the timeline, my gut says we should release spec version 0.8 in December and not rock the boat before then. But let's discuss that in the next weekly meeting!

RubenVerborgh commented 5 years ago

Discussed out of band with @melvincarvalho; this should be on hold until he and @timbl can discuss.

@michielbdejong Yes, but we should prevent people from starting to implement globbing if it is going to be removed, so a note or label in the spec would be useful. And of course https://github.com/solid/solid-spec/pull/148, which aligns the spec with the actual situation.

linonetwo commented 5 years ago

I'm using globbing to retrieve hundreds to thousands of metafiles: https://github.com/linonetwo/solid-tiddlywiki-syncadaptor/issues/4#issuecomment-491118236

I can't afford to do this client-side, because there will be hundreds to thousands of wiki pages in that container, which would mean a huge number of client-side fetches running concurrently.

elf-pavlik commented 5 years ago

@linonetwo have you run benchmarks comparing the globbing approach with https://github.com/solid/ldp-glob, with HTTP/2 enabled?

linonetwo commented 5 years ago

@elf-pavlik Do I need to run solid-server as a library and use spdy to enable HTTP/2 in my server?

elf-pavlik commented 5 years ago

I think you could also just run it behind nginx and enable HTTP/2 in your nginx config.

RubenVerborgh commented 5 years ago

> I'm using globbing to retrieve hundreds to thousands of metafiles linonetwo/solid-tiddlywiki-syncadaptor#4 (comment)

Thanks for sharing this use case, it's good to know what's out there.

May I ask for a bit more detail here?

What you seem to be using is .meta.*; however, that kind of pattern is not supported across Solid servers (see #147). The only kind of globbing currently in use is /*, that is, all files in a directory. How does this affect your use case? (For instance, could you put your files in a meta subfolder?)
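To make the difference concrete: /* selects every direct member of a container, whereas a name pattern such as .meta.* selects a subset by file name. A minimal sketch, assuming a simplified glob dialect in which `*` matches any run of characters other than `/` and every other character is literal. These are not the matching rules of any particular Solid server, and `globToRegExp` and the sample names are hypothetical.

```typescript
// Convert a simplified glob over resource names into an anchored RegExp.
// Assumed dialect: `*` matches any run of characters except "/";
// every other character is matched literally.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^/]*");
  return new RegExp(`^${pattern}$`);
}

// Hypothetical names relative to a container.
const names = ["page1.tid", ".meta.page1", "sub/page2.tid"];

// `*` (the `/*` case): every direct member, nothing nested.
const allFiles = names.filter((n) => globToRegExp("*").test(n));
// → ["page1.tid", ".meta.page1"]

// `.meta.*`: only names that start with ".meta.".
const metaFiles = names.filter((n) => globToRegExp(".meta.*").test(n));
// → [".meta.page1"]
```

Because a name filter like this can run on the client after the container has been listed, it does not depend on server support for the pattern.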

Another question I have is about the necessity of this design choice: could you give us some insight into the motivation for splitting data across this many files? (There may well be a more general motivation, so I'm eager to learn about it.)

One concern I do have is that, even for the server, thousands of files would make this a very expensive request, which ties into my DDoS worries about globbing (#145).

> I can't afford to do this client side, because there will be hundreds to thousands of wiki pages in that container, so there will be a huge amount of client-side fetch running concurrently.

Point taken, except for "concurrently": the browser will take care of this, and with HTTP/2 there should be only very limited overhead. Emphasis on should, because the per-request cost of NSS (node-solid-server) is currently too high, so it will be significantly slower with the number of files you mention.
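On the concurrency worry: browsers typically limit how many HTTP/1.1 connections they open per host, and HTTP/2 multiplexes requests over a single connection. A client that wants an explicit bound anyway could use a small worker pool like the sketch below. `mapWithLimit` is a hypothetical helper, and in practice `task` would wrap a request such as `fetch(url)`.

```typescript
// Run `task` over every item with at most `limit` tasks in flight,
// returning results in the same order as `items`.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  // JavaScript runs this synchronously between awaits, so `next++` is safe.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```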

That said, whatever design decision we make, having thousands of files in a single folder is bound to cause trouble one way or another, not just for Solid but for *nix and Windows systems too. So I believe the information architecture here could be improved. But please feel free to expand further on your use case, so we understand where the scale comes from.
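One common way to avoid thousands of files in one container (not something proposed in this thread) is to shard them into subcontainers by a hash of the file name. A sketch using the 32-bit FNV-1a hash and 256 buckets; `fnv1a32`, `shardedPath` and the URL layout are hypothetical. The trade-off is that listing everything then takes one request per subcontainer.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a32(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, multiplied modulo 2^32
  }
  return h >>> 0; // reinterpret as unsigned
}

// Place `name` in one of 256 hypothetical subcontainers ("00/" to "ff/"),
// chosen by the low byte of its hash.
function shardedPath(containerUrl: string, name: string): string {
  const base = containerUrl.endsWith("/") ? containerUrl : containerUrl + "/";
  const bucket = ("0" + (fnv1a32(name) & 0xff).toString(16)).slice(-2);
  return `${base}${bucket}/${encodeURIComponent(name)}`;
}
```

For example, `shardedPath("https://pod.example/wiki/", "a")` returns `"https://pod.example/wiki/2c/a"`.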

linonetwo commented 5 years ago

Well, I've reconsidered it:

  1. I won't use xxx.meta to store "metadata (like tags) generated by the user and my application" anymore, because of https://github.com/solid/solid-spec/issues/168: I can't GET or DELETE xxx.meta.
  2. Instead, I will use SPARQL to read and update a single index.metafile.ttl, and create every file with the header Link: <http://www.w3.org/ns/ldp#Resource>; rel="type", <index.metafile.ttl>; rel="describedby".
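The plan in item 2 can be sketched as two pieces: the Link header sent when creating each file, and a SPARQL `INSERT DATA` update sent to index.metafile.ttl. This assumes the server accepts SPARQL Update through `PATCH` with `Content-Type: application/sparql-update`, as node-solid-server does; the tag predicate and all helper names are hypothetical.

```typescript
// Request shape accepted by fetch(); declared locally so the sketch is self-contained.
interface PatchRequest {
  method: "PATCH";
  headers: Record<string, string>;
  body: string;
}

// Link header value for a newly created file: mark it as an LDP resource
// and point at the shared metadata file (value taken from the plan above).
function creationLinkHeader(metaFile: string): string {
  return `<http://www.w3.org/ns/ldp#Resource>; rel="type", <${metaFile}>; rel="describedby"`;
}

// Escape a value for a SPARQL double-quoted string literal.
// Backslashes are escaped first so later escapes are not doubled.
function sparqlString(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

// Build an INSERT DATA update that attaches tags to one file.
// Assumes `fileUrl` and the hypothetical `predicate` are valid absolute IRIs.
function insertTagsQuery(fileUrl: string, tags: string[], predicate: string): string {
  const triples = tags
    .map((tag) => `  <${fileUrl}> <${predicate}> ${sparqlString(tag)} .`)
    .join("\n");
  return `INSERT DATA {\n${triples}\n}`;
}

// Wrap the update so it can be passed as fetch(indexUrl, request).
function sparqlPatch(query: string): PatchRequest {
  return {
    method: "PATCH",
    headers: { "Content-Type": "application/sparql-update" },
    body: query,
  };
}
```

Reading the index back would then be a plain GET of index.metafile.ttl, parsed with an RDF library.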

I'm not sure whether ./meta/index.ttl or ./index.metafile.ttl is a good name (see https://github.com/linonetwo/solid-tiddlywiki-syncadaptor/issues/4#issuecomment-491519312).

I chose globbing because "it's the easiest way to get my POC app working, and the documentation is simple and unambiguous", but I could actually use SPARQL instead, although I'm not quite sure it will work.

I drew a picture while thinking about this; it may describe the motivation better. I'm creating a saver plugin for TiddlyWiki, which is a semantic wiki:

[Image: whyglobbing]