nodeSolidServer / node-solid-server

Solid server on top of the file-system in NodeJS
https://solidproject.org/for-developers/pod-server

Upgrade public Solid servers software to v5 and migrate filesystems #950

Open RubenVerborgh opened 5 years ago

RubenVerborgh commented 5 years ago

Filesystem migration can happen with the tool created in #949.

RubenVerborgh commented 5 years ago

Continuing the discussion from https://github.com/solid/node-solid-server/issues/946#issuecomment-440244460.

@melvincarvalho Are there any concrete, tangible problems that arise from updating the filesystem?

For all involved, I just want to repeat that users of those servers will not notice any change. We are not touching or changing their resources in any way. E.g., whatever they had at https://user.example/abc/def/xyz will remain the exact same. (However, the correspondence of a resource address to a file name changes, but this is a server-internal matter. Think of it as changing the name of a table in a database.)

kjetilk commented 5 years ago

The most problematic part of this that I can see is that it is a problem for those who just want to run NSS against an existing file system that doesn't have this convention and is mainly managed by some other system.

However, I think the interoperability of Solid shouldn't be at that end; it imposes too many constraints. Instead, they should incorporate their own server or, with this change, write their own ResourceMapper. I mean, the Solid spec isn't a very big and complex thing; it shouldn't be that hard to write one for, e.g., Nextcloud.

melvincarvalho commented 5 years ago

> The most problematic part of this that I can see is that it is a problem for those who just want to run NSS against an existing file system that doesn't have this convention

That's it. For example, I've pointed Dropbox at a node solid server running on a file system, in order to get the benefits of both systems. Or fuseFS, or any number of dozens of web servers. Bear in mind none of the other Solid servers do this yet, so migration is an issue. Similarly let's say you are using solid with git, then those files don't have extensions and now it's problematic. People may be storing IPFS hashes or DID identifiers or any number of things.

It's just very few other systems have this convention. So solid (in fact node solid server) will become bespoke. That could work well, or might become a pain point for interop with other systems.

As I stated before, the issue is the lock-in. This may be a good idea; I'm just not sold on it yet. There has to be a mechanism by which we could change course without large changes to code, and pods diverging. For this reason I suggested two options: one is to have an on/off switch, the other is to isolate the code changes in their own version. I'd be open to other ideas, though!

What would be good to know is how to back out this change.

RubenVerborgh commented 5 years ago

> It's just very few other systems have this convention.

I assume that zero other systems have the convention that no extension means Turtle, so we're probably good.

If you indeed have a Dropbox sync, then you are probably already using .ttl as an extension for your Turtle files. Which means that nothing changes for you. Is that correct?

> so migration is an issue

And we provide a solution for that issue with zero impact for people who don't access their pod through the filesystem, or people who already use extensions. So in reality, I think 1 or 2 people are possibly affected, and none if they all use extensions (as @timbl already does).

> There has to be a mechanism by which we could change course without large changes to code, and pods diverging.

We have that mechanism: it's the ResourceMapper. Very easy to change. We're just not keeping on board the old buggy implementation, but happy to include other implementations.

> What would be good to know is how to back out this change.

Not upgrade, I'm afraid. I cannot justify keeping a buggy implementation around for the very low number of people affected, especially given the benefits it brings for the thousands of others.

RubenVerborgh commented 5 years ago

> Similarly let's say you are using solid with git, then those files don't have extensions and now it's problematic.

Git is broken with the current implementation (and I actually had this issue), but not with the future one.

The only thing we are replacing is that extensionless files are not interpreted as Turtle anymore.

melvincarvalho commented 5 years ago

> If you indeed have a Dropbox sync, then you are probably already using .ttl as an extension for your Turtle files

All the WebIDs on e.g. solid.community are /card and Turtle.

> The only thing we are replacing is that extensionless files are not interpreted as Turtle anymore.

That'll break 1000s of WebIDs tho. Why can't extension-less still be interpreted as Turtle, where they exist? So, we will not run this script.

> We have that mechanism: it's the ResourceMapper. Very easy to change.

That's good news. So what we would need is an understanding of ResourceMapper that keeps existing functionality. How would we do that: via a switch, a code change, etc.?

> Not upgrade, I'm afraid

OK, thanks, I had hoped to avoid this, I do hope you would possibly reconsider. Because, I think there are good solutions here. But, ultimately it's your call. So, no need to run a script on solid.community, just yet.

Happy to go to 4.4, but not yet ready for 5.0.0, until

I'm not ruling out moving to 5.0.0, just want to first understand how we could change the way this is done.

Any of these seem relatively small requests from a distance. And IMHO, it would be a win for testing if we have lots of users running the same version of NSS.

> We're just not keeping on board the old buggy implementation, but happy to include other implementations

It's not about keeping around buggy code (nobody wants that!). Unclear which bugs you mean, i.e. just the PUT + MIME type, or this plus other bugs.

I don't understand why this code change can't be self-contained. It is after all, a major architectural change breaking URI opacity. It's the bundling together with other code that is a concern (if indeed that's a concern -- it's indicated from the version number 5.0.0). Or can't we have a switch? Or some other solution, such as patching the resource mapper? Failing that, the community can try to put together a team to make it a switch etc. I will perhaps reach out to the solid community group and see if some folks will help.

At this point, the default position is that pods (inrupt.net and solid.community) will diverge, hopefully, only for a short period of time.

RubenVerborgh commented 5 years ago

> That'll break 1000s of WebIDs tho.

Nothing will break when following the upgrade instructions.

> Why can't extension-less still be interpreted as Turtle, where they exist?

Because that breaks other things, like git, as you remarked above.

> So what we would need is an understanding of ResourceMapper that keeps existing functionality.

We do not want to do that, because the existing functionality is buggy. But you can do that on a code level by replacing ResourceMapper by LegacyResourceMapper.

> • it's understood how to use ResourceMapper to revert to existing functionality

So that is understood, but not something we want in core, because of technical debt reasons.

> It's not about keeping around buggy code (nobody wants that!).

But that is what your request comes down to. The current resource mapping is buggy, we have established that. You are asking to keep it available behind a switch.

> Unclear which bugs you mean, i.e. just the PUT + MIME type, or this plus other bugs.

At the moment:

> It is after all, a major architectural change

It's not. Nothing in the architecture changes.

> Failing that, the community can try to put together a team to make it a switch etc.

Sure. But I think the crucial point is finding people who are affected by this change. (They are a subset of the people who access their Solid pods through the file system, and that set is already very small.)

The number of people affected by the bugs is far greater in any case.

> At this point, the default position is that pods (inrupt.net and solid.community) will diverge, hopefully, only for a short period of time.

For that, we need to discuss ownership and maintenance of the servers etc. See Gitter.

If you keep solid.community on the old mapper, you will still have the bugs mentioned above. The way to fix them is to replace the mapper.

From the perspective of solid.community users—who don't have file-based access to the server—nothing breaks, nothing changes, but important bugs are fixed. So I don't see why you would want the solid.community people to have those bugs when an upgrade introduces no disadvantages or visible changes for them (other than fixed bugs).

melvincarvalho commented 5 years ago

> It's not. Nothing in the architecture changes.

The URIs of files change, and that breaks opacity, unless you have a translator. I think we have a major architectural disagreement here. In that the web should be viewed as both an http: and file: space. While it may be apparent from a certain perspective that it's only http: , actually both are fundamental to solid, when viewed as a webization of the file system.

So, once again. I am all for bug fixing. And in time we certainly will do that. But there is more than one way to fix this bug. The aim is to avoid lock in to one particular solution. There is no willingness at all to keep bugs. What is important is the regression.

This can be achieved by putting this change in, as an upgrade from 5.0.0 to 6.0.0. We would happily upgrade to 4.4 then, and then to 5.0.0, and then to 6.0.0. If there is a way to have a clean upgrade path, we definitely would want to do that. But if 5.0.0 bug fixes are contingent on changing the file system of every user, we'd like to hold off a bit until we understand it better. Hope that makes sense.

RubenVerborgh commented 5 years ago

From your post above, I take it that we are in agreement that nothing changes for people who only use HTTP-based access. So, before I reply to your points, may I ask the following questions?

I think the answers are:

So assuming you only have one account, this means that:

So as far as upgrading solid.community is concerned, I hope those statistics speak for themselves.

Now you are a very respected member of the community, and you have a much longer history with the project than I have. However, as voices of the community, I think it is important that what we say also reflects the needs of a part of the community. So how many people do you honestly estimate will be affected (i.e., file-system users of Solid that use extensionless files)? I know two file-system users of Solid, that is you and Tim. And Tim is the one who proposed the new mapping, and he does not use extensionless files anyway so will not be affected.

Please understand the cost associated with keeping extra features around. This one switch means that we have to test everything in two scenarios. And that can be acceptable, if enough people are in both scenarios. I'm not sure that this is the case, but happy to be convinced otherwise.

Now for your technical points.

> The URIs of files change, and that breaks opacity, unless you have a translator.

This assumes two things:

I think that opacity will not be broken, because the mapping is available and there is a translator.

However, you might have meant with "opacity" that there is an identical relationship between the file system and the URL space. And that is not the case: we don't have opacity in the first place, not today and not in v5.

Now there is a (large) subset of URLs in Solid pods that have an identical relationship: they are the URLs with file extensions (and directory URLs). And that subset is not touched by this change!

Now there is a (small) subset of URLs in Solid pods that do not have an identical relationship. Those are the URLs without file extensions (that are not directory URLs). Current Solid pods will serve them on the HTTP system as text/turtle, but on the file system as octet-stream. So opacity is an illusion there. New Solid pods will add an extension for them on disk, such that they are correctly opened on the file system (which was Tim's concern) and have a correct MIME type on the HTTP system (which was many people's concern). The price we pay is that URLs are not exactly the same (but their mapping is documented and in that sense opaque).
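
To make the mapping described above concrete, here is a minimal Python sketch. It is not node-solid-server's actual code (which is JavaScript). It assumes the on-disk suffix has the form `$.<ext>`, which is inferred from the rename expression posted later in this thread; the function name and the small content-type table are hypothetical.

```python
import posixpath

# Hypothetical table: content type -> extension used on disk.
EXTENSIONS = {"text/turtle": "ttl", "text/html": "html", "image/jpeg": "jpg"}
# Reverse lookup: extension -> content type implied by that extension.
TYPES = {ext: ctype for ctype, ext in EXTENSIONS.items()}


def url_path_to_file_path(url_path: str, content_type: str) -> str:
    """Map a URL path to an on-disk path (illustrative sketch only).

    If the extension in the URL already implies the content type, the
    path is identical on both sides. Otherwise a '$.<ext>' suffix is
    appended so the file opens correctly from the file system.
    Raises KeyError for content types missing from the table.
    """
    ext = posixpath.splitext(url_path)[1].lstrip(".")
    if TYPES.get(ext) == content_type:
        return url_path  # the large, untouched subset
    return f"{url_path}$.{EXTENSIONS[content_type]}"


print(url_path_to_file_path("/profile/card", "text/turtle"))
print(url_path_to_file_path("/notes/todo.ttl", "text/turtle"))
```

Under these assumptions, only the extensionless URL gets a different name on disk; the URL with a matching extension maps to itself.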

Also note that files like 'a b.txt' and 'a%20b.txt' do not have the identical mapping either, so such a mapping never really existed in the first place. And let's not talk about the unmappable 'a%2fb.txt'.
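
The percent-encoding cases can be checked with a short Python sketch. It assumes a server that percent-decodes each URL path segment before using it as a file name; the helper name is hypothetical and this is not taken from node-solid-server.

```python
from urllib.parse import unquote


def segment_to_file_name(url_segment: str) -> str:
    """Decode one URL path segment into a candidate file name.

    Raises ValueError when the decoded name contains '/', because a
    single file name cannot contain the path separator.
    """
    name = unquote(url_segment)
    if "/" in name:
        raise ValueError(f"{url_segment!r} decodes to a name containing '/'")
    return name


# Two different URL spellings end up as the same file name,
# so the URL-to-file relationship is not one-to-one.
print(segment_to_file_name("a%20b.txt") == segment_to_file_name("a b.txt"))

# 'a%2fb.txt' decodes to 'a/b.txt', which is not a valid single file name.
try:
    segment_to_file_name("a%2fb.txt")
except ValueError as err:
    print("unmappable:", err)
```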

So things that were opaque, still are. Things that were not, are still not. But bugs are fixed.

> In that the web should be viewed as both an http: and file: space.

That's not a sustainable metaphor in general. (I can point you to a server with an infinite URI space.)

> But there is more than one way to fix this bug.

Yes, please contribute other mappings to node-solid-server.

> This can be achieved by putting this change in, as an upgrade from 5.0.0 to 6.0.0.

The main reason the bump to v5 is major, is this specific change.

> But if 5.0.0 bug fixes are contingent on changing the file system of every user

As argued above, for solid.community, only 0 or 1 people actually use the file system there, so for everyone else nothing changes (except for fixed bugs).

melvincarvalho commented 5 years ago

@RubenVerborgh I get where you are coming from, because I've argued the exact case you made in the past. Over time I have come to learn the value of making the file system first class. I think I have, over the course of a number of posts, made a number of compelling cases which you can go over. The main one is that breaking a file URI will translate to a broken file in, say, NextCloud, or many systems we want to interoperate with, which is a potential future network effect bigger than any one pod. The future of solid.community I see as one that interops with Solid, but also works to bootstrap other systems.

This is not the question at hand tho. The question at hand is: why can't this be done in a way that minimizes breakage and provides regression? That is the cause for caution.

> The main reason the bump to v5 is major, is this specific change.

So this is the main issue here. Are you saying there are no other breaking changes in 5.0.0? Genuine question. If so, let's put in all the changes we can, up to 4.x, and we want to take those. We just don't want to take this complex upgrade together with other unrelated changes, because it becomes really hard to back out. So if we go up to 4.x, that's fine. And if this change is relatively self-contained and goes to 5.0.0, we can do that too. But my impression was that there are other semver major changes.

Another way to do it would be as follows. And this may be a good idea in its own right. Don't upgrade all the pods at the same time when there is a major breaking release. First upgrade inrupt.net. Then, if that is shown to be working, we can create a doc showing how to back out this specific change, which I think would be running a reverse script and somehow patching the resource mapper. With that in place, we could then upgrade solid.community.

I'm just not convinced that overloading a file system URI is a good long-term solution to this issue. And then let's say you point another Solid server or web server at that file system: suddenly your WebID doesn't work. We can certainly try it for a while and see what we learn, and then see if we can make other strategies. But it's a paradigm shift and we want to mitigate long-term lock-in, which from what you have been saying doesn't sound too hard -- but needs to be understood better.

RubenVerborgh commented 5 years ago

Since my technical points still stand and haven't been replied to, I'll refrain from further discussion, especially since they make the proposed plan unnecessary.

> in a way that minimizes breakage and provides regression

That's what we have. There is zero breakage for HTTP users, and nothing that wasn't broken already breaks for filesystem users.

At some point, we just need to move. The issues have been there for over a year, there was plenty of time for alternative solutions. None came. And there still is time, just not for v5.

RubenVerborgh commented 5 years ago

@melvincarvalho

> Solid community will upgrade to NSS releases, but will hold off on running the script that changes the file system

It is urgent that we figure out who has the responsibility for that decision. Based on your comments in #946, you own the solid.community domain, but MIT owns the server on which the software and data reside. So it is not clear-cut.

melvincarvalho commented 5 years ago

OK!

So, in summary : lots of options on the table.

Happy to go with all the 4.x changes.

But, a prerequisite for upgrading solid.community to 5.0, via this migration script, I would say, is to have the practical knowledge of how to effectively back out the change.

melvincarvalho commented 5 years ago

> It is urgent that we figure out who has the responsibility for that decision. Based on your comments in #946, you own the solid.community domain, but MIT owns the server on which the software and data reside. So it is not clear-cut.

@RubenVerborgh absolutely! I have always considered it a team effort.

RubenVerborgh commented 5 years ago

> practical knowledge of how to effectively back out the change.

Execute the following command:

find /solid/accounts -exec rename 's/\$\..*//' {} \;
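
For anyone who would rather preview the effect before renaming anything, the following Python sketch applies the same regular expression in dry-run mode. It is an illustration only: the function names are hypothetical, and unlike the command above it applies the pattern to each entry's base name rather than to the full path that `find` passes to `rename`.

```python
import os
import re

# Same pattern as the rename expression: a literal '$.' and everything after it.
SUFFIX = re.compile(r"\$\..*")


def original_name(name: str) -> str:
    """Strip a '$.<ext>' suffix from a single file or directory name."""
    return SUFFIX.sub("", name, count=1)


def plan_renames(root: str) -> list:
    """Return (old_path, new_path) pairs without touching the disk."""
    plan = []
    # Walk bottom-up so entries are listed before their parent directories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = original_name(name)
            if new_name != name:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, new_name)))
    return plan


if __name__ == "__main__":
    for old, new in plan_renames("/solid/accounts"):
        print(f"{old} -> {new}")
```

Reviewing the printed plan first makes it easier to spot name collisions, for example when both `card` and `card$.ttl` exist in the same directory.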

csarven commented 5 years ago

If solid.community is intended to be a flagship service, then it should act like it. Self-dogfood and all that. That's part of the social contract of running a flagship.

If it is intended to be an arbitrary pod service out there, then of course it makes sense to evaluate concerns beyond what's directly associated with the Solid project.

I was under the impression that solid.community would run a reasonably recent version of the software - putting aside wip - in order to help move the ecosystem forward, attract new users, as well as give incentive for existing users to... keep using it. As I see it, it is supposed to showcase what is first and foremost the "solid-centric" stuff, and everything else is secondary - and I don't mean to devalue the other initiatives by saying that.

If solid.community is not intended to be the flagship service for Solid, then I strongly suggest that another pod fills in that role as soon as possible. No reason to stall.

melvincarvalho commented 5 years ago

@csarven thanks for the feedback. You are very welcome to help with solid.community, and the vision going forward. You've always been welcome to help, but unsure you have cycles free?

It just happens to be the first pod. I sort of campaigned for years to try and get a public node-solid-server pod running, so that users could use solid, play with it, experiment etc. Simply because there were none out there, apart from the test server.

OTOH you, or anyone else out there, are very welcome to run a pod, and we encourage that. I don't think we want to play favourites, or flagships; solid.community was just the first, and we will try as a community to manage it responsibly. Right now, it's just common sense that prevails. The aim is to have a stable and hopefully long-running server. It's quite fun, since there are so many directions a Solid server can go.

melvincarvalho commented 5 years ago

> practical knowledge of how to effectively back out the change.

> Execute the following command:

> find /solid/accounts -exec rename 's/\$\..*//' {} \;

Thanks!

That would appear to be one half of the process.

The other part that is required is the accompanying code change in order for the server to still be able to serve files, in particular, a user's WebID.

RubenVerborgh commented 5 years ago

As mentioned above: LegacyResourceMapper

melvincarvalho commented 5 years ago

Chatted to @RubenVerborgh out of band about this. I now understand that the default content type is indeed a single setting which could be switched in order to provide backwards compatibility, and which prevents long-term lock-in. So, I think I'm good with this change. Thanks for taking the time to explain.
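
As an illustration of how a single default-content-type setting could restore the old behaviour, here is a minimal Python sketch. The class, parameter name, and extension table are hypothetical and are not node-solid-server's API; the choice of `application/octet-stream` as the new default is an assumption based on the discussion above.

```python
import posixpath

# Hypothetical table of extensions the sketch recognises.
KNOWN_TYPES = {".ttl": "text/turtle", ".html": "text/html", ".jpg": "image/jpeg"}


class SketchMapper:
    """Toy resource mapper: decides the content type of a file on disk."""

    def __init__(self, default_content_type: str = "application/octet-stream"):
        # The one setting discussed above: what an extensionless
        # (or unrecognised) file is served as.
        self.default_content_type = default_content_type

    def content_type_for(self, file_name: str) -> str:
        ext = posixpath.splitext(file_name)[1]
        return KNOWN_TYPES.get(ext, self.default_content_type)


new_style = SketchMapper()                  # assumed v5-like default
legacy_style = SketchMapper("text/turtle")  # extensionless means Turtle again

print(new_style.content_type_for("card"))
print(legacy_style.content_type_for("card"))
print(new_style.content_type_for("card$.ttl"))
```

In this sketch, files that carry an extension are served identically under both settings; only extensionless files differ, which is the subset the whole thread is about.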

RubenVerborgh commented 5 years ago

Thanks @melvincarvalho.