Is refactoring or rewriting the most efficient and cost-effective way to realize a high-quality and robust codebase for NSS?

kjetilk commented 6 years ago

Unfortunately, we didn't manage to have full attendance for our meeting in Boston where we discussed the choice of refactoring this module, vs rewriting it from scratch with a deliberate architecture. It has become a recurrent conversation on gitter, but I feel those conversations have not been very productive due to the constraints of the chat as a medium. Perhaps a github issue is better suited for that conversation. For us developers, I think it is very important we can have a resolution to this.

dmitrizagidulin commented 6 years ago

I don't have an opinion on whether nss should be refactored or rewritten (I'm refactoring it in my spare time, but I'll be equally thrilled if it's re-written). And I wholly agree that the general state of nss code quality makes it difficult to work with.

But I do want to push back against a given example.

@RubenVerborgh

Case in point: do check the code that was needed to implement account deletion

@melvincarvalho

Deleting an account on a server can literally be 1 line of code.

This is a great example of some of the misunderstandings in this discussion thread. No, it can not be just 1 line of code. Like, not even in theory, under ideal architecture. And it's not even an issue of whether you're just deleting just the user account directory or also the database entry. There are a lot of moving parts underneath, mostly having to do with user confirmation UI/UX.

Take a look at some of the suggested implementation notes on the original issue. (And that's not even a full list, it's about half or two-thirds of the steps needed.) None of those steps really depend on architecture. And they can't be reduced in any significant way. Like, you'll always need to add the delete button ui. (and the corresponding routes / controllers for it). You'll always need to prompt for confirmation. You'll always need to generate and store a one-time token. And generate a confirmation email. And implement a confirmation screen from that link. And only then you get to deleting the actual account.

So I want to ask, @RubenVerborgh - what are we meant to conclude from looking at the code for account deletion? Do you think that once NSS is refactored or re-written, the code will be significantly different, like by orders of magnitude?

RubenVerborgh commented 6 years ago

No, it can not be just 1 line of code.

Indeed: there's the UI, storage, accounts, and tests for all of these.

what are we meant to conclude from looking at the code for account deletion?

The single-responsibility principle is violated in many places. Too many moving pieces. https://github.com/solid/node-solid-server/pull/858/files took 20 changed files and over 1,000 new lines of code. Duplication of existing code because insufficiently modular. Cannot nearly be tested as well as we want to. No one dares to state that it really works in all cases, will never delete files it is not supposed to, or otherwise corrupt or endanger a running server. No one knows what existing functionality we broke by implementing it. We do not have the slightest idea about the security implications.

kjetilk commented 6 years ago

Deleting an account on a server can literally be 1 line of code.

This is a great example of some of the misunderstandings in this discussion thread.

Yeah, I just didn't feel obliged to point that out. There's a lot of stuff going on under the hood.

However, I'm pretty sure somebody has already written code that does account deletion. There should be a framework somewhere will such things are already implemented, and an architecture should just wrap this with minimal code.

dmitrizagidulin commented 6 years ago

(Ugh, apologies for the double-post. Github is acting up for me today.)

dmitrizagidulin commented 6 years ago

took 20 changed files

Ok, so, let's take a look at the account deletion PR. There's 20 files affected, which may seem like a lot for something as "simple" as account deletion. But let's take a look at what files they are. 5 of those are UI template updates (and one email template). 4 of those files are unit tests. And 2 are (not quite sure why) a package.json and package-lock updates. So we're already more than halfway through those 20 files. And the remaining 9 (8, really, since the debug.js edit is optional) are route and controller code for the 3 different kinds of requests necessary for the UI for account deletion.

So, just for my own calibration, how would these changes be different under ideal architecture?

RubenVerborgh commented 6 years ago

And the remaining 9 (8, really, since the debug.js edit is optional) are route and controller code for the 3 different kinds of requests necessary for the UI for account deletion.

Point taken, I didn’t pick the best example regarding number of changes. Other points still stand though (but happy to be proven wrong).

A better example with regard to number of changes is: “Solid, please don’t assume that extensionless files are Turtle.” Seems like a simple enough things.

After having created all of the rules that govern extensions and MIME types into one file (https://github.com/solid/node-solid-server/pull/643), any guess on how many files we’d have to change to make that work?

I honestly have no idea, but it’s 9 and counting, and I am not even near a quarter of the way: https://github.com/solid/node-solid-server/compare/feature/wire-legacy-resource-mapper And as I’m making more edits, things are progressively becoming much harder to the extent that I don’t know whether I can finish this.

And we’re talking about a really simple change here, conceptually.

dmitrizagidulin commented 6 years ago

@rubenverborgh ohh yeah, no argument there. That kind of stuff is a nightmare to change. Wholly agreed.

melvincarvalho commented 6 years ago

One line of code doesn't work; there's still the account db and associated management.

Why is this a requirement? Surely it was a design decision. And one which made the patch more complex. Is there also a requirement to delete all entries in the server back ups? Point 1 - design can influence complexity, as well as architecture!

Similarly careful reading of my comment shows that I was referring to the server side work, not the client side UX which is a different set of files, more than half the code. Point 2 the complexity around the server work in this case has been over stated.

no one dares to state that it really works in all cases, will never delete files it is not supposed to

Well, if you move to a deleted folder, you lower the risk of arbitrary deletions, and also what about the user when they want to reverse the deletion?

These technical details are good inputs to v.next, and see where good design, refactoring, and rewrite can help.

RubenVerborgh commented 6 years ago

Why is this a requirement? Surely it was a design decision. And one which made the patch more complex.

Ah no, this is exactly the point! Can you prove that the server doesn't blow up if its accounts table is in an inconsistent state? Or when a folder isn't there when it expects it to be there? That a user doesn't get to see another user's files when their folder is deleted? How do you test such guarantees? Which (n) component(s) provide(s) these guarantees?

I was referring to the server side work, not the client side UX which is a different set of files, more than half the code

The server still needs API calls to listen to the client. And the server serves the client, currently. And the served assets aren't tested. Or transpiled. Or validated.

Well, if you move to a deleted folder, you lower the risk of arbitrary deletions

Do we really? What happens if, when a delete is in progress, the user writes to a folder? Does the delete fail? How do we detect a semi-move? How do we recover? What portion of the server do we need to adjust to make that move happen? Do we need a daemon? Is the server unresponsive to requests from other users during the move? How do we test that?

These technical details are good inputs to v.next, and see where good design, refactoring, and rewrite can help.

Yes, but this is not the point of this thread. The point is to find evidence for or against a refactor or rewrite.

The evidence I aim to bring with this post is that the above questions are really hard to answer on the current codebase, and that this is a problem.

melvincarvalho commented 6 years ago

Ah no, this is exactly the point! Can you prove that the server doesn't blow up if its accounts table is in an inconsistent state? Or when a folder isn't there when it expects it to be there? That a user doesn't get to see another user's files when their folder is deleted? How do you test such guarantees? Which (n) component(s) provide(s) these guarantees?

@RubenVerborgh you can test this using our dev server solidtest.space -- remove a directory and then try logging on. The server should not blow up by removing files, because solid was designed to remove files. You can also read the code and write a degree of automated test.

The urgent requirement in this case was to remove email address and name from public publishing. The solution was more complex that solving that sole issue.

Do we really? What happens if, when a delete is in progress, the user writes to a folder? Does the delete fail? How do we detect a semi-move? How do we recover? What portion of the server do we need to adjust to make that move happen? Do we need a daemon? Is the server unresponsive to requests from other users during the move? How do we test that?

mv (move) is an atomic operation of the servers we are using, and can be reversed. Recursive delete is a more difficult and dangerous proposition, for all the reasons you point out. The difficulty was on the client side. But this thread was about the server side.

So, I think we've debunked the hyperbole associated with this issue. ie that the delete account code was bloated due to server architecture. No, it was bloated due to a combination of factors, such as design decisions, client side code, user of server side templates, writing and exposing a new API workflow.

We need to create a suite of systems tests first. A test suite that anyone can run to verify their server. We need to do that whether we do a rewrite or a refactoring effort. I think it is the first project we should embark on once 5.0.0 is released.

@kjetilk I think this thread has been already useful in that we can do a better stock take after 5.0.0 is out with some good case studies. To documenting the whole API of 5.0.0, and then writing the tests is going to be a fantastic input. And so is the test suite mentioned above. It's also great that case studies are documented in this thread. I still think the rewrite is the high risk option, because there's not regression guarantee, so the burden of proof shifting is not yet something I buy into. Good discussions, tho!

RubenVerborgh commented 6 years ago

@RubenVerborgh you can test this using our dev server solidtest.space -- remove a directory and then try logging on. The server should not blow up by removing files, because solid was designed to remove files.

Note that the deletion mentioned here is not covered by the Solid spec, because it's a deletion above the level that the regular HTTP interface can access, namely the folder / itself. The user does not have an ACL on that folder.

You can also read the code and write a degree of automated test.

Both of which are really hard, because of the codebase state and architecture.

The urgent requirement in this case was to remove email address and name from public publishing.

People demanded account removal really.

So, I think we've debunked the hyperbole associated with this issue. ie that the delete account code was bloated due to server architecture.

The "hyperbole" in this issue was about much more than bloat, but about testability, stability, etc. Note also that the issue is not just about architecture, but state of the codebase.

No, it was bloated due to a combination of factors, such as design decisions, client side code, user of server side templates, writing and exposing a new API workflow.

All of which relate to the state of the codebase.

I still think the rewrite is the high risk option, because there's not regression guarantee, so the burden of proof shifting is not yet something I buy into.

You are arguing for a refactoring, and your argument is based on the fact that such a refactoring is feasible and not high risk. However, crucial to a refactoring is being able to set a concrete first step of something that can be refactored. I have worked on the server code intensively, and I can—honestly—not identify such a step that is not an in-place rewrite.

So yes, I am shifting the burden of proof on the existence of a meaningful refactoring to you, because you assume that such a refactoring exists. Any proof that such a refactoring exists, and will not be high-risk, will strengthen your case. Any absence of such a proof (in presence of the evidence of non-feasibility we have given above), or evidence against the validity of the original points made at the top of this thread, leaves that claim unsubstantiated.

You are free to conclude that a rewrite is the high-risk option, but we can only share that conclusion if you show us that a refactoring is not. I'm saying no meaningful refactoring path exists because of https://github.com/solid/node-solid-server/issues/788#issuecomment-423354608. If you say it does, the burden of proof is yours.

kjetilk commented 6 years ago

@melvincarvalho Yeah, I gotta echo @RubenVerborgh 's sentiment here. You have done nothing to document your assertion that a refactor is the lower-risk operation, because you have not provided any evidence for how and where refactoring can be done. For the record, I think both are high risk operations, but for different reasons: Refactor is high risk because the known volume of work to be done is very high and difficult, rewrite is high risk because we do not have the people who learned the lessons from the previous version on the team for the next version. However, I don't see that we can gain much from refactor, but we have a lot to gain from rewrite.

melvincarvalho commented 6 years ago

You are arguing for a refactoring, and your argument is based on the fact that such a refactoring is feasible and not high risk

I'm not arguing that, because it is self-evident. Refactoring, by definition, cleans up code, without changing functionality.

I hear your view that you consider it impractical, but it's not the only view that can be had.

During the process of getting to 5.0.0 I've identified things that can be refactored, such as the get handler to fix the bug where HEAD was showing the wrong thing. We talked about that on gitter. So the idea that refactoring in stages is impossible seems to be an appeal to authority right now -- which is fine!

There will be a lot greater visibility on this after 5.0.0. @RubenVerborgh I think you've ruled yourself out in coding this, if im not mistaken. I wont either be working on the v.next code base. But there certainty will be a lot of work needed on the 5.x, which wont be able to wait for v.next. So to an extent we have to look at the feasibility of this with an eye or resource allocation. Still not buying in to the burden of proof shifting, sorry!

RubenVerborgh commented 6 years ago

it's not the only view that can be had

Thankfully.

And I'm willing to consider any other view, given evidence for its validity.

your argument is based on the fact that such a refactoring is feasible and not high risk

I'm not arguing that, because it is self-evident. Refactoring, by definition, cleans up code, without changing functionality.

It's not self-evident.

It's not because we're not changing functionality, that it is suddenly low-risk. I challenge you to refactor any of the node-solid-server files from callbacks to promises or async. Zero functionality change, but tell me again it's low risk when your're done 😄

elf-pavlik commented 6 years ago

It's not because we're not changing functionality, that it is suddenly low-risk. I challenge you to refactor any of the node-solid-server files from callbacks to promises or async. Zero functionality change, but tell me again it's low risk when your're done :smile:

:+1:

@melvincarvalho looking at all the 18 commits you made to this repo it looks mostly like release commits, merging PRs and documentation plus few small feature / bug fix commits.

I think that if other contributors shift their focus to rewrite they will still review your PRs where you focus on refactoring existing code and adding new features. This way no one stops you from submitting new features and improving shape of existing codebase. I really think that in the end we can build some shared understanding by talking the talk but walking the walk results in running code and that in software project matters the most.

Myself I have no contributions to this codebase but I would like to follow the rewrite closely and in that process maybe start finding some low hanging fruits to start contributing little by little.

melvincarvalho commented 6 years ago

It's not self-evident.

To be slightly clearer. I'm using what I think is the standard understanding of refactor.

Code refactoring is the process of restructuring existing computer code—changing the factoring—without changing its external behavior.

Certainly, as I stated, some refactoring may be impractical or not cost effective. But if itself is a refactor, it will not be risky, just time consuming. Otherwise it's not a refactor.

Hope that clear that one up. I'm going to bow out of this thread now, until we get to 5.0.0, at which point things will be much clearer.

RubenVerborgh commented 6 years ago

Certainly, as I stated, some refactoring may be impractical or not cost effective.

Indeed, and an important part of the argument here is our evidence that refactoring of NSS seems to demand more work (for less result) than a rewrite.

But if itself is a refactor, it will not be risky, just time consuming. Otherwise it's not a refactor.

Nothing in the definition you posted precludes high risk.

Quite the contrary, your definition says "without changing its external behavior", but we currently have very limited ways of know whether that external behavior changes. It is very hard to make one modification in NSS for which you can guarantee external behavior does not change. As proven by the many regressions we have every time we fix something. Recall how 401 has been fixed more times than I can count already.

With a comprehensive test suite, we would at least know what we'd be breaking. But it wouldn't change the fact that we would.

Restructuring poses a high risk ("risk" as in failure to achieve the desired goals on time) if there is no good structure in place in the first time. Think of it as restructuring a house that was built without an architect: do you feel comfortable working on the second floor, not knowing what is below?

Based on all of the evidence posted in https://github.com/solid/node-solid-server/issues/788#issuecomment-423354608, we consider a refactor of NSS risky. If you change anything on the internal structure of NSS, you have no way of knowing how it affects the others (and external tests will not help you discover the cause, they only show breakage). Furthermore, a change in one component ripples through many other components. Fix one bug and create two more—and you'll only find out about them after you've deployed.

Such a code base is risky to refactor, because we don't know where to begin, and we don't know for every step we take how many other things will break.

I will accept any evidence that NSS would not be such a codebase as I'm describing above, but this thread so far only has evidence that NSS indeed is exactly that.

But if itself is a refactor, it will not be risky, just time consuming. Otherwise it's not a refactor.

Consequently, if we fail to identify a task on NSS that is not risky, a refactoring of NSS does not exist? :wink:

I'm going to bow out of this thread now

Okay. When you return, please bring evidence about NSS being risk-free to refactor, for instance, by showing us a concrete component where we can start. One concrete suggestion for one concrete improvement will be the first evidence for the possibility of a refactor—evidence that we are still lacking up to this point.

megoth commented 6 years ago

Maybe we distinguish between improving code (which includes refactoring, but also restructuring certain parts of the code) and rewriting, instead of (just) refactoring and rewriting?

I should note that I'm for rewriting, as I think refactoring the current code base is going to be extremely difficult (that is, I think we would have to go through many rounds, and quite certain that it would take more time than a rewrite in the long run). A lot of it would actually be (hopefully) improving the existing code, as some parts are very difficult to test right now, and in those cases we cannot be 100% sure that changes are merely refactoring.

justinwb commented 6 years ago

Just wanted to add this bit to the discussion in hopes that it will allay any concerns about diversion of attention from current server during rewrite or refactor efforts.

We are currently, and will continue to dedicate resources to sustain and stabilize the current server. It's what we have now, and will be until we have something better. And so in the dialogue here (which is all good and constructive) - keep in mind that no matter what path we choose, there's still going to be people dedicated in parallel to keeping the current codebase in good working shape so people can use it and get value from Solid.

kjetilk commented 5 years ago

With IPS up, we can close this issue :-)

nodeSolidServer / node-solid-server

Is refactoring or rewriting the most efficient and cost-effective way to realize a high-quality and robust codebase for NSS? #788