RFC: expanded remove - Githubissues

b5 commented 4 years ago

Two big questions came out of the push/pull RFC:

Dealing with list visibility
Undoing a push

The first question is handled by the access RFC.

This RFC takes on "unpushing". Examining the UX of removing data from Qri (both local and network), in a way that's compatible with new APIs.

b5 commented 4 years ago

How do you remove a dataset that you have cloned? Is that part of delete, drop, or storage?

Yet another example of the RFC process working. Totally forgot about this (ahem, VERY COMMON) need. I propose we fold this functionality into qri drop, and have drop be about dropping local datasets as well as datasets on remotes. I'll update the RFC to indicate you'd drop a local dataset you've cloned by not specifying a remote:

$ qri drop  ramfox/nyc_gubernatorial_race_2019

This is also a chance to unify with the --remote flag pattern that all other commands are using, so dropping from the registry would now be:

$ qri drop ramfox/nyc_gubernatorial_race_2019 --remote registry

Attempting to drop a dataset you own will error:

$ qri drop me/dataset
error: cannot drop a dataset you own

because you are the author of this dataset, dropping it would remove the 
authoritative history of this dataset. Instead of dropping, you can 
delete this dataset:

  $ qri delete --all me/dataset

warning: this cannot be undone.

I kinda like that local dropping takes the lead here. Considering we have more data consumers than data publishers, and publishers should be more comfortable with qri in general.

perhaps being prefaced by "storage" implies enough that one removes storage while the other removes... everything?

To me that's exactly right. The meaning of drop is the same in both contexts (removing from persistence, not history), but with storage, you're saying I only want to drop dataset storage, not the log (which can't be dropped unless you drop everything with qri drop).

ramfox commented 4 years ago

from @Mr0grog

I kind of worry that delete vs. drop are still confusingly close terms. What if, instead of delete, we had a name and family of commands focused around editing the history/log?

qri edit

qri revise

Something else?

That would also leave you with a nicer place to eventually put more complex tooling around things like squashing, rewriting older commit messages, etc.

Dunno if that’s hugely better (and it’d make the command slightly more complex, since deleting would probably be some kind of subcommand under those), but it feels at least slightly clearer and more differentiated to me.

Skimming #50, I’m also not really sure I’m clear on why there would be both qri drop and qri storage drop, since they seem like they are working in the same conceptual space, and it reads like these two commands are equivalent:
> qri drop me/my_dataset
# vs...
> qri storage drop me/my_dataset
Or is it just that the usage line in #50 is wrong, and it should be qri storage drop COMMIT (not qri storage drop DATASET)? Even then, it seems like these are in the same conceptual space. That qri drop is only for datasets as a whole and qri storage drop is only for parts of datasets, but never whole datasets, feels odd to me. Or I might just be misunderstanding here.

from @dustmop

For removing from a remote I would much rather see git remove me/oops_i_pushed_it --from-remote registry.

If the user doesn't use the --keep-storage flag, then we purge blocks from IPFS? This seem hard to do, and dangerous, and prevents any kind of "undo" functionality like what I mentioned above. It also makes us work very differently than git. I would rather that we rely on the storage command to run "gc" when the user wants it. Or at least, have the dangerous operation be opt-in, not opt-out, meaning a --purge flag instead.

Currently qri storage drop is supposed to only manipulate blocks & can effect local & remote machines. qri drop is supposed to manipulate blocks & remove logs & can only effect remote machines.

I agree that having so many different commands to describe similar things is confusing. Especially since drop in the two above contexts have very different implications: one means just remove blocks, the other means destroy logs and blocks.

Proposing we separate out concerns based on whether they imply altering a log or not, and utilizing our newish remote pattern

qri delete

Anything with the word delete implies unpinning the blocks AND alters the log of a dataset. (or this can be remove, tbh I don't have super strong feelings about this)

All the previously stated local delete behaviors stay the same, with a few exceptions listed below. But in addition to those behaviors, we have remote behaviors as well: qri delete me/dataset --remote registry Unpins blocks & removes the log of a remote, but does not (like in our other offline&network commands) do anything locally

qri delete me/dataset --remote registry --revisions 1 should error: you can't manipulate the logbook of a dataset on a remote

qri delete not_me/dataset unpins the blocks and then removes the log of a dataset that is not in your namespace. However, because we have to remove the log, and not alter it, we cannot restore a dataset that was removed but not in your namespace

qri delete not_me/dataset --revisions 1 should error: you can't manipulate the logbook of a dataset that is not yours

If you alter your logbook locally using qri delete, you can update the remote log by using qri push No --keep-storage flag, we only unpin locally Keep the --keep-files & --force flags You can use the below qri restore command to restore a dataset in your namespace that has been removed. If you want a foreign dataset back, you need to use qri pull

qri drop

Anything with the word drop implies unpinning blocks ONLY. To the user this means this is their way of cleaning up storage, not effecting the actual dataset history.

qri drop me/dataset@version Unpins the blocks locally at that version

qri drop me/dataset~2 Unpins the blocks locally of the two latest versions

qri drop me/dataset@version --remote registry Unpins the blocks at that version on that registry (again, using the remote flag implies ONLY a network action)

qri drop not_me/dataset@version Unpins the blocks locally of a dataset not in your namespace

To get versions back, either use the below qri restore command for a dataset in your namespace or qri pull to get the dataset from a remote source

or this can be nested into qri storage drop

qri storage

This bleeds with the storage rfc, but is directly effected by this rfc

qri storage stats me/dataset Gives you the stats defined in the storage rfc about a dataset and its storage, locally

qri storage stats me/dataset --remote registry Give you the stats of a dataset and its storage on the remote

qri storage gc me/dataset Removes the blocks from that dataset that have been unpinned

qri storage gc me/dataset@version Removes the blocks of that version if it's been unpinned?

qri storage gc --all Removes all unpinned blocks??

no qri storage drop command, that is encompassed by qri drop

qri restore

And now for the command I've thought the least about that would really be the work of a future rfc, but just mentioning it here because of @dustmop's comments

I think it would makes sense to repurpose qri restore from an fsi command to a delete/drop undo command. The previous use of qri restore should be contained in qri checkout. If the fsi link already exists.

qri restore me/dataset Restores a dataset based on the logbook, if those blocks still exist

qri restore me/dataset@version Restores a dataset version if those blocks still exist

Mr0grog commented 4 years ago

I agree that having so many different commands to describe similar things is confusing. Especially since drop in the two above contexts have very different implications: one means just remove blocks, the other means destroy logs and blocks.

Proposing we separate out concerns based on whether they imply altering a log or not

This is interesting, since I don’t see drop in these contexts being very different and I don’t think of either one as altering a log! 😜

That said, I think I was also trying to make this separation around altering the logbook. (Or the dataset’s history, which is definitely altering the log, but I’m not sure if is altering the logbook beyond adding new entries. I don’t know the logbook well enough.)

Anyway, I’m thinking of it as something like… a dataset can be in a few states on a Qri node:

The node doesn’t know anything about a dataset.
The node knows about a dataset (i.e. it has the logbook and can present the dataset name, owner, and history).
The node doesn’t have any of the the actual versions of the dataset (it just knows their names/IPFS addresses from the logbook). (i.e. has logbook but no blocks, I think?)
The node has some of the actual versions of the dataset, but not others (it could have any set of the versions; they don’t need to be continuous). (i.e. has logbook and all blocks of some versions?)
The node has all the actual versions of the dataset. (i.e. has logbook and all blocks?)

I’m thinking of drop as purely about editing these states (whether local or remote). drop doesn’t alter the logbook at all, though it could change whether I have the logbook.

On the other hand, delete (or edit or revise) would be primarily about altering what’s in the logbook, and might cause some blocks to get unpinned as a consequence of them no longer being referenced by the altered logbook.

That’s similar in a lot of ways to what you’re proposing, but:

qri delete me/dataset wouldn’t really make sense as a command (instead, you want qri drop me/dataset). That’s a little clearer with terminology like edit or revise instead of delete.
qri delete not_me/dataset wouldn’t even be allowed, because you can’t edit the logbook of a dataset that’s not yours/you don’t have rights to. (I hope some day Qri will support me being authorized to edit another person’s dataset, but that’s a while different story.)
Deep agree on editing a remote’s logbook being disallowed (so qri delete me/dataset --remote registry --revisions 1 fails; you should edit locally and push instead).
I think we’re basically in agreement on drop, except for the two items above, where I think I’m saying what you have as qri delete me_or_not/dataset should be qri drop me_or_not/dataset instead. (Again, I think changing delete to something like edit or revise that more clearly says “I’m editing the history of this dataset” helps clarify that getting rid of the whole dataset belongs with drop.)

restore seems cool! Wondering if recover would be clearer for what you’re proposing, though.

Likewise, qri storage gc me/dataset and qri storage gc me/dataset@version would be very nice if they’re not absurdly complex to implement (since I’m not sure IPFS has a way to GC just the DAG under a given block).

Mr0grog commented 4 years ago

Enh, got up this morning and not sure recover is really any better/clearer a name than restore. Too nuanced a difference, if any.

dustmop commented 4 years ago

I feel as is the logbook should purely be an implementation detail, and the user shouldn't have to know about it in order to understand the differences between commands. Details about logbook should never reach the UI layer.

Mr0grog commented 4 years ago

I agree! I only included the logbook in my explanation to connect it to implementation. But was trying to frame drop around the question “what data do I have?” And delete/edit/revise around “changing what the history of this dataset is.”

b5 commented 4 years ago

Thanks so much everyone for your feedback. Clearly lots of room for improvement here, and I'm delighted to see everyone dig in to work this one out.

To me it's clear the original sin of this RFC introducing two words for removing things. Without any additional info, it's way too hard to know these do different things:

$ qri delete user/dataset
$ qri drop user/dataset

We must have an easy, top-level command that takes stuff away, and success here is defined as the command for taking stuff working with as few caveats as possible. I'm inclined to back @ramfox's proposal almost wholesale, including moving drop into the storage command. That would leave us with:

$ qri delete user/dataset
$ qri storage drop user/dataset@/ipfs/QmFoo

I very much agree we (the designers of qri) should be cognizant of when a user is taking an action that will edit history. @ramfox's proposal addresses these concerns by ensuring the API guards against unintended consequences. Presenting clear errors will be a requirement. I'm a big fan of incorporating a --force flag to slow down users.

The upside of this approach is a user can only see one word for going backwards (delete), this also leaves more top-level space for the suite of commands @Mr0grog is proposing. I very much agree commands like revise are in our future, but we can't write meaningful RFCs against that world yet because we haven't picked a collaboration model.

Anyway, I’m thinking of it as something like… a dataset can be in a few states on a Qri node:

This is a great way of thinking about things @Mr0grog, I'd re-write your set of states like this:

The node doesn’t know anything about a dataset.
The node has a resolved reference to a dataset. It's established the username, datasetname, logbook identifier, and head reference from a trusted source, but that's it.
The node has the logbook.
The node has the logbook + a subset of versions
The node has the logbook + all versions listed in the logbook

I’m thinking of drop as purely about editing these states (whether local or remote). drop doesn’t alter the logbook at all, though it could change whether I have the logbook.

This is why I think we should move drop into storage. Ideally users will almost never need $qri storage drop, once we get some sort of $ qri storage gc command in place

Arqu commented 4 years ago

Oh my, lot's of stuff here. Going through it seems like a lot of concerns are already addressed.

My only remaining concern is one of the first brought up: drop and delete being too close in meaning while both being descrutcive, doing an accidental drop feels much worse.

delete being a commit level command I'd rather call it something like revert where we just unroll back in history. Or reset like git does it?

Also very much into the idea of --force if overriding instead of appending history on push.

b5 commented 4 years ago

OK. I've rewritten this entire RFC to incorporate as much feedback as possible. drop is dropped. What's left is an expansion and clarification of how remove should work to address common user concerns of deleting stuff.

I've kept the original remove term for a few reasons:

No one felt delete was an improvement
This RFC is no longer a breaking change
delete is already an alias for the remove command in qri v0.9.8

b5 commented 4 years ago

I'm concerned about review fatigue, so I've asked @dustmop to take a close look as a point-person for making sure this new RFC is headed in the right direction

b5 commented 4 years ago

Ok, this is ready for a final sign-off

b5 commented 4 years ago

Approved! It's in! Huge thanks to everyone for digging in on this one. Qri's going to be much better for it 😄

qri-io / rfcs

RFC: expanded remove #53

qri delete

qri drop

qri storage

qri restore