Closed b5 closed 4 years ago
How do you remove a dataset that you have cloned? Is that part of delete, drop, or storage?
Yet another example of the RFC process working. Totally forgot about this (ahem, VERY COMMON) need. I propose we fold this functionality into qri drop
, and have drop be about dropping local datasets as well as datasets on remotes. I'll update the RFC to indicate you'd drop a local dataset you've cloned by not specifying a remote:
$ qri drop ramfox/nyc_gubernatorial_race_2019
This is also a chance to unify with the --remote
flag pattern that all other commands are using, so dropping from the registry would now be:
$ qri drop ramfox/nyc_gubernatorial_race_2019 --remote registry
Attempting to drop a dataset you own will error:
$ qri drop me/dataset
error: cannot drop a dataset you own
because you are the author of this dataset, dropping it would remove the
authoritative history of this dataset. Instead of dropping, you can
delete this dataset:
$ qri delete --all me/dataset
warning: this cannot be undone.
I kinda like that local dropping takes the lead here. Considering we have more data consumers than data publishers, and publishers should be more comfortable with qri in general.
perhaps being prefaced by "storage" implies enough that one removes storage while the other removes... everything?
To me that's exactly right. The meaning of drop
is the same in both contexts (removing from persistence, not history), but with storage
, you're saying I only want to drop dataset storage, not the log
(which can't be dropped unless you drop everything with qri drop
).
from @Mr0grog
I kind of worry that
delete
vs.drop
are still confusingly close terms. What if, instead ofdelete
, we had a name and family of commands focused around editing the history/log?
qri edit
qri revise
- Something else?
That would also leave you with a nicer place to eventually put more complex tooling around things like squashing, rewriting older commit messages, etc.
Dunno if that’s hugely better (and it’d make the command slightly more complex, since deleting would probably be some kind of subcommand under those), but it feels at least slightly clearer and more differentiated to me.
Skimming #50, I’m also not really sure I’m clear on why there would be both
qri drop
andqri storage drop
, since they seem like they are working in the same conceptual space, and it reads like these two commands are equivalent:> qri drop me/my_dataset # vs... > qri storage drop me/my_dataset
Or is it just that the usage line in #50 is wrong, and it should be
qri storage drop COMMIT
(notqri storage drop DATASET
)? Even then, it seems like these are in the same conceptual space. Thatqri drop
is only for datasets as a whole andqri storage drop
is only for parts of datasets, but never whole datasets, feels odd to me. Or I might just be misunderstanding here.
from @dustmop
For removing from a remote I would much rather see git remove me/oops_i_pushed_it --from-remote registry.
If the user doesn't use the --keep-storage flag, then we purge blocks from IPFS? This seem hard to do, and dangerous, and prevents any kind of "undo" functionality like what I mentioned above. It also makes us work very differently than git. I would rather that we rely on the storage command to run "gc" when the user wants it. Or at least, have the dangerous operation be opt-in, not opt-out, meaning a --purge flag instead.
Currently qri storage drop
is supposed to only manipulate blocks & can effect local & remote machines. qri drop
is supposed to manipulate blocks & remove logs & can only effect remote machines.
I agree that having so many different commands to describe similar things is confusing. Especially since drop
in the two above contexts have very different implications: one means just remove blocks, the other means destroy logs and blocks.
Proposing we separate out concerns based on whether they imply altering a log or not, and utilizing our newish remote
pattern
Anything with the word delete
implies unpinning the blocks AND alters the log of a dataset. (or this can be remove
, tbh I don't have super strong feelings about this)
All the previously stated local delete behaviors stay the same, with a few exceptions listed below.
But in addition to those behaviors, we have remote
behaviors as well:
qri delete me/dataset --remote registry
Unpins blocks & removes the log of a remote, but does not (like in our other offline&network commands) do anything locally
qri delete me/dataset --remote registry --revisions 1
should error: you can't manipulate the logbook of a dataset on a remote
qri delete not_me/dataset
unpins the blocks and then removes the log of a dataset that is not in your namespace. However, because we have to remove the log, and not alter it, we cannot restore a dataset that was removed but not in your namespace
qri delete not_me/dataset --revisions 1
should error: you can't manipulate the logbook of a dataset that is not yours
If you alter your logbook locally using qri delete
, you can update the remote log by using qri push
No --keep-storage
flag, we only unpin locally
Keep the --keep-files
& --force
flags
You can use the below qri restore
command to restore a dataset in your namespace that has been removed. If you want a foreign dataset back, you need to use qri pull
Anything with the word drop
implies unpinning blocks ONLY. To the user this means this is their way of cleaning up storage, not effecting the actual dataset history.
qri drop me/dataset@version
Unpins the blocks locally at that version
qri drop me/dataset~2
Unpins the blocks locally of the two latest versions
qri drop me/dataset@version --remote registry
Unpins the blocks at that version on that registry (again, using the remote flag implies ONLY a network action)
qri drop not_me/dataset@version
Unpins the blocks locally of a dataset not in your namespace
To get versions back, either use the below qri restore
command for a dataset in your namespace or qri pull
to get the dataset from a remote source
or this can be nested into qri storage drop
This bleeds with the storage rfc, but is directly effected by this rfc
qri storage stats me/dataset
Gives you the stats defined in the storage rfc about a dataset and its storage, locally
qri storage stats me/dataset --remote registry
Give you the stats of a dataset and its storage on the remote
qri storage gc me/dataset
Removes the blocks from that dataset that have been unpinned
qri storage gc me/dataset@version
Removes the blocks of that version if it's been unpinned?
qri storage gc --all
Removes all unpinned blocks??
no qri storage drop
command, that is encompassed by qri drop
And now for the command I've thought the least about that would really be the work of a future rfc, but just mentioning it here because of @dustmop's comments
I think it would makes sense to repurpose qri restore
from an fsi command to a delete/drop undo command. The previous use of qri restore
should be contained in qri checkout
. If the fsi link already exists.
qri restore me/dataset
Restores a dataset based on the logbook, if those blocks still exist
qri restore me/dataset@version
Restores a dataset version if those blocks still exist
I agree that having so many different commands to describe similar things is confusing. Especially since
drop
in the two above contexts have very different implications: one means just remove blocks, the other means destroy logs and blocks.Proposing we separate out concerns based on whether they imply altering a log or not
This is interesting, since I don’t see drop in these contexts being very different and I don’t think of either one as altering a log! 😜
That said, I think I was also trying to make this separation around altering the logbook. (Or the dataset’s history, which is definitely altering the log, but I’m not sure if is altering the logbook beyond adding new entries. I don’t know the logbook well enough.)
Anyway, I’m thinking of it as something like… a dataset can be in a few states on a Qri node:
I’m thinking of drop
as purely about editing these states (whether local or remote). drop
doesn’t alter the logbook at all, though it could change whether I have the logbook.
On the other hand, delete
(or edit
or revise
) would be primarily about altering what’s in the logbook, and might cause some blocks to get unpinned as a consequence of them no longer being referenced by the altered logbook.
That’s similar in a lot of ways to what you’re proposing, but:
qri delete me/dataset
wouldn’t really make sense as a command (instead, you want qri drop me/dataset
). That’s a little clearer with terminology like edit
or revise
instead of delete
.
qri delete not_me/dataset
wouldn’t even be allowed, because you can’t edit the logbook of a dataset that’s not yours/you don’t have rights to. (I hope some day Qri will support me being authorized to edit another person’s dataset, but that’s a while different story.)
Deep agree on editing a remote’s logbook being disallowed (so qri delete me/dataset --remote registry --revisions 1
fails; you should edit locally and push
instead).
I think we’re basically in agreement on drop
, except for the two items above, where I think I’m saying what you have as qri delete me_or_not/dataset
should be qri drop me_or_not/dataset
instead. (Again, I think changing delete
to something like edit
or revise
that more clearly says “I’m editing the history of this dataset” helps clarify that getting rid of the whole dataset belongs with drop
.)
restore
seems cool! Wondering if recover
would be clearer for what you’re proposing, though.
Likewise, qri storage gc me/dataset
and qri storage gc me/dataset@version
would be very nice if they’re not absurdly complex to implement (since I’m not sure IPFS has a way to GC just the DAG under a given block).
Enh, got up this morning and not sure recover
is really any better/clearer a name than restore
. Too nuanced a difference, if any.
I feel as is the logbook should purely be an implementation detail, and the user shouldn't have to know about it in order to understand the differences between commands. Details about logbook should never reach the UI layer.
I agree! I only included the logbook in my explanation to connect it to implementation. But was trying to frame drop
around the question “what data do I have?” And delete
/edit
/revise
around “changing what the history of this dataset is.”
Thanks so much everyone for your feedback. Clearly lots of room for improvement here, and I'm delighted to see everyone dig in to work this one out.
To me it's clear the original sin of this RFC introducing two words for removing things. Without any additional info, it's way too hard to know these do different things:
$ qri delete user/dataset
$ qri drop user/dataset
We must have an easy, top-level command that takes stuff away, and success here is defined as the command for taking stuff working with as few caveats as possible. I'm inclined to back @ramfox's proposal almost wholesale, including moving drop
into the storage command. That would leave us with:
$ qri delete user/dataset
$ qri storage drop user/dataset@/ipfs/QmFoo
I very much agree we (the designers of qri) should be cognizant of when a user is taking an action that will edit history. @ramfox's proposal addresses these concerns by ensuring the API guards against unintended consequences. Presenting clear errors will be a requirement. I'm a big fan of incorporating a --force
flag to slow down users.
The upside of this approach is a user can only see one word for going backwards (delete
), this also leaves more top-level space for the suite of commands @Mr0grog is proposing. I very much agree commands like revise
are in our future, but we can't write meaningful RFCs against that world yet because we haven't picked a collaboration model.
Anyway, I’m thinking of it as something like… a dataset can be in a few states on a Qri node:
This is a great way of thinking about things @Mr0grog, I'd re-write your set of states like this:
username, datasetname, logbook identifier, and head reference
from a trusted source, but that's it.I’m thinking of
drop
as purely about editing these states (whether local or remote). drop doesn’t alter the logbook at all, though it could change whether I have the logbook.
This is why I think we should move drop
into storage. Ideally users will almost never need $qri storage drop
, once we get some sort of $ qri storage gc
command in place
Oh my, lot's of stuff here. Going through it seems like a lot of concerns are already addressed.
My only remaining concern is one of the first brought up: drop
and delete
being too close in meaning while both being descrutcive, doing an accidental drop
feels much worse.
delete
being a commit
level command I'd rather call it something like revert
where we just unroll back in history. Or reset
like git
does it?
Also very much into the idea of --force
if overriding instead of appending history on push.
OK. I've rewritten this entire RFC to incorporate as much feedback as possible. drop
is dropped. What's left is an expansion and clarification of how remove should work to address common user concerns of deleting stuff.
I've kept the original remove
term for a few reasons:
delete
was an improvementI'm concerned about review fatigue, so I've asked @dustmop to take a close look as a point-person for making sure this new RFC is headed in the right direction
Ok, this is ready for a final sign-off
Approved! It's in! Huge thanks to everyone for digging in on this one. Qri's going to be much better for it 😄
Two big questions came out of the
push/pull
RFC:list
visibilityThe first question is handled by the
access
RFC.This RFC takes on "unpushing". Examining the UX of removing data from Qri (both local and network), in a way that's compatible with new APIs.