thaljef / Pinto

Curate your own repository of Perl modules
https://metacpan.org/module/Pinto::Manual
66 stars 49 forks source link

Behavior of `pinto delete` #54

Open schwern opened 11 years ago

schwern commented 11 years ago

I just used pinto delete for the first time and have two problems with it.

I couldn't verify what it was going to do to the index, nor log why I'm making the change, nor roll it back. This is the most critical problem.

In this case I have ExtUtils-MakeMaker-6.62 and ExtUtils-MakeMaker-6.30. 6.62 was pulled first with 6.30 pulled on top of it. 6.62 remains in the index and sometimes winds up getting installed (a result of the problem in #48). So I decided to delete it. This leaves ExtUtils::MakeMaker unindexed, pinto did not put ExtUtils-MakeMaker-6.30 in its place. I had to manually reindex (ie. pull, I still think there should be an explicit reindex command) 6.30.

Automatic reindexing would require Pinto to track if there are earlier revisions of a distribution in the repository, which would be a very useful feature for other reasons (such as #48). It would also have to track the order in which releases were installed, as the proper behavior is not to go to the previous version in version order, but to go to the previously pulled version.

thaljef commented 11 years ago

delete is dangerous. It is your time machine to remove all references to an archive, as if it had never been put there in the first place. It affects all stacks, and there is no history of the event.

Most of the time, what you want is the unregister command. That simply removes packages from the stack. That creates a new revision in the history, just like other commands.

I did recently think of a way to make delete a bit more friendly by removing the archive but simply marking the distribution (and its packages) as deleted in the database, rather than deleting the records outright. But that is non-trivial and not my highest priority at the moment.

In the meantime, there are a couple stop-gaps we could try:

What do you think of either of those? Also, I'd happily accept a documentation patch that warns about the pitfalls of delete more loudly.

thaljef commented 11 years ago

In this case I have ExtUtils-MakeMaker-6.62 and ExtUtils-MakeMaker-6.30. 6.62 was pulled first with 6.30 pulled on top of it. 6.62 remains in the index and sometimes winds up getting installed (a result of the problem in #48). So I decided to delete it. This leaves ExtUtils::MakeMaker unindexed, pinto did not put ExtUtils-MakeMaker-6.30 in its place.

I'm not sure why ExtUtils::MakeMaker got removed from the stack. Only the lingering packages from 6.62 should have been removed, leaving only 6.30 (which should have included ExtUtils::MakeMaker). That could be a bug. If you can, please send me a minimal series of pinto commands to reproduce the bug.

I had to manually reindex (ie. pull, I still think there should be an explicit reindex command) 6.30.

There is. It is called register. I wanted to avoid calling it "reindex" because I felt that word has a specific meaning to PAUSE, but it might not have the same meaning to Pinto.

Automatic reindexing would require Pinto to track if there are earlier revisions of a distribution in the repository, which would be a very useful feature for other reasons (such as #48). It would also have to track the order in which releases were installed, as the proper behavior is not to go to the previous version in version order, but to go to the previously pulled version.

I'm not sure what "automatic reindexing" means. Just because you remove something from the stack doesn't mean you want to go back to the version you had before (or any version at all). I think what you're asking for is a way to recursively downgrade.

48 is fixed in b86dd39. Please give that a try. It doesn't fix this particular scenario, but it should help avoid it.

thaljef commented 11 years ago

I tried to reproduce your problem with EU::MM as described...

pinto -r TEST init
pinto -r TEST pull MSCHWERN/ExtUtils-MakeMaker-6.62.tar.gz -M
pinto -r TEST pull MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz -M
pinto -r TEST ls | wc -l;        # 35 packages
pinto -r TEST delete MSCHWERN/ExtUtils-MakeMaker-6.62.tar.gz
pinto -r TEST ls | wc -l;        # 34 packages

After deleting 6.62, the only difference is that ExtUtils::MM_Darwin was removed. Which is exactly what I would expect, since it didn't exist in 6.30. Unless there is more to the story, it seems to work fine. You can actually see the difference quite clearly if you just use the unregister command instead of delete in the above steps (which is the right command to use anyway).

schwern commented 11 years ago

I think there's a few changes to how delete is handled which would fix a lot.

As you point out, delete is irreversible and dangerous. But when I think "I want to remove something from a repository" I think "delete" or "rm". I don't think "unregister". On the principle of making the safest route the most obvious one, change unregister to delete. What is now delete becomes an optional flag for "really blow this out of history".

Like how git automatically cleans up unreferenced nodes, clean up unreferenced archives automatically. Don't make the user manage this part. Sweep them up after a few weeks. Hang into archives referred to by the last few commits. If they REALLY want to get rid of them, provide an option to clean which controls how old an unreferenced archive has to be. But disk is cheap.

However, Pinto should clean itself up. As a cautionary tale I ran the the disk out on a VM by performing 1500 module installations with cpanm in a few days as part of building Perl rpms. 2 gigs of cpanm build directories on an 8 gig partition. That was a little ridiculous, but it does show that it's pretty easy to generate a lot of churn and not everybody has a terabyte of disk. Pinto has the advantage of not storing duplicates or unpacked archives.

Then the user can reliably restore the archive at a later date and Pinto can verify it. Basically hang onto all the meta data about an archive including its prerequisites. Then Pinto can still perform most of its functions on the history without needing the actual archive.

Then the user can know what needs to be replaced.

For pulled archives, Pinto can try to redownload it from its sources. If it fails, or if the archive was added, it can prompt the user with the dist ID and checksum and they can provide the archive.

schwern commented 11 years ago

My original problem can be reproduced like so...

pinto -r TEST init
pinto -r TEST pull MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz -M
.
pinto -r TEST pull MSCHWERN/ExtUtils-MakeMaker-6.62.tar.gz -M
.
pinto -r TEST delete MSCHWERN/ExtUtils-MakeMaker-6.62.tar.gz
pinto -r TEST ls
[rf-] ExtUtils::Command                                1.09 MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz
[rf-] ExtUtils::Install                                1.33 MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz
[rf-] ExtUtils::Installed                              0.08 MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz
[rf-] ExtUtils::MakeMaker::bytes                       0.01 MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz
[rf-] ExtUtils::MakeMaker::vmsish                      0.01 MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz
[rf-] ExtUtils::Manifest                               1.46 MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz
[rf-] ExtUtils::Packlist                               0.04 MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz

pinto -r TEST register MSCHWERN/ExtUtils-MakeMaker-6.30.tar.gz is a work around, but delete should DWIM. Thanks for register!

thaljef commented 11 years ago

My original problem can be reproduced like so...

That's not what you described. You said "In this case I have ExtUtils-MakeMaker-6.62 and ExtUtils-MakeMaker-6.30. 6.62 was pulled first with 6.30 pulled on top of it."

If you did it the other way around, then you still got the result I would expect. No, I don't agree that delete should just take you back to an earlier version of a distribution. You might delete or unregister something because you just don't want it there any more. Git doesn't just magically restore a file to the previous revision when you delete it.

I think what you really want is a way to "revert" a commit. So if you have 6.62 and want to go back to 6.30, you basically reverse-patch the commit where you added 6.62 (assuming you didn't do anything else in that commit). That is definitely doable.

thaljef commented 11 years ago

Change unregister to delete.

Ok, I can go along with that. But "delete" cannot become an extra option to the unregister command. It just doesn't fit the architecture. It will have to be a separate command. Maybe destroy ? The kill command is already used to delete stacks.

Clean up unreferenced archives automatically.

It does. Deleting an archive immediately removes it from the database and the filesystem. Did you see something different?

Store the checksum and distribution id of the deleted archive. Provide a command to report missing archives. Provide a command to replace missing archives.

This is similar to what I mentioned: preserving some of the data, but just marking the distribution as deleted in the database. The yes, pinto could try to revive them if it can (can never revive a locally added archive though). All good ideas, but on the back burner for now.

schwern commented 11 years ago

That's cool if delete can't be rolled into unregister. The problem with destroy and kill is they don't say what they're destroying or killing and there's nothing about destroying or killing which says if it's acting on a stack or a dist. However, they both permanently delete things and should express that commonality. Since they are both infrequently used commands, there's no need to make them short. kill_stack and kill_archive?

Deleting an archive immediately removes it from the database and the filesystem. Did you see something different?

Yes. Rather than the user manually deleting archives with an explicit command, Pinto could periodically scan for unreferenced archives and delete them from the file system. This is how git does it, unreferenced nodes are automatically deleted after a few weeks of sitting on disk, just in case you want to recover them. The difference with Pinto is it would only care about archives referenced in the head of stacks (and maybe a few commits back). This eliminates most of the need for a user to ever explicitly delete an archive.

For example, let's say I do this...

pinto init
pinto pull USER/Foo-1.23.tar.gz
pinto pull USER/Foo-1.24.tar.gz

Now USER/Foo-1.24.tar.gz is referenced in the master stack index, but USER/Foo-1.23.tar.gz is not referenced in any stack index. A few weeks and a bunch more work on the master stack goes by, but nothing touching Foo. pinto clean is run (either by the user or automatically by pinto) and notices that USER/Foo-1.23.tar.gz is not referenced in any stack index or recent history and hasn't been referenced for a few weeks. So it deletes the archive, retaining the meta data in the database.

thaljef commented 11 years ago

Rather than the user manually deleting archives with an explicit command, Pinto could periodically scan for unreferenced archives and delete them from the file system

There's no such thing as an unreferenced archive. Unregistering an archive only removes it from the stack (i.e. takes it out of 02packages).

The archive will always be there in case you want it back (possibly using the imagined "revert" command). Unless, you delete it. But delete is not a cleanup utility. It is to erase sensitive mistakes.

There is a "clean" command, which does remove archives that get left around because of aborted or failed commands. And yes, pinto could do that automatically. But that is something different from what you're talking about.