thaljef / Pinto

Curate your own repository of Perl modules

https://metacpan.org/module/Pinto::Manual

66 stars 49 forks source link

pinto add/replace needs a force #26

Open schwern opened 11 years ago

schwern commented 11 years ago

Both replace and add will refuse to overwrite an identical tarball, and there doesn't appear to be a way to force it.

$ pinto replace Net-LibIDN-0.12.tgz schwern/Net-LibIDN-0.12.tgz
Archive Net-LibIDN-0.12.tgz is identical to SCHWERN/Net-LibIDN-0.12.tgz
Archive Net-LibIDN-0.12.tgz is identical to SCHWERN/Net-LibIDN-0.12.tgz at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Pinto.pm line 52.

In this case, I added a tarball without having a ~/.pause and wound up with the wrong PAUSEID. Now I'd like to replace it with the right PAUSEID, but Pinto won't let me. And there's no way to remove the module short of reverting.

A force would be helpful. Also perhaps the identity check should take into account the pauseid? Downgrade it to a warning?

schwern commented 11 years ago

Even after reverting, add & replace will still complain that they have an identical archive and not do the add. Even if I delete the file from authors it still complains.

thaljef commented 11 years ago

On Nov 7, 2012, at 11:03 AM, Michael G. Schwern wrote:

Both replace and add will refuse to overwrite an identical tarball, and there doesn't appear to be a way to force it.

I had a long discussion with @renormalist about this (https://github.com/thaljef/Pinto/pull/16).

My view is that Pinto is like a VCS -- you should always be able to get back to a prior state. That means it can never loose a distribution.

But Pinto isn't like a traditional VCS where you switch between branches that have different versions of the same file. Instead Pinto has all "versions" of the distribution at the same time (when you revert, all you're doing is changing the state of the index, not the distribution files). Therefore, every distribution in the repository must be unique so they can all be on the filesystem together.

When you ship a distribution to Pinto, you've effectively made a release. People using the repository may have stacks that use the distribution, so changing it out from under them may cause their app to break. It's the same reason that PAUSE doesn't let you release the same distribution twice.

But mistakes do happen, and sometimes want a convenient way to ship a new distribution and quickly put it on all the stacks. That's what the "replace" command is intended for. But the old distribution still has to be in the repository so people can revert their stack (in case your replacement is incompatible).

Does that make sense?

schwern commented 11 years ago

My view is that Pinto is like a VCS -- you should always be able to get back to a prior state. That means it can never loose a distribution. ... Therefore, every distribution in the repository must be unique so they can all be on the filesystem together.

I think this is a case of the implementation not being up to the philosophy. An internal issue leaking into user space. If Pinto wants to store all historical state, but it can't because it's using a filesystem to do the storage, then ideally Pinto improves its implementation. If it can't improve the implementation to handle the philosophy you can either back off on the philosophy or compromise usability.

For this particular case, if the author ID is different there's no filesystem conflict. In the general case it could archive the older file elsewhere, in the database or in an archive directory, and swap it back into place if needed. As an aside, I was surprised to see that the stacks weren't using symlinks for identical files.

Though if the files are exactly the same, what information would Pinto lose?

thaljef commented 11 years ago

On Nov 7, 2012, at 12:38 PM, Michael G. Schwern wrote:

I think this is a case of the implementation not being up to the philosophy. An internal issue leaking into user space.

Yes, that's a good way to describe it. I really, really wanted to just use Git or Svn for the VCS. But I just didn't see a way to make it work with Pinto (can't merge a database file).

If Pinto wants to store all historical state, but it can't because it's using a filesystem to do the storage, then ideally Pinto improves its implementation. If it can't improve the implementation to handle the philosophy you can either back off on the philosophy or compromise usability.

As long as the repository is addressable through file:// URLs I think it has to be this way. But if we give that up, more interesting stuff is possible. For now, I think it is a reasonable thing to ask users to live with. It can always be changed down the road when we are smarter. I don't think there's anything in there that would prohibit it in the future.

For this particular case, if the author ID is different there's no filesystem conflict. In the general case it could archive the older file elsewhere, in the database or in an archive directory, and swap it back into place if needed.

That's an interesting idea. Let's wait and see how things play out before going down the road. I still don't have a lot of feedback on how people actually use Pinto in the wild. Most of the use cases are just imagined, based on my own experience (with a bit of input from RJBS).

As an aside, I was surprised to see that the stacks weren't using symlinks for identical files.

Unless you've got an old Pinto (like more than 2 months old) then all stacks are just symlinks to the same pile of distributions.

-Jeff

renormalist commented 11 years ago

Sorrry for my long absence, I was short on time, might improve the next weeks. However, quick thoughts here:

Re on the "this is how PAUSE does it": I personally want to use Pinto to build my own repositories. For that I explicitely do not want any self-restrictions. I could always circumvent them anyway (eg. by starting from scratch with an empty repo, just because it very is not PAUSE but it's me who makes the rules). This in turn means it's just a matter of how hard to make it. This in turn leads me to: make it as easy as possible.

Re on VCS: I'm not sure whether that use-case is too complicated. IMHO the stacks are such a very strong and simple concept that they are good enough as is. It's hard to follow code in a timeline but at least there I can see a few lines that make a commit, and understand what it means to revert them, etc. For complete CPAN distributions this picture in mind does not work. It's way too abstract in contrast to the "diff" metaphor that VCSs use.

What I liked on my "add --force" proposal is

its transparency, which an explicit "replace" does not provide - there I need additional complexity to decide whether to use 'add' or 'replace'.
Imagine a repeated "dzil build ; pinto add" with same version everytime for a staging area: With replace vs. add you need to know whether you run it for the first time or not; while add --force always works.
it even contained a "delete" which it just did not expose as explicit command, yet.
it is philosophically simple: delete and add again as if it had never existed. Risk is on the user, of course, which is ok for a parameter named --force.

thaljef commented 11 years ago

On Nov 7, 2012, at 2:59 PM, Steffen Schwigon wrote:

Sorrry for my long absence, I was short on time, might improve the next weeks.

Good to hear your voice. Re on the "this is how PAUSE does it": I personally want to use Pinto to build my own repositories. For that I explicitely do not want any self-restrictions. I could always circumvent them anyway (eg. by starting from scratch with an empty repo, just because it very is not PAUSE but it's me who makes the rules). This in turn means it's just a matter of how hard to make it. This in turn leads me to: make it as easy as possible.

I can appreciate that. But I also want Pinto to support teams, so it needs to provide a reasonable degree of safety for them. Re on VCS: I'm not sure whether that use-case is too complicated.

I actually think it will work out ok. My implementation might suck, but I believe the use case is valid.

What I liked on my "add --force" proposal is its transparency, which an explicit "replace" does not provide - there I need additional complexity to decide whether to use 'add' or 'replace'. You still have to decide whether to just 'add' or 'add --force' The latter probably just seems more intuitive because that's how you first imagined the command. Imagine a repeated "dzil build ; pinto add" with same version everytime for a staging area: With replace vs. add you need to know whether you run it for the first time or not; while add --force always works. Personally, I think that's the wrong way to go anyway. Different builds should have different names -- period. D::Z has the infrastructure to handle that for you automatically. it even contained a "delete" which it just did not expose as explicit command, yet. Yes, that was helpful -- thank you. I'm not sure if I can use it, given the VCS-like behavior. But I'll definitely keep it handy. it is philosophically simple: delete and add again as if it had never existed. Risk is on the user, of course, which is ok for a parameter named --force. I think you should ship your own command as a plugin (maybe call it "clobber" or "swap"). I do believe that tools should try to fit your workflow (not the other way around). I'm just not convinced this particular tool should be part of the core suite.

renormalist commented 11 years ago

Jeffrey Ryan Thalhammer notifications@github.com writes:

Personally, I think that's the wrong way to go anyway. Different builds should have different names -- period. D::Z has the infrastructure to handle that for you automatically.

Almost true, but not for my use-case "staging area".

Complex example: our test infrastructure "Tapper" contains tweaks to work with hundreds of combinations of host+guest OS distributions on different machine architectures over a timespan of about 5 years. I can only find out whether a change works everywhere by running it through the actual system (a staging area), then continuously tweak it again and again. There you would easily create version jumps of 100+ untile a single line is the correct one.

Only once it works I release it to CPAN - with a version increment by 1.

Did you never do "dzil install" locally on your machine several times without counting the version up, just to see how it works?

Kind regards, Steffen

renormalist commented 11 years ago

Forgot a point:

Jeffrey Ryan Thalhammer notifications@github.com writes:

On Nov 7, 2012, at 2:59 PM, Steffen Schwigon wrote:

What I liked on my "add --force" proposal is its transparency, which an explicit "replace" does not provide - there I need additional complexity to decide whether to use 'add' or 'replace'. You still have to decide whether to just 'add' or 'add --force'

Can I use 'pinto replace' when the distribution does not yet exist?

thaljef commented 11 years ago

On Nov 8, 2012, at 1:03 AM, Steffen Schwigon wrote:

Can I use 'pinto replace' when the distribution does not yet exist?

I think I see where you're going here. You have (or want) an automated process that churns out lots of development releases. When a new development release comes out, you generally don't care about the previous development releases any more. You don't want the repository to accumulate 1000's of releases when only a handful of them really matter. And you want to pull each of those development releases to several stacks with minimum fuss.

I don't always practice what I preach, but I'm a huge advocate for continuous integration & deployment. So I'm glad you're doing it right (can I come work for you? :-). I'm starting to see the light here. Give me some time though. Most of my attention is on http://stratopan.com right now. Have you checked that out yet?

-Jeff

thaljef commented 11 years ago

For reference, this issue is similar to #16.

schwern commented 11 years ago

Here's another use case where the lack of a force (or the prohibition on adding the same tarball twice) has gotten in the way.

I had Net::Gen 1.0 from SPIDB/Net-ext-1.0.tar.gz in the repository. I discovered I needed an upgrade so...

$ pinto pull SPIDB/Net-ext-1.011.tar.gz
Invalid header block at offset unknown at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Dist/Metadata/Tar.pm line 45.
Invalid header block at offset unknown at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Dist/Metadata/Tar.pm line 45.
Invalid header block at offset unknown at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Dist/Metadata/Tar.pm line 45.
Invalid header block at offset unknown at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Dist/Metadata/Tar.pm line 45.
Invalid header block at offset unknown at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Dist/Metadata/Tar.pm line 45.
Invalid header block at offset unknown at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Dist/Metadata/Tar.pm line 45.
Downgrading package ANDYA/Test-Harness-3.22/Test::Harness~3.22 to SPIDB/Net-ext-1.011/Test::Harness~1.1602 in stack init

Whoops! Not sure what went wrong there, but something did. It replaced Test::Harness instead. That's wrong, so revert that...

$ pinto revert

And use replace to make it very clear that SPIDB/Net-ext-1.011.tar.gz replaces SPIDB/Net-ext-1.0.tar.gz

$ pinto replace Net-ext-1.011.tar.gz SPIDB/Net-ext-1.0.tar.gz
Archive Net-ext-1.011.tar.gz is identical to SPIDB/Net-ext-1.011.tar.gz
Archive Net-ext-1.011.tar.gz is identical to SPIDB/Net-ext-1.011.tar.gz at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Pinto.pm line 52.

Argh.

Sure, it's identical, but replace, add and pull do more than just copy a tarball. They update the index, too. That's what I'm after, updating the index. But there's no command to just update the index. There's no command to delete a release. And there's no command to force Pinto to go ahead and ignore that there's already an identical tarball. I'm stuck.

thaljef commented 11 years ago

On Nov 13, 2012, at 3:55 PM, Michael G. Schwern wrote:

I had Net::Gen 1.0 from SPIDB/Net-ext-1.0.tar.gz in the repository. I discovered I needed an upgrade so...

Downgrading package ANDYA/Test-Harness-3.22/Test::Harness~3.22 to SPIDB/Net-ext-1.011/Test::Harness~1.1602 in stack init Whoops! Not sure what went wrong there, but something did. It replaced Test::Harness instead.

It thinks that it contains Test::Harness. It might be an inc-luded package or a mock or mix-in. Or it might be what PAUSE would call "unauthorized". I've seen this elsewhere too. Definitely needs to be fixed.

That's wrong, so revert that...

$ pinto revert And use replace to make it very clear that SPIDB/Net-ext-1.011.tar.gz replaces SPIDB/Net-ext-1.0.tar.gz

$ pinto replace Net-ext-1.011.tar.gz SPIDB/Net-ext-1.0.tar.gz Archive Net-ext-1.011.tar.gz is identical to SPIDB/Net-ext-1.011.tar.gz Archive Net-ext-1.011.tar.gz is identical to SPIDB/Net-ext-1.011.tar.gz at /home/schwern/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Pinto.pm line 52. Argh.

Sure, it's identical, but replace, add and pull do more than just copy a tarball. They update the index, too. That's what I'm after, updating the index.

pinto pull SPIDB/Net-ext-1.011.tar.gz

If the target is already in the repository, then it just pulls it to the stack (i.e. updates its index -- no tarball is copied). Beware this doesn't necessarily eliminate all references to the old distribution. Just like with the CPAN index, if the outgoing distribution contains packages that aren't in the incoming distribution, those packages will still be in the index. Make sense?

I've thought about making it so that when any package is updated in a stack, all of its siblings automatically get removed so the stack will contain no references to the old distribution. But I'm not sure if this is the "correct" behavior.

thaljef commented 11 years ago

On Nov 13, 2012, at 3:55 PM, Michael G. Schwern wrote:

$ pinto pull SPIDB/Net-ext-1.011.tar.gz Downgrading package ANDYA/Test-Harness-3.22/Test::Harness~3.22 to SPIDB/Net-ext-1.011/Test::Harness~1.1602 in stack init Whoops! Not sure what went wrong there, but something did. It replaced Test::Harness instead. That's wrong, so revert that...

Net-ext-1.011.tar.gz does indeed include a version of Test::Harness in an xlib/ directory. Judging from the Makefile.PL, it wants to use this version of Test::Harness if the perl version is less than 5.004_05. Perhaps Test::Harness was broken in that release. Maybe it wasn't dual-life yet. I dunno.

I'm not sure what the right solution is here. We could make Pinto (i.e. Module::Metadata) smarter. But that is only worthwhile if it solves a fairly common problem. Try to compensate for a broken dist here and a broken dist there seems feels like a waste of time. In most cases (like this one) it may be easier to just fix the dist.

schwern commented 11 years ago

Maybe pinto should ignore xlib? I guess it's like xt?

pinto pull SPIDB/Net-ext-1.011.tar.gz is not a solution, that's what got me into trouble in the first place. Pinto will index it incorrectly. I figured replace might be smarter. It's also inconsistent that I can pull the same tarball but I can't add or replace the same tarball. For this particular issue it seems I have to rebuild the tarball anyway to remove the xlib directory, so that's my work around for now.

Part of the problem is that pinto revert is not a full revert. It only rolls back the index changes, but the tarball remains leaving me unable to reindex it. See #35.

In #35 I proposed a reindex command to get around this problem, and/or as a --reindex option to add and replace. pinto reindex PERSON/Some-Modulel-1.23.tar.gz would tell Pinto to delete the entries for that tarball in the repo and recreate them fresh. pinto add --reindex Some-Module-1.23.tar.gz would be just like pinto add Some-Module-1.23.tar.gz but if it detects a duplicate it acts like a reindex. This would be convenient when adding heaps of modules to an existing repository.

--reindex is the vital "JUST DO IT YOU STUPID MACHINE" for a specific refusal to proceed rather than a catch all like --force. I still think pinto needs a --force as well.

thaljef commented 11 years ago

On Nov 14, 2012, at 2:27 PM, Michael G. Schwern wrote:

Maybe pinto should ignore xlib? I guess it's like xt?

pinto pull SPIDB/Net-ext-1.011.tar.gz is not a solution, that's what got me into trouble in the first place. It's also inconsistent that I can pull the same tarball but I can't add or replace the same tarball.

Yes, pulling SPIDB/Net-ext-1.011.tar.gz won't help here because that distribution is still broken (i.e. Test::Harness in the xlib/ directory). But in general, pulling is the way to get packages that are in the repository (or an upstream repository) into a stack. Part of the problem is that pinto revert is not a full revert. It only rolls back the index changes, but the tarball remains leaving me unable to reindex it. See #35.

I say again: pull it to reindex. In #35 I proposed a reindex command to get around this problem, and/or as a --reindex option to add and replace. pinto reindex PERSON/Some-Modulel-1.23.tar.gz would tell Pinto to delete the entries for that tarball in the repo and recreate them fresh. pinto add --reindex Some-Module-1.23.tar.gz would be just like pinto add Some-Module-1.23.tar.gz but if it detects a duplicate it acts like a reindex. This would be convenient when adding heaps of modules to an existing repository.

I don't get it. If pinto tells you that you're trying to add a duplicate distribution, then that's your cue to just pull it. I see no reason why a repository should contain duplicate distributions.

And replace doesn't mean to replace a physical distribution archive. It just means to put a new archive in the repository and reindex it on the stacks where the old archive was indexed. That means your new archive can't have the same filename, because the old archive is still there.

--reindex is the vital "JUST DO IT YOU STUPID MACHINE" for a specific refusal to proceed rather than a catch all like --force. I still think pinto needs a --force as well.

Remember that pinto tries to be safe for large-scale development. You might have multiple teams and products building apps from the repository. So it should be hard to make irreversible changes to the repository. That's why I've resisted things like --force.

schwern commented 11 years ago

On 2012.11.14 5:19 PM, Jeffrey Ryan Thalhammer wrote:

I see no reason why a repository should contain duplicate distributions.

Nor I. I want to reindex the existing one. Re-adding the same tarball is a convenient way of identifying which tarball to reindex.

There wouldn't be a duplicate anyway, its the same file with the same checksum and the same name. Pinto can recopy it if it likes, or it can skip the copy. The bits remain the same.

I don't get it. If pinto tells you that you're trying to add a duplicate distribution, then that's your cue to just pull it.

Once again, pull was not an option because of the indexing error. Pull would also not be an option if we replaced the "add, revert, add again" scenario with a tarball on my disk and not in a repository. AFAIK pull is all about getting tarballs from CPAN and Pinto repositories. I guess I could fake up a repository, but that seems silly.

To me "pull" is "add but get the tarball from a repository". There's nothing in the pull command which suggests its allowed to replace a duplicate and add is not.

So the real question is why am I allowed to pull a duplicate but not allowed to add or replace one?

And replace doesn't mean to replace a physical distribution archive. It just means to put a new archive in the repository and reindex it on the stacks where the old archive was indexed. That means your new archive can't have the same filename, because the old archive is still there.

Since adding the new Net-ext got confused and overwrote Test::Harness instead I figured I'd revert and then tell it to specifically overwrite specifically the old version of Net-ext using replace. That seems what replace is there for.

--reindex is the vital "JUST DO IT YOU STUPID MACHINE" for a specific refusal to proceed rather than a catch all like --force. I still think pinto needs a --force as well.

Remember that pinto tries to be safe for large-scale development. You might have multiple teams and products building apps from the repository. So it should be hard to make irreversible changes to the repository. That's why I've resisted things like --force.

Easy things easy, hard things possible.

--force is the user's way of telling the designer "look, I know you mean the best, but I'm here in the field with all the knowledge about my problem and you're not".

It doesn't have to be a single "override everything" switch. --reindex is an example of that. It doesn't override all of Pinto's error checking. It doesn't ask Pinto to do something dangerous to the integrity of the repository. Adding the same tarball again isn't dangerous, its just possibly redundant. Except in my case it isn't redundant, I need to reindex it and re-adding is the most direct approach. This is what I mean when I say there are things the user knows which the designer cannot.

This is why some things in Perl are warnings and others are errors.

As an aside, since Pinto has a VCS nothing should be irreversible, even a force, but that's the VCS design discussion we're having on the side.

thaljef commented 11 years ago

On Nov 15, 2012, at 3:34 PM, Michael G. Schwern wrote:

AFAIK pull is all about getting tarballs from CPAN and Pinto repositories.

That's where you are mistaken. pull is about registering packages on a stack. If those packages aren't already in the repository, only then does it go looking for them on CPAN.

Right now you only have one stack, so you have the impression that every pull will go to CPAN. But once you have multiple stacks, you'll see that isn't true.

You also probably expect that every archive in the repository is also registered in the stack. In the end, that might be true, but it doesn't have to be. When you add-ed and then revert-ed, the packages were removed from the stack, but the archive is still in the repository.

To me "pull" is "add but get the tarball from a repository". There's nothing in the pull command which suggests its allowed to replace a duplicate and add is not.

"pull" is not replacing anything. If you pull the same thing twice, it just re-registers packages on the stack (i.e. reindexes).

So the real question is why am I allowed to pull a duplicate but not allowed to add or replace one?

I think I finally understand what you are getting at here. You see that "pull" can be idempotent, and you think "add" should be the same. So if throw the same tarball at Pinto twice, your intention is to just reindex the one that is already in the repository.

So that's what I've done for the next release. But there is one caveat -- the basename of the archives must also be the same. If you attempt to add a duplicate tarball with a different name, then you'll still get an error. This is to prevent you from merely changing the name of archive without actually changing the contents.

Since adding the new Net-ext got confused and overwrote Test::Harness instead I figured I'd revert and then tell it to specifically overwrite specifically the old version of Net-ext using replace. That seems what replace is there for.

No, replace means "add this archive to the repository and then register it on all the stacks where this other archive was registered". Once you reverted, the old Net-ext was no longer registered on any stacks. So when you did the replace, it put the new Net-ext into the repository, but there were no stacks where it should have registered it. At that point you could have just pulled SCHWERN/Net-ext-x.xx.tar.gz and it would register your version on the stack.

It doesn't have to be a single "override everything" switch. --reindex is an example of that. It doesn't override all of Pinto's error checking. It doesn't ask Pinto to do something dangerous to the integrity of the repository. Adding the same tarball again isn't dangerous, its just possibly redundant. Except in my case it isn't redundant, I need to reindex it and re-adding is the most direct approach.

Well, I think I've resolved your issues with add-ing duplicates, so --reindex isn't really necessary there. What it should mean to "replace" an archive is still unresolved though. @renomalist wants a true replacement of the physical archive. I prefer to just change the indexes and keep all the archives. What do you think?

This is why some things in Perl are warnings and others are errors.

I'm probably in the camp that would make a lot of warnings into errors. But that's another story.

As an aside, since Pinto has a VCS nothing should be irreversible, even a force, but that's the VCS design discussion we're having on the side.

That's true for now. I've almost convinced myself that the "history is immutable" approach is wrong. If we permit true removal of archives, then I want it to be sufficiently difficult (but not impossible) to do so.

thaljef commented 11 years ago

Here's what probably should have happened with Net-ext:


# Start a repository, pull in a Test::Harness
$> pinto -r TEST init
$> pinto -r TEST pull OVID/Test-Harness-3.25.tar.gz
$> pinto -r TEST ls -P Test::Harness
rf  Test::Harness                3.25  OVID/Test-Harness-3.25.tar.gz

# Now pull Net-ext-1.011
$> pinto -r TEST pull SPIDB/Net-ext-1.011.tar.gz
Downgrading package OVID/Test-Harness-3.25/Test::Harness~3.25 to SPIDB/Net-ext-1.011/Test::Harness~1.1602 in stack init

# Woops! We have the wrong Test::Harness now
$> pinto -r TEST ls -P Test::Harness
rf  Test::Harness                1.1602  SPIDB/Net-ext-1.011.tar.gz

# That isn't right, so go back
$> pinto -r TEST revert

# We have the right Test::Harness again
$> pinto -r TEST ls -P Test::Harness
rf  Test::Harness                 3.25  OVID/Test-Harness-3.25.tar.gz

# Now put our patched version in there (no xlib directory)
$> pinto -r TEST add Net-ext-PATCHED-1.011.tar.gz

# We still have the right Test::Harness
$> pinto -r TEST ls -P Test::Harness
rf  Test::Harness                  3.25  OVID/Test-Harness-3.25.tar.gz

# And we've got our patched Net-ext
$> pinto -r TEST ls -P Net
rl  Net::Gen                          1.011  jeff/Net-ext-PATCHED-1.011.tar.gz
rl  Net::Inet                           1.0  jeff/Net-ext-PATCHED-1.011.tar.gz
rl  Net::TCP                          1.0  jeff/Net-ext-PATCHED-1.011.tar.gz
rl  Net::TCP::Server                   1.0  jeff/Net-ext-PATCHED-1.011.tar.gz
rl  Net::UDP                          1.0  jeff/Net-ext-PATCHED-1.011.tar.gz
rl  Net::UNIX                           1.0  jeff/Net-ext-PATCHED-1.011.tar.gz
rl  Net::UNIX::Server                    1.0  jeff/Net-ext-PATCHED-1.011.tar.gz

renormalist commented 11 years ago

Thanks, Jeffrey for your endurance. Follow-up question:

Jeffrey Ryan Thalhammer notifications@github.com writes:

Here's what probably should have happened with Net-ext:


# Start a repository, pull in a Test::Harness
$> pinto -r TEST init
$> pinto -r TEST pull OVID/Test-Harness-3.25.tar.gz
$> pinto -r TEST ls -P Test::Harness
rf  Test::Harness                3.25  OVID/Test-Harness-3.25.tar.gz

# Now pull Net-ext-1.011
$> pinto -r TEST pull SPIDB/Net-ext-1.011.tar.gz
Downgrading package OVID/Test-Harness-3.25/Test::Harness~3.25 to SPIDB/Net-ext-1.011/Test::Harness~1.1602 in stack init

# Woops! We have the wrong Test::Harness now
$> pinto -r TEST ls -P Test::Harness
rf  Test::Harness                1.1602  SPIDB/Net-ext-1.011.tar.gz

# That isn't right, so go back
$> pinto -r TEST revert

How can I revert that single Net-ext-1.011 if I recognize its failure much later, after I also pulled in FOO, BAR, BAZ, and half of CPAN with their dependecies?

thaljef commented 11 years ago

How can I revert that single Net-ext-1.011 if I recognize its failure much later, after I also pulled in FOO, BAR, BAZ, and half of CPAN with their dependecies?

In this particular case, you don't need to revert. Instead, you would pull OVID/Test-Harness-3.25.tar.gz again. You probably still want Net-ext (you pulled it for a reason after all). You just want to "undo" the downgrade of Test::Harness.

So strictly speaking, you don't even have to patch Net-ext. You just have to correct the index afterwards. But patching it is still the right thing to do because it prevents this problem from happening again if you pull Net-ext to another stack in the future.

And all this assumes that FOO, BAR, and BAZ didn't already upgrade Test::Harness for you. Otherwise, you wouldn't even have the problem any more.

In the general case, however, I think you have a very good question and I'm not sure what the answer should be. It certainly seems reasonable to want to "undo" a specific revision. But it is not clear to me what Pinto should do in that case. If you undo a revision, you might be downgrading (or completely removing) a dependency that is required for some other module.

Pinto does know about the dependencies, so it could warn that you've just broken the dependency graph. But I doubt it could actually fix it for you, because only you know what was wrong in any given revision.

Another possibility is to refuse downgrades in certain situations. If a prerequisite causes a downgrade, then it probably means something has gone wrong. But if you deliberately downgrade by pulling an older version of a module, then that's probably ok.

That reminds me, Pinto needs to keep track of which modules you pulled/added explicitly and which ones came along as prerequisites. This was on Micheal's wishlist at one point. Explicitly pulled/added modules are basically the root nodes of the dependency graph. There might not be anything in the repository that depends on them, but YOUR app depends on them implicitly.

renormalist commented 11 years ago

Jeffrey Ryan Thalhammer notifications@github.com writes:

How can I revert that single Net-ext-1.011 if I recognize its failure much later, after I also pulled in FOO, BAR, BAZ, and half of CPAN with their dependecies?

In this particular case, you don't need to revert.

Of course, in this particular case of T::Harness downgrade. I'm curious about a general unknown issue. Something went wrong with Net-ext, I already have FOO, BAR, BAZ and only want to fix Net-ext, whatever it's problem was.

That' where I'm unsure how "revert" works. In git it is basically a reverse diff of a patch, that could have happened in the past, but in Pinto? Does the pull/add of Net-ext have something similar to the commitid which I revert? Or is it more comparable to a global database rollback before the add/pull of Net-ext?

Thanks. Steffen

thaljef commented 11 years ago

That's where I'm unsure how "revert" works. In git it is basically a reverse diff of a patch, that could have happened in the past, but in Pinto? Does the pull/add of Net-ext have something similar to the commitid which I revert? Or is it more comparable to a global database rollback before the add/pull of Net-ext?

It is more like git or svn. For each stack, Pinto knows two things: the current state of the stack (i.e. what the "head" is) and the diff between every prior revision of the stack. Each diff is identifiable by a sequence number (kinda like svn, but the number is local to the stack, not global to the repository).

When you revert, Pinto is just doing a series of reverse diff patches to the head, until it arrives at the revision you wanted. Pinto could also undo a specific revision (or a range of revisions) but I just haven't written that command yet because I'm less certain about the use case.

Thought Stream 1

Suppose you have a stack that contains packages A~1 and B~1 at revision 0. Then you decide upgrade to A~2, which has B~2 as a prerequisite. Now you have A~2 and B~2 at revision 1. After some time passes, you decide to add C~1 to the stack, which has B~3 as a prerequisite. Not you have A~2, B~3, and C~1 at revision 2.

At this point you realize that A~2 is broken, and you want to go back to A~1. So you undo revision 1. But what should Pinto do about B?

Thought Stream 2

It might be useful to classify the different scenarios for manicuring the stack. I imagine there are two major types. The first is when you discover that you upgraded or introduced a dependency that is broken or not compatible with your application. The second is when you discover that Pinto didn't index something the way your wanted it (either because Pinto is broken or the dist is broken). I suspect that these two scenarios might have very different resolution paths.

Thought Stream 3

Just because git/vcs can merge and revert source code doesn't mean it still actually works. You still have to decide what the "right" code is. The same is true for Pinto. You can pull, add, merge, and revert all you want and Pinto does not guarantee that the index is necessarily "correct" for your application. Pinto merely has to give you sufficient tools to make it correct (whatever that is) with a reasonable amount of effort. In other words, Pinto is not magic pixie dust.

thaljef commented 10 years ago

That reminds me, Pinto needs to keep track of which modules you pulled/added explicitly and which ones came along as prerequisites. This was on Micheal's wishlist at one point. Explicitly pulled/added modules are basically the root nodes of the dependency graph. There might not be anything in the repository that depends on them, but YOUR app depends on them implicitly.

I've concluded that trying to mark a distribution as either "explicitly requested" or as "just a prerequisite" isn't feasible because it can easily switch from one state to the other based on the other stuff in the stack.

So instead, there is a roots command that lists all the distributions you would need to install to get everything in the stack. In other words, all the distributions that are not prerequisites for some other distribution in the stack.

I think that effectively solves the problem of knowing what you need to install from your stack. (Although personally, I think declaring your application's prerequisites in a META or cpanfile is a better approach).