thaljef / Pinto

Curate your own repository of Perl modules
https://metacpan.org/module/Pinto::Manual

Option to display diff even without --dry-run #208

Open · fgabolde opened this issue 9 years ago

fgabolde commented 9 years ago

Currently, committable operations display the diff only if --dry-run is set, e.g.

pinto update -vv --dry-run Moose
# [verbose output...]
+[rf-] ILMARI/Devel-OverloadInfo-0.002.tar.gz
-[rf-] DROLSKY/Devel-StackTrace-1.27.tar.gz
+[rf-] DROLSKY/Devel-StackTrace-2.00.tar.gz
-[rf-] ZEFRAM/Module-Runtime-0.013.tar.gz
+[rf-] ZEFRAM/Module-Runtime-0.014.tar.gz
+[rf-] ETHER/Module-Runtime-Conflicts-0.002.tar.gz
-[rf-] DOY/Moose-2.0402.tar.gz
+[rf-] ETHER/Moose-2.1404.tar.gz
+[rf-] ETHER/Test-CleanNamespaces-0.18.tar.gz
-[rf-] ETHER/Test-Warnings-0.013.tar.gz
+[rf-] ETHER/Test-Warnings-0.021.tar.gz

I'm using Pinto from within Jenkins to automate "promoting" distributions from one Pinto repository to another (e.g. from preproduction to production). I'd like my build logs to include the changes in all cases, not just for dry-run operations.

As a workaround, I'm running pinto update first with --dry-run, then again without.
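Roughly, the workaround boils down to running the same command twice (a simplified sketch; repository selection and error handling omitted):

pinto update -vv --dry-run Moose    # prints the diff into the build log, changes nothing
pinto update Moose                  # the same operation, applied for real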

thaljef commented 9 years ago

Running two commands doesn't seem unreasonable to me. It might look more natural if you did pinto diff after the update instead of using --dry-run before. However, there's no way to express something like diff HEAD^ as you would in git. So maybe that's the thing we really need here.

Does git have a similar feature (i.e. show a diff after commit)?

fgabolde commented 9 years ago

Running two commands doesn't seem unreasonable to me. It might look more natural if you did pinto diff after the update instead of using --dry-run before. However, there's no way to express something like diff HEAD^ as you would in git. So maybe that's the thing we really need here.

Yes, this feels more natural than my original proposition.

I was actually thinking, maybe the easiest way to do this is just

$ pinto log -d --latest

which would also go some way towards reducing my main gripe with pinto log -d, which is that it takes an unholy amount of time to display the results when talking to a remote Pinto server with a lot of changes, even if I am only interested in what my coworkers did last week while I was on holiday.

Does git have a similar feature (i.e. show a diff after commit)?

I was going to jokingly reply "even git doesn't have ALL the features", but actually it does... I just checked, and git help commit says that with --verbose, the message template includes the diff.

But anyway, the workflows with git and Pinto are different in this respect. With git, you prepare a commit by adding exactly the changes you want -- it doesn't add stuff by itself. With Pinto, you have your desired end state ("update Acme::Foobar") and it calculates what needs to be done to get there. There needs to be a simple way to ask it "what just happened?"

thaljef commented 9 years ago

$ pinto log -d --latest

Yes, that would work nicely too. I would still want to implement that in a git-like way. So the argument could be a stack name, or a commit id, or some symbolic thing like HEAD and it would just DWIM.

I just checked, and git help commit says that with --verbose, the message template includes the diff.

And that's exactly what pinto does. But you're doing this non-interactively so you never see the template message. I think that's the key difference in this situation.

thaljef commented 9 years ago

it takes an unholy amount of time to display the results when talking to a remote Pinto server with a lot of changes, even if I am only interested in what my coworkers did last week while I was on holiday.

Since about version 0.085 pinto streams output to the client, so you should get a pretty quick response. Which version do you have?

NB: pinto --version will tell you which version of the client is installed locally, but you'll have to log in to the server and look at the pintod code to know which version the server is running.

fgabolde commented 9 years ago

The client is 0.09996 and the server is 0.09995.

These are large repositories with very frequent changes though:

STATISTICS FOR THE "master" STACK
-------------------------------------

                     Stack      Total
               ----------------------
     Packages       12543      175983
Distributions         969        3493

The output of pinto log -d on this repository was around 629K two months ago, just to give you an idea. With the very scientific method of "counting aloud", I measured ~50s before anything would appear in pinto log -d | less (only one screenful -- the next screenful took 50 more seconds, so this is not just $|).

On one of our smaller repositories, which currently has 65 revisions, it still takes ~5s before anything appears, ~17s if piped to a pager (so nearly the whole run), and 20.5s to finish (measured with time).
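For what it's worth, something along these lines reproduces the measurement without counting aloud (the repository URL is a placeholder):

time pinto -r https://pinto.example.com log -d > /dev/null               # total wall-clock time
time pinto -r https://pinto.example.com log -d | head -n 40 > /dev/null  # roughly, time until the first entries arrive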

thaljef commented 9 years ago

Depending on your machine, most pinto commands have a startup time of 2-5 seconds due to the overhead of Moose. The overhead is slightly less when using a remote repository, because the server has already loaded modules that the client would otherwise have to load itself for a local repository. I would like to convert everything to Moo, but that's not a high priority right now (any takers?).

Having said all that, I turned on autoflush in 35342ee1 and I saw a pretty big improvement for both local and remote repositories. You're welcome to try it out yourself, or just wait for the next release (maybe next week). To get the speedup with a remote repository, you'll have to upgrade both server and client. This probably won't improve your time measurements, but I think it will reduce the perceived latency.
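For context, "turning on autoflush" is nothing more exotic than the following; this is an illustration of the general technique, not the literal diff in 35342ee1:

use IO::Handle;            # provides the autoflush method on filehandles

STDOUT->autoflush(1);      # flush each write immediately instead of buffering
# or, for an arbitrary output handle:
# $out_fh->autoflush(1);   # same effect as setting $| = 1 while that handle is selected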

tartansandal commented 9 years ago

We'd really need to get NYTProf or something similar running on this to see what the real bottleneck is; however, my guess is that it has something to do with Pinto using both SQLite and an ORM (DBIx::Class). I'm thinking we might be getting sub-optimal SQL queries (via prefetch and the like) and SQLite is not smart enough to produce a sufficiently good query plan. If that's the case, we might want to explore plugging in a real database like PostgreSQL, or crafting explicit joins so that DBIx::Class produces better SQL.
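To make the prefetch point concrete, this is the kind of contrast I mean (hypothetical resultset and relationship names, not Pinto's actual schema):

# N+1 pattern: one query for the rows, then potentially one more per row.
my @regs = $schema->resultset('Registration')->search({})->all;
print $_->package->name, "\n" for @regs;   # each ->package may hit the DB again

# Prefetch: DBIx::Class builds a single JOINed query up front, which then
# stands or falls with the query plan SQLite chooses for it.
my @regs2 = $schema->resultset('Registration')->search(
    {},
    { prefetch => 'package' },             # hypothetical relationship name
)->all;
print $_->package->name, "\n" for @regs2;  # no further queries needed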

fgabolde commented 9 years ago

I turned on autoflush in 35342ee and I saw a pretty big improvement for both local and remote repositories. You're welcome to try it out yourself, or just wait for the next release (maybe next week).

Neat!

I'd like to try it now, but I suspect my colleagues don't want me cowboy-patching the Pinto server, and I don't have a huge Pinto repository of my own to test on. I'll wait for the release.

On the topic of git-like revsets, it sounds pretty cool and it will definitely help. I can work on it if you like.

thaljef commented 9 years ago

On the topic of git-like revsets, it sounds pretty cool and it will definitely help. I can work on it if you like.

That would be much appreciated.

thaljef commented 9 years ago

I'll wait for the release.

@fgabolde Pinto-0.09998 has shipped.

fgabolde commented 9 years ago

@fgabolde Pinto-0.09998 has shipped.

Thanks @thaljef, I'll pester IT to set it up :)

I've started work on the rev-list feature; I'll update you as soon as I've got something approaching usable.

thaljef commented 9 years ago

@fgabolde so did this work out for you?

Is it fast enough now to just use pinto log -d?

fgabolde commented 9 years ago

@thaljef It's faster but it's still pretty slow :/ It got a little better for a while but since then the repository size has kept increasing, and so has the running time of pinto log.

I ran into this as well while trying to add options to pinto log. Since you're also using SQLite to store the revision ancestry, there is no really fast way to build the full history tree (your only option is to run one query for every ancestry level -- the parents method in the Revision result source). I couldn't get it to be fast without extracting the ancestry info and dumping it into a separate structure on disk, and that would potentially cause other problems, since we would need to keep it in sync with the DB...

thaljef commented 9 years ago

Well, rendering all log entries is always going to take more time as the history gets longer. But the time to render the first N log entries should be constant, and I thought that was the primary issue here. Is that not the case?

Yeah, the data structure pretty much requires a query for each revision. I wonder if it would be any faster to just fetch all revisions (or some large chunk of them) and then build the tree in memory.

fgabolde commented 9 years ago

Well, rendering all log entries is always going to take more time as the history gets longer. But the time to render the first N log entries should be constant, and I thought that was the primary issue here. Is that not the case?

If you check back to the first comment on this ticket, the issue was originally that I wanted to show dep diffs in Jenkins build logs, and I was using pinto update --dry-run for this, avoiding pinto log -d because it was slow (more than doubling build times in extreme cases, actually). Then the discussion turned to adding features to pinto log -d to limit its output.

Regarding the current speed of pinto log, it's fine... if someone is in front of the machine to press C-c once enough output has appeared. So at least one of the use cases ("what have the guys been doing last week") works now.

Yeah, the data structure pretty much requires a query for each revision. I wonder if it would be any faster to just fetch all revisions (or some large chunk of them) and then build the tree in memory.

Rather than all revisions, we only need the ancestry table and one starting revision for building the tree (fortunately the ancestry table is likely to be much smaller, with fewer columns, all ints). From there we can fetch the revisions in batches.
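Something like this is what I have in mind (a sketch only; the column, relationship, and resultset names are guesses, not the real schema):

# Slurp the (small, integer-only) ancestry table once.
my %parents_of;
my $edges = $schema->resultset('Ancestry')->search(
    {}, { columns => [qw(parent child)] },
);
while (my $edge = $edges->next) {
    push @{ $parents_of{ $edge->child } }, $edge->parent;
}

# Walk back from the starting revision entirely in memory.
my (@order, @queue, %seen);
@queue = ($start_revision_id);
while (defined(my $id = shift @queue)) {
    next if $seen{$id}++;
    push @order, $id;
    push @queue, @{ $parents_of{$id} || [] };
}

# Fetch the full revision rows in batches rather than one query per revision.
my @revisions;
while (my @chunk = splice @order, 0, 100) {
    push @revisions,
        $schema->resultset('Revision')->search({ id => { -in => \@chunk } })->all;
}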