pulkomandy closed 4 years ago
Hi, sorry, I noticed the regression a day after the release. Yesterday I released a patch version which fixes the issue. Unfortunately, the blame-data fetch is indeed very slow. I found that a multithreaded fetch makes it twice as fast, which of course is not a cure for repos with lots of files. Meanwhile, progress indication is a bit complicated in the multithreaded case, so for the sake of simplicity I removed it.
On the other hand, I was thinking of deprecating the --contribution option in v2.3.x, as the most interesting metrics are obtained from blame data. Apparently, I have to reconsider this deprecation...
I would appreciate it if you waited till the end, though, and told me how long it took :)
Yes, I'm not in much of a hurry for the stats, and it's running on my server machine, so I won't power it off. I'd say this is acceptable if the results can somehow be cached so that they don't need to be recomputed every time I run repostat. Otherwise I'll probably keep it disabled for such large repos.
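To make the caching idea concrete, here is a rough sketch of what it could look like. Nothing in it is existing repostat behaviour: `BlameCache`, the JSON file layout and the `path@blob_id` key are all hypothetical. It also assumes that a file's blame only needs recomputing when its blob id changes, which would not hold if the history is rewritten.

```python
import json
from pathlib import Path


class BlameCache:
    """Hypothetical on-disk cache of per-file blame results (not repostat code)."""

    def __init__(self, cache_file):
        self.cache_file = Path(cache_file)
        # Load results saved by a previous run, if there are any.
        if self.cache_file.exists():
            self.entries = json.loads(self.cache_file.read_text())
        else:
            self.entries = {}

    def get_or_compute(self, path, blob_id, compute):
        # Assumption: an unchanged blob id means the blame result is unchanged.
        key = f"{path}@{blob_id}"
        if key not in self.entries:
            self.entries[key] = compute(path)  # the slow part, run only on a miss
        return self.entries[key]

    def save(self):
        # Results must be JSON-serializable for this simple layout to work.
        self.cache_file.write_text(json.dumps(self.entries))
```

With something like this, a second run over a mostly unchanged tree would only pay the blame cost for files whose content changed since the previous run.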
Well, it ran for several days, and eventually the machine apparently ran out of memory and killed it. The out-of-memory condition may have other causes: the machine is somewhat busy with other things and has just enough RAM for all of it, plus a rather small swap partition.
Hi, thanks for the update. I am now curious to run repostat on your repo :)
Pandas is known for being greedy for RAM; perhaps it is worth thinking about deleting dataframes once they are no longer needed.
There is a very nice module: https://github.com/tqdm/tqdm which does what is needed.
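A minimal sketch of how tqdm could report progress over a thread pool, for illustration only. `blame_file` and `blame_all` are hypothetical placeholders rather than repostat's actual code, and the fallback is there only so the snippet still runs where tqdm is not installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback so the sketch runs without tqdm: no progress bar at all.
        return iterable


def blame_file(path):
    """Hypothetical stand-in for the slow per-file blame call."""
    time.sleep(0.01)
    return path, len(path)  # fake "hunks count"


def blame_all(paths, workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(blame_file, p) for p in paths]
        # as_completed yields each future as it finishes, in any order,
        # so the bar advances once per finished file.
        for future in tqdm(as_completed(futures), total=len(futures)):
            path, hunks = future.result()
            results[path] = hunks
    return results
```

Because the bar is only updated from the main thread as futures complete, the workers themselves do not need to know about progress reporting.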
Started repostat on the haiku repo.
The history data fetch was comparatively fast: ~15 mins
So far not bad for blame data:
10%|▉ | 2501/25696 [19:56<2:54:03, 2.22it/s]
Apparently, for some files in the haiku repo, blame works extremely slowly:
filename | hunks count | time to blame (s)
[...]
src/apps/drivesetup/DiskView.cpp | 222 | 24.87
src/apps/drivesetup/DiskView.h | 17 | 20.56
src/apps/drivesetup/DriveSetup.cpp | 43 | 362.5
src/apps/drivesetup/DriveSetup.h | 10 | 356.8
src/apps/drivesetup/DriveSetup.rdef | 9 | 251.5
[...]
@pulkomandy, here is the time it took to finish blame data collection for the haiku repo on my laptop (4 cores, 8 GB RAM): 100%|██████████| 25696/25696 [39:41:52<00:00, 5.56s/it]
I find again and again that the choice of pygit2 as a tool for git data processing was not ideal. This time because of https://github.com/libgit2/libgit2/issues/3027
I ran the new version of repostat on https://git.haiku-os.org/haiku/ (well, I used version 2.1.1, because 2.1.2 was not yet released when I started generating the stats :) )
It has been computing blame data for 22 hours now and is still running. It apparently runs 7 threads on my old 2 core machine and uses all available CPU. There is no progress indication so I don't know how long it will run.