src-d / hercules

Gaining advanced insights from Git repository history.

No output from hercules #277

Closed fedorovsergey closed 5 years ago

fedorovsergey commented 5 years ago

Hello! I am trying to run a burndown analysis on a very large repo (/.git is 3.3 GB). I execute:

docker run -v <my/repo/path>:/repo srcd/hercules hercules --burndown --burndown-people --devs --print-actions --first-parent --pb /repo > result.pb

and get this output:

git log...
С 1 c81d1aab84e48b3dd9dfcc5e013d10d719739fe3
С 1 fa6e952b5d9a768b4a2ff69fa4fdfdee31aadf47

and so on. The script simply stops after a while on some commit, with no message. I tried to find something in the logs, without success. Several smaller repos are processed fine. Is there a way to make the process more verbose, or to enable logging or something similar? Thanks!

vmarkovtsev commented 5 years ago

If you check your CPU usage during the "hang", you are likely to see 200%. The analysed commit most likely contains a huge number of removed and added files, and poor Hercules is trying to detect fuzzy renames. Even though there are many clever tricks to speed up the matching, it is still fundamentally an O(n²) operation. There is another possibility: there are huge binary files in that commit, and they are also matched with a bsdiff-like algorithm.

In any case, it does not stop. It just takes a long time to digest. I hope not days, but it really depends on the commit.
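
To illustrate why fuzzy rename detection scales quadratically, here is a minimal sketch in Go. It is not Hercules's actual implementation: the `file` type, the `similarity` metric (shared lines divided by the longer file's length), and the `matchRenames` function are all hypothetical names made up for this example. It only shows that comparing every removed file against every added file costs len(removed) × len(added) similarity computations.

```go
package main

import "fmt"

// file is a hypothetical stand-in for a file touched by a commit.
type file struct {
	name  string
	lines []string
}

// similarity is an assumed, deliberately crude metric: the number of lines
// the two files share, divided by the length of the longer file.
func similarity(a, b []string) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	counts := make(map[string]int, len(b))
	for _, line := range b {
		counts[line]++
	}
	common := 0
	for _, line := range a {
		if counts[line] > 0 {
			counts[line]--
			common++
		}
	}
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	return float64(common) / float64(longest)
}

// matchRenames pairs each removed file with its most similar added file,
// provided the score reaches threshold. It also returns how many pairwise
// comparisons were made, which is always len(removed) * len(added).
func matchRenames(removed, added []file, threshold float64) (map[string]string, int) {
	renames := make(map[string]string)
	comparisons := 0
	for _, old := range removed {
		best, bestScore := "", -1.0
		for _, cand := range added {
			comparisons++
			s := similarity(old.lines, cand.lines)
			if s >= threshold && s > bestScore {
				best, bestScore = cand.name, s
			}
		}
		if best != "" {
			renames[old.name] = best
		}
	}
	return renames, comparisons
}

func main() {
	removed := []file{
		{"old/a.txt", []string{"alpha", "beta", "gamma"}},
		{"old/b.txt", []string{"one", "two"}},
	}
	added := []file{
		{"new/a.txt", []string{"alpha", "beta", "gamma", "delta"}},
		{"new/c.txt", []string{"unrelated"}},
	}
	renames, n := matchRenames(removed, added, 0.5)
	fmt.Println(renames, "comparisons:", n)
}
```

With 10,000 removed and 10,000 added files, this naive loop would perform 100 million comparisons for a single commit, which is consistent with the long "hang" described above.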

vmarkovtsev commented 5 years ago

I think I should add a switch to fully disable rename detection on huge commits.

Zebradil commented 5 years ago

Well, I have the same issue, and the problem is that the process does finish at some point, but the output file is empty. I'm aware that the process slows down at times, but this seems to be a different case.

vmarkovtsev commented 5 years ago

@Zebradil @fedorovsergey I need to reproduce it; there are too many things that can go wrong. You can have me sign an NDA if you are really determined, otherwise I need an open source example.

fedorovsergey commented 5 years ago

I followed the advice to increase the memory of the VirtualBox Docker machine. 4 GB solved my problem, and the output file was created. But some error message would be great in this case. Thanks

vmarkovtsev commented 5 years ago

It turns out that catching OOM in Go is impossible.
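
Some background on why this is hard: when the Go runtime itself cannot allocate, it aborts with a fatal error rather than a panic, so `recover` does not intercept it; and when the kernel's OOM killer terminates the process, it uses SIGKILL, which no program can handle. One possible workaround, sketched below, is to poll heap usage and print a warning before the limit is reached. This is only an illustration under assumptions, not something Hercules is known to do: `heapOverLimit`, `warnIfOverLimit`, and the 3 GiB soft limit are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// heapOverLimit returns the bytes of currently allocated heap objects and
// whether that figure exceeds limitBytes.
func heapOverLimit(limitBytes uint64) (uint64, bool) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.HeapAlloc > limitBytes
}

// warnIfOverLimit writes a warning to stderr when the heap is above the
// soft limit, so the user sees a hint before the process is killed.
func warnIfOverLimit(limitBytes uint64) bool {
	used, over := heapOverLimit(limitBytes)
	if over {
		fmt.Fprintf(os.Stderr,
			"warning: heap usage %d MiB exceeds soft limit %d MiB; the process may be killed\n",
			used>>20, limitBytes>>20)
	}
	return over
}

func main() {
	// Assumed soft limit for illustration: 3 GiB, below a 4 GB container.
	const softLimit = 3 << 30
	if !warnIfOverLimit(softLimit) {
		fmt.Println("heap usage is below the soft limit")
	}
}
```

In a long-running analysis, such a check could be called once per processed commit. Note that `runtime.ReadMemStats` briefly stops the world, so calling it in a tight loop is not advisable.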