Closed (p3k closed this issue 4 years ago)
Well, that was backed with a benchmark on hundreds of OS repos from GitHub back in 2017. Since then, we tested with roughly 300k most-starred repos to great success. Perhaps your repo is an outlier that contains nasty edge cases that Hercules does not handle well by default (like adding and removing thousands of files in one commit). I can add to the claim something like "... does but much faster except for one proprietary monster though nobody can verify" 😂
Now seriously: I can try to help you with identifying the particular bottleneck in Hercules and mitigating it. Please run it with `--profile` on a subset of `--commits` that is not painful to wait for.
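One way to prepare such a subset is sketched below. It is an illustration only: the helper names are hypothetical, and it assumes that `--commits` accepts a text file with one commit hash per line, oldest first, which should be checked against `hercules --help` before relying on it.

```python
import subprocess
from pathlib import Path


def list_commits_oldest_first(repo):
    """Return all commit hashes reachable from HEAD, parents before children."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--topo-order", "--reverse", "--format=%H"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def write_commit_subset(hashes, limit, path):
    """Write the first `limit` hashes, one per line, and return them."""
    subset = list(hashes[:limit])
    Path(path).write_text("".join(h + "\n" for h in subset))
    return subset


# Example (not executed here; the file format is an assumption):
#   hashes = list_commits_oldest_first("/path/to/repo")
#   write_commit_subset(hashes, 2000, "subset.txt")
# then pass the file to hercules together with --profile and --commits.
```

Taking a prefix of a topologically ordered history keeps every selected commit's parents inside the subset, so the sample is a self-contained slice of the history and not a random pick.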
thanks for the reply, going to run the command with the flags presumably tomorrow.
in the meantime i would be curious about verifying the benchmark results of the hundreds of OS repos from GitHub as well as the 300k most-starred repos. could you provide a link?
btw. after approx. 2 hours running, the `hercules` command finished. now i am running `labours` and all i get is the message `Reading the input...` and a blinking cursor – am i doing this right?
Regarding the links: this was internal to source{d} that is dead nowadays. To be precise, we ran Hercules over PGA. It was very painful and took months, I must say.
Regarding `labours`, if you did not specify `--pb`, it must be trying to parse a huge YAML. Depending on whether you are running in docker or not, PyYAML is probably defaulting to the pure Python parser, and that guy is really slow. If it was with `--pb` then indeed there is much data.
BTW 2 hours is a big success if your repo is huge. It takes no less than 6 hours for Tensorflow, which we consider moderately sized. How many commits?
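A quick way to check which YAML parser the local PyYAML build offers is sketched below. The function name is hypothetical; `yaml.__with_libyaml__` is the flag PyYAML sets when the LibYAML C bindings are compiled in. This only reports what is available and makes no claim about which loader `labours` itself selects.

```python
import importlib


def describe_yaml_backend():
    """Report whether PyYAML can use the fast C parser on this machine."""
    try:
        yaml = importlib.import_module("yaml")
    except ImportError:
        return "PyYAML is not installed"
    if getattr(yaml, "__with_libyaml__", False):
        return "libyaml C parser available (yaml.CSafeLoader)"
    return "pure-Python parser only (expect slow loads on large files)"


print(describe_yaml_backend())
```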
> Regarding the links: this was internal to source{d} that is dead nowadays. To be precise, we ran Hercules over PGA. It was very painful and took months, I must say.
oh too bad, but understandable nonetheless.
> Regarding `labours`, if you did not specify `--pb`, it must be trying to parse a huge YAML. Depending on whether you are running in docker or not, PyYAML is probably defaulting to the pure Python parser, and that guy is really slow. If it was with `--pb` then indeed there is much data.
ah ok learning here. is `labours -f pb` what you mean? (`--pb` seems to be recognized by hercules only.) should i also run `hercules --pb` then?
> BTW 2 hours is a big success if your repo is huge. It takes no less than 6 hours for Tensorflow, which we consider moderately sized.
oh ok i see. did you try git-of-theseus with tensorflow? :joy_cat:
> How many commits?
21154
still not sure i am doing this right… the readme says to issue these two commands for the project burndown:
hercules --burndown
labours -m burndown-project
could it be i need to either combine both commands via pipe or temporarily save the hercules output, resp.?
The recommended flow is:
hercules --pb >results.pb
labours -i results.pb
Ofc you can pipe `hercules --pb | labours` but as soon as you want to try different plotting parameters you'll have to wait another 2 hours. If the repo was small, any way would be OK, even with YAML.
Now that you've got YAML, there is no converter to PB, so either re-run `hercules` with `--pb` or give Python some time.
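The "analyze once, plot many times" flow can be wrapped in a small script. The following is a minimal sketch, not part of Hercules: the function names and the `.partial` temp-file suffix are hypothetical, and passing the repository as the last positional argument to `hercules` is an assumption. The flags `--burndown`, `--pb`, `-i`, and `-m burndown-project` are the ones quoted in this thread.

```python
import os
import subprocess
from pathlib import Path


def hercules_cmd(repo, analyses=("--burndown",)):
    """Build the hercules command line that emits Protocol Buffers output."""
    return ["hercules", *analyses, "--pb", repo]


def labours_cmd(results, mode):
    """Build the labours command line that plots from a saved results file."""
    return ["labours", "-i", str(results), "-m", mode]


def analyze_once(repo, results):
    """Run hercules only if the results file does not exist yet."""
    results = Path(results)
    if results.exists():
        return results  # reuse the expensive analysis
    tmp = results.with_name(results.name + ".partial")
    with tmp.open("wb") as out:
        subprocess.run(hercules_cmd(repo), stdout=out, check=True)
    os.replace(tmp, results)  # publish only after hercules succeeded
    return results


def plot(results, mode="burndown-project"):
    """Re-plot from the saved file; no repository analysis happens here."""
    subprocess.run(labours_cmd(results, mode), check=True)
```

Writing to a temporary file first means an interrupted multi-hour run does not leave behind a truncated `results.pb` that later gets mistaken for a finished one.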
Yeah, I need to run theseus on Tensorflow, a good idea.
2 hours for 20k commits looks normal. There are nasty repos from NDA clients which take days if the cmdline arguments are not tuned.
ok so i now got some nice burndown and ownership charts, thanks for the assistance.
regarding the latter, is it true i have to rerun hercules whenever i change the people dictionary? wouldn’t it be more efficient (if at all possible) to apply those entries when running labours?
> is it true i have to rerun hercules whenever i change the people dictionary
There is `--exact-signatures` but the support for merging them according to a specific identity dictionary is not implemented in `labours` yet. PRs welcome :smile:
I am happy that I could help :+1: Shall I close?
gonna do that for you :smile_cat:
In the README it says:
This is a claim I cannot support from my experience using both git-of-theseus and hercules with the same repo (which, granted, is huge).
If you need numbers, I’ll provide them. Just let me know which ones you are interested in.