src-d / hercules

Gaining advanced insights from Git repository history.
Other
2.63k stars 334 forks source link

Failure to aggregate developers when using .mailmap or --people-dict #355

Open Code0x58 opened 4 years ago

Code0x58 commented 4 years ago

When using just a committed .mailmap the aggregation is done nicely, but when --people-dict=... is added the mailmap is not used at all. This is due to the branching in internal/plumbing/identity/identity.go and lack of handling of mailmap in the IdentityDetector.LoadPeopleDict(...).

Another issue is that if you do use input to --people-dict which has multiple emails mapping to the same name, those results will not be aggregated. I believe this is down to this method which doesn't check for duplicate ids[0] before appending. A workaround (or intended but not clearly documented way) for this is to roll any duplicates into the same line e.g.

Real Name|first@example.com|second@example.com

To work around the mailmap not being used, you can convert your mailmap into a --people-dict and pass that.

vmarkovtsev commented 4 years ago

Thanks for digging this @Code0x58! Yes, merging several signatures with | is the intended way of how --people-dict works. Deduplicating the file is something that can be done beforehand with a variety of existing tools, so this is a matter of documenting the contract. Likewise, --people-dict is considered the ultimate source of truth so any mailmap is skipped.

It would be awesome if you PR your findings to the README :+1: Besides, I am happy to include the script that you crafted to convert .mailmap to --people-dict!

Code0x58 commented 4 years ago

Now I'm familiar, I can see I misinterpreted the documentation about being able to have multiple emails mapping back:

If --people-dict is specified, it should point to a text file with the custom identities. The format is: every line is a single developer, it contains all the matching emails and names separated by |.

So in #356 I extended the README.md a little and made it so you don't get any surprises if you do:

Duplicate|first@example.com
Duplicate|second@example.com

As for converting a mailmap to a --people-dict: that was done by hand. I can see what you mean about it being the source of truth so just made it clear in the readme that you get one xor the other.

anatolix commented 2 years ago

Hi. Looks like there is an error in this code:

reverseDict = append(reverseDict, canon)
size++
canonIndex = size

This gives an index shift by 1 between dict and reverseDict, and panic in case last person in identities.txt has a commit

This should fix an issue

reverseDict = append(reverseDict, canon)
canonIndex = size
size++