hub4j / github-api

Java API for GitHub
https://github-api.kohsuke.org/
MIT License
1.12k stars, 715 forks

The getAuthor method of the GHCommit class returns null even though the author exists. #1851

Closed: NikRam822 closed this issue 3 weeks ago

NikRam822 commented 1 month ago

Describe the bug: I call getAuthor() on a GHCommit and get null, even though the author of the commit is known. Querying https://api.github.com/repos/owner/repo/git/commits/commitSHA1 does return the author information:

"author": {
        "name": "name",
        "email": "name@mail.com",
        "date": "2024-03-24T02:00:08Z"
    },

To Reproduce: steps to reproduce the behavior, using my repository as an example.

  1. Go to https://api.github.com/repos/NikRam822/GoOpenSource/git/commits/aa0bcf98d8a3a0e4eb7597720a87821c16537e9f
  2. The response shows that the author information is known:
{
  "sha": "aa0bcf98d8a3a0e4eb7597720a87821c16537e9f",
  "node_id": "C_kwDOLjW5AtoAKGFhMGJjZjk4ZDhhM2EwZTRlYjc1OTc3MjBhODc4MjFjMTY1MzdlOWY",
  "url": "https://api.github.com/repos/NikRam822/GoOpenSource/git/commits/aa0bcf98d8a3a0e4eb7597720a87821c16537e9f",
  "html_url": "https://github.com/NikRam822/GoOpenSource/commit/aa0bcf98d8a3a0e4eb7597720a87821c16537e9f",
  "author": {
    "name": "Timur Yakhshigulov",
    "email": "tyakigulov@mail.com",
    "date": "2024-03-23T14:38:16Z"
  },
  "committer": {
    "name": "Timur Yakhshigulov",
    "email": "tyakigulov@mail.com",
    "date": "2024-03-23T14:38:16Z"
  },
  ...
  3. When I call getAuthor() for this commit, I get null. Here is my code:
    for (GHCommit commit : commitsPerPeriod) {
        // getAuthor() returns the GitHub user linked to the commit, or null if none is linked
        GHUser authorUser = commit.getAuthor();
        String author = authorUser != null ? authorUser.getName() : "no name";
        if (authorUser == null) {
            System.out.println(commit.getSHA1());
        }
        ...
    }

Expected behavior: I expect to receive a GHUser object populated with the author's data.

Additional context:

  1. I think the problem is in the resolveUser() method of GHCommit:
private GHUser resolveUser(User author) throws IOException {
    // Returns null when the root-level author is missing or has no login,
    // even if the raw Git author (name/email) is present
    return author != null && author.login != null ? this.owner.root().getUser(author.login) : null;
}
  2. I get the same problem with the following commits in my repository https://github.com/NikRam822/GoOpenSource:
aa0bcf98d8a3a0e4eb7597720a87821c16537e9f
de5831c7ff3b9a1670957593bfc17f9e10538777
5aef96f80733fd0d7b269ad30a4915b87255210a
4ff9de21192a4e4cd6b32ea2c081b384a89a40e0
6d1398c0ca542f0842c333610de2010d53ab026a
gsmet commented 1 month ago

Have you tried something like GHCommit.getCommitShortInfo().getAuthor()?

Things are a bit messy on the GitHub REST API side: there are actually two authors. One is at the root of the response and is sometimes not resolved (maybe it's only resolved when there is a clear mapping to a user, I don't know). The other is the raw author from the commit info (the one I pointed to above).
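To make the two authors concrete, here is a minimal, library-free sketch of how a caller might combine them. The record `CommitAuthors` and the method `displayAuthor` are hypothetical: `login` stands in for the root-level author that `GHCommit.getAuthor()` resolves (null when there is no login, per `resolveUser()` above), and `gitName`/`gitEmail` stand in for the raw author from `GHCommit.getCommitShortInfo().getAuthor()`. The resolved sample values are made up; the unresolved one reuses the author of commit aa0bcf98 from the report.

```java
import java.util.Objects;

class CommitAuthorFallback {

    // Hypothetical model of the two authors GitHub reports for one commit:
    //   login    - root-level GitHub author login; null when GitHub did not resolve a user
    //   gitName  - raw author name stored in the Git commit (from the committer's local config)
    //   gitEmail - raw author email stored in the Git commit
    record CommitAuthors(String login, String gitName, String gitEmail) {}

    // Prefer the resolved GitHub login; otherwise fall back to the raw Git author name,
    // and to "no name" (as in the report's loop) when that is missing too.
    static String displayAuthor(CommitAuthors a) {
        if (a.login() != null) {
            return a.login();
        }
        return Objects.requireNonNullElse(a.gitName(), "no name");
    }

    public static void main(String[] args) {
        CommitAuthors resolved = new CommitAuthors("example-user", "Example Name", "user@example.com");
        CommitAuthors unresolved = new CommitAuthors(null, "Timur Yakhshigulov", "tyakigulov@mail.com");
        System.out.println(displayAuthor(resolved));   // example-user
        System.out.println(displayAuthor(unresolved)); // Timur Yakhshigulov
    }
}
```

The sketch keys on the login rather than the GitHub display name, since the login is what the root-level author is resolved from.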

NikRam822 commented 1 month ago

Yes, I tried GHCommit.getCommitShortInfo().getAuthor(), but I noticed some strange behavior.

I'm developing a system that analyzes GitHub projects (project: https://github.com/NikRam822/TatMobileAnalyzer). It reads all the commits of a GitHub project, extracts the commit authors, and collects statistics for each author. With GHCommit.getCommitShortInfo().getAuthor(), some commits had a different author name from the one listed in the Contributors section on GitHub.

For example, when analyzing https://github.com/NikRam822/TatMobileAnalyzer with GHCommit.getCommitShortInfo().getAuthor(), I get the following authors:

  1. Olmecsandr
  2. Denis Nikolskiy
  3. Timur Akhmatov
  4. Amirka-Kh
  5. Amir
  6. Nikita Ramzin

Authors 4 (Amirka-Kh) and 5 (Amir) are actually the same person. The Contributors section on GitHub (https://github.com/NikRam822/TatMobileAnalyzer/graphs/contributors) lists only these authors:

  1. Olmecsandr
  2. Denis Nikolskiy
  3. Timur Akhmatov
  4. Amirka-Kh
  5. Nikita Ramzin

When using the getAuthor() method of the GHCommit class for this project, I get these authors:

  1. Olmecsandr
  2. Denis Nikolskiy
  3. Timur Akhmatov
  4. Nikita Ramzin
  5. Amir

These authors are correct and there are no duplicate authors, but when using GHCommit.getCommitShortInfo().getAuthor() "duplicates" appear.

gsmet commented 1 month ago

Yeah, that’s expected. Short info gives you the raw information stored in the Git repository. That information comes from each committer's local Git settings, not from GitHub, so people working on several machines might not always commit with the same name and email. The same goes for people making changes directly on GitHub.

Anyway, I don’t think there will be a silver bullet here, unfortunately.
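One way to reduce the duplicates described above is to group commits on a more stable identity than the raw Git name. Below is a minimal, library-free sketch under that assumption: the record `CommitAuthor` and the methods `identityKey` and `commitsPerAuthor` are hypothetical, `login` stands in for the login behind `GHCommit.getAuthor()`, and all sample values are made up.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class AuthorStats {

    // Hypothetical per-commit data: resolved GitHub login (may be null) plus the raw Git author fields
    record CommitAuthor(String login, String gitName, String gitEmail) {}

    // Identity key: the GitHub login when GitHub resolved one, otherwise the lower-cased Git email.
    // The raw Git name is not used as a key because it can differ between machines and local configs.
    static String identityKey(CommitAuthor c) {
        if (c.login() != null) {
            return c.login();
        }
        return c.gitEmail() == null ? "unknown" : c.gitEmail().toLowerCase(Locale.ROOT);
    }

    // Counts commits per identity key; TreeMap keeps the output order stable.
    static Map<String, Integer> commitsPerAuthor(List<CommitAuthor> commits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (CommitAuthor c : commits) {
            counts.merge(identityKey(c), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<CommitAuthor> commits = List.of(
                new CommitAuthor("sam-example", "Sam Example", "sam@example.com"),
                new CommitAuthor("sam-example", "sam", "sam@laptop.example.com"),
                new CommitAuthor(null, "Someone", "Someone@Example.com"));
        System.out.println(commitsPerAuthor(commits)); // {sam-example=2, someone@example.com=1}
    }
}
```

Commits whose email GitHub could not map to an account are still counted under that email, so one person can still appear twice when only some of their commits resolve.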

NikRam822 commented 1 month ago

Thanks, that makes more sense.

One last question before I close this issue: so there's no clear way to get this author block from a commit?

 "author": {
        "name": "name",
        "email": "name@mail.com",
        "date": "2024-03-24T02:00:08Z"
    },
bitwiseman commented 4 weeks ago

@NikRam822
As @gsmet said, GHCommit.getCommitShortInfo().getAuthor() is your best bet.

Interesting note: this endpoint is where GHCommit is pulled from: https://api.github.com/repos/NikRam822/GoOpenSource/commits/aa0bcf98d8a3a0e4eb7597720a87821c16537e9f

This endpoint is where GitCommit is pulled from: https://api.github.com/repos/NikRam822/GoOpenSource/git/commits/aa0bcf98d8a3a0e4eb7597720a87821c16537e9f

But we don't currently support getting a GitCommit separately from a parent such as GHCommit.
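The two endpoints above differ only by the `/git` path segment. This small sketch builds both URLs for a commit; the class and method names are hypothetical. The comments reflect the difference discussed in this thread: the `/commits` response carries the root-level, GitHub-resolved author, while `/git/commits` carries only the raw Git author and committer blocks.

```java
class CommitEndpoints {

    // REST "commits" endpoint (source of GHCommit): includes the root-level, GitHub-resolved "author"
    static String commitUrl(String owner, String repo, String sha) {
        return String.format("https://api.github.com/repos/%s/%s/commits/%s", owner, repo, sha);
    }

    // Git database endpoint (source of GitCommit): only the raw Git "author" and "committer" blocks
    static String gitCommitUrl(String owner, String repo, String sha) {
        return String.format("https://api.github.com/repos/%s/%s/git/commits/%s", owner, repo, sha);
    }

    public static void main(String[] args) {
        String sha = "aa0bcf98d8a3a0e4eb7597720a87821c16537e9f";
        System.out.println(commitUrl("NikRam822", "GoOpenSource", sha));
        System.out.println(gitCommitUrl("NikRam822", "GoOpenSource", sha));
    }
}
```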