mntnr / name-your-contributors

Name your GitHub contributors; get commits, issues, and comments
MIT License
73 stars 20 forks source link

Cli updates #43

Closed tgetgood closed 6 years ago

tgetgood commented 7 years ago

You can now enter no args at the command line (except a token), in which case the current repo is queried, or just specify a repo in which case it is assumed to be yours.

I've disabled the isolated --user arg because with all of the new info being queried, this exceeds the rate limit when querying myself. I'm not especially profuse, so it's likely that this will just fail for most users.

The logic is still there if we can optimise the queries or think of another way to do it.

Addresses #26

RichardLitt commented 7 years ago

this exceeds the rate limit when querying myself

There's a ratelimit for GraphQL? Does that make sense? I thought that it would alleviate our issues, as there is only one hit.

tgetgood commented 7 years ago

It's a credit system, not technically a rate limit. See https://developer.github.com/v4/guides/resource-limitations/

Furthermore, each query is limited to a maximum possible return size of half a million nodes. So if we're retrieving reactions on comments on reviews on pullrequests on repositories then n^5 < 500k, or n=13 is the maximum number of each we can get at once, so we have to make a whackload of nextpage requests to get the rest.

It is far better than the v3 api. We're making tens of requests, not thousands, but each request can cost 50 credits or more, so we get fewer. There's room to optimise the queries, I'm sure. If we had statistics on the distributions of issues, comments, reactions, etc. we could be really smart about it but that would be a lot of work. Probably not worth it.

tgetgood commented 7 years ago

Regarding the limit making sense, the way graphql works, all of the query logic runs on their application servers, and without caching graphql can be massively inefficient. So if we're grabbing tons of stuff that no one else has recently it can get really expensive for GitHub. If there was no rate limit and no node limit we could potentially take them down with a single query (give me everything!). Probably not GitHub, but a smaller player definitely.

RichardLitt commented 7 years ago

Got an error in this repo:

09:59 ~/src/name-your-contributors (cli-updates) 🐕  node src/cli.js
TypeError: Cannot read property 'refs' of null
    at Object.cleanRepo (/Users/richard/src/name-your-contributors/src/queries.js:218:16)
    at graphql.executequery.then.json (/Users/richard/src/name-your-contributors/src/index.js:64:29)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:169:7)

Expected?

tgetgood commented 7 years ago

It is not. I'll look into it.

tgetgood commented 7 years ago

Do you remember what the cli args were when you got that error? I think it's only breaking on repos with more than 100 branches. I don't know of any of those off hand.

RichardLitt commented 7 years ago

I'm not using any - see the command above. I was assuming it would query RichardLitt/name-your-contributors, because of this;

You can now enter no args at the command line (except a token), in which case the current repo is queried, or just specify a repo in which case it is assumed to be yours.

tgetgood commented 7 years ago

Oh, right. Sorry I wasn't looking closely enough. I can replicate it now, but not in this repo, only others. Weird.

RichardLitt commented 7 years ago

Sounds like a bug, then? Weird.

At Névé if you're around.

tgetgood commented 7 years ago

Wasn't chopping the .git off the end of urls. My local one didn't have it. Fixed.

As far as I can test, this is working now. -u by itself has been excised based on our conversation the other day.

For rate limits, you can now check the cost of queries with -v, --verbose. Automatic queries for pagination are logged thus automatically. That may need to change, but it's very convenient for dev.

RichardLitt commented 7 years ago

As far as I can test, this is working now. -u by itself has been excised based on our conversation the other day.

Does -o still work? I have a use case for this.

tgetgood commented 7 years ago

-o does still work. Though if an org has lots of repos it will consume your quota rather quickly.

RichardLitt commented 7 years ago

Still not working.

17:07 ~/src/name-your-contributors (cli-updates) * 🐕  node src/cli.js
TypeError: Cannot read property 'refs' of null
    at Object.cleanRepo (/Users/richard/src/name-your-contributors/src/queries.js:226:16)
    at graphql.executequery.then.json (/Users/richard/src/name-your-contributors/src/index.js:81:26)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:169:7)
17:08 ~/src/name-your-contributors (cli-updates) * 🐕  git remote -vv
origin  git@github.com:RichardLitt/name-your-contributors.git (fetch)
origin  git@github.com:RichardLitt/name-your-contributors.git (push)
tgetgood    git@github.com:tgetgood/name-your-contributors.git (fetch)
tgetgood    git@github.com:tgetgood/name-your-contributors.git (push)
RichardLitt commented 7 years ago

Maybe use a module to get remote? https://github.com/sindresorhus/git-remote-origin-url

tgetgood commented 7 years ago

Using a module would be a good idea; I'm sure there are cases I don't know about. That said git-remote-origin-url just does the same thing I'm doing.

In any case the problem here is with my regexp. If I run it on your example it works, so the only think I can think of is whitespace. I'll give that a shot.

RichardLitt commented 7 years ago

In any case the problem here is with my regexp. If I run it on your example it works, so the only think I can think of is whitespace. I'll give that a shot.

Really, I would use git-remote-origin-url. I've run across a lot of little bugs with this issue before, and using a solution that handles them saves a lot of time in the long run, and makes it easier to debug.

RichardLitt commented 7 years ago

Although, looking at it, wow, that needs some more tests.

tgetgood commented 7 years ago

For now I just added a test in for the regexp.

There are multiple opinions to be had here. Shelling out to a git command has the benefit that we know exactly what to expect based on git's documentation. We don't have to worry about three separate libraries that might make different guarantees. Less room for impedance mismatches. On the other hand when git changes its API as they are wont to do, it's on us to keep up.

Nothing for free. But I'd just as soon not bring in dependencies unless they do something we don't. Parsing files that git already has a parser for doesn't count.

RichardLitt commented 6 years ago

Trying this again, using the example in the --help text. Can you help me out?

> node src/cli.js -r ipfs -u ipfs --after=2016-01-15T00:20:24Z --before=2016-01-20T00:20:24Z
Cost of[prs cont]: {"cost":53,"remaining":4076}
Cost of[issues cont]: {"cost":52,"remaining":4024}
Cost of[issues cont]: {"cost":52,"remaining":3972}
Cost of[issue comments cont]: {"cost":1,"remaining":3971}
Error: Graphql error: {
  "data": null,
  "errors": [
    {
      "message": "Field 'reactions' doesn't exist on type 'ReactionConnection'",
      "locations": [
        {
          "line": 5,
          "column": 1
        }
      ]
    }
  ]
}
    at IncomingMessage.res.on (/Users/richard/src/mntnr/name-your-contributors/src/graphql.js:138:22)
    at emitNone (events.js:110:20)
    at IncomingMessage.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1047:12)
    at _combinedTickCallback (internal/process/next_tick.js:102:11)
    at process._tickCallback (internal/process/next_tick.js:161:9)ersions/node/v8.1.2/lib/node_modules/name-your-contributors/src/queries.js:312:38)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:169:7)
tgetgood commented 6 years ago

The recursion query was wrong. That should be fixed. Any more issues?

RichardLitt commented 6 years ago

Another refs null problem.

node src/cli.js -r mntnr -u name-your-contributors --after=2017-06-15T00:20:24Z --before=2017-11-20T00:20:24Z
TypeError: Cannot read property 'refs' of null
    at Object.cleanRepo (/Users/richard/src/mntnr/name-your-contributors/src/queries.js:236:16)
    at graphql.executequery.then.json (/Users/richard/src/mntnr/name-your-contributors/src/index.js:83:26)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:169:7)
tgetgood commented 6 years ago

That's a new one. I think you meant ... -u mntnr -r name-your-contributors ....

It fails poorly if the repo asked for doesn't exist. I'll fix that.

RichardLitt commented 6 years ago

Heh. Fair. Want to integrate a fix for https://github.com/mntnr/name-your-contributors/issues/49 into this, too?

tgetgood commented 6 years ago

Done.

RichardLitt commented 6 years ago

Looks good to me! Thank you!

RichardLitt commented 6 years ago

:tada: This PR is included in version 3.4.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: