Closed FokkeZB closed 8 years ago
What’s the value for the X-RateLimit-* response headers?
X-RateLimit-Limit: the limit your app has
X-RateLimit-Remaining: how many API calls you have left
X-RateLimit-Reset: when the reset will happen
That applies to the APIs only. Since (at least when I wrote gittio) global search has no API, I use a scraping script for that.
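For anyone who wants to check those three headers from a script, here is a minimal sketch. It assumes the response headers are already available as a plain dict and that X-RateLimit-Reset is a UTC epoch timestamp in seconds; the function name `parse_rate_limit` and the sample values are made up for illustration.

```python
from datetime import datetime, timezone


def parse_rate_limit(headers):
    """Pull the X-RateLimit-* values out of a dict of response headers."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "limit": int(lowered["x-ratelimit-limit"]),
        "remaining": int(lowered["x-ratelimit-remaining"]),
        # Assumption: the reset value is epoch seconds in UTC.
        "reset": datetime.fromtimestamp(
            int(lowered["x-ratelimit-reset"]), tz=timezone.utc
        ),
    }


# Hypothetical header values, not taken from a real response.
sample = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "57",
    "X-RateLimit-Reset": "1441065600",
}
info = parse_rate_limit(sample)
print(info["remaining"], "calls left until", info["reset"].isoformat())
```

This only covers API responses; it says nothing about the scraped browser search.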
So they have probably explicitly forbidden scraping 😨
Yep, and search still requires a user, org or repo filter:
$ curl "https://api.github.com/search/code?q=addClass+in:file+language:js"
{
  "message": "Validation Failed",
  "errors": [
    {
      "message": "Must include at least one user, organization, or repository",
      "resource": "Search",
      "field": "q",
      "code": "invalid"
    }
  ],
  "documentation_url": "https://developer.github.com/v3/search/"
}
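A rough sketch of what a query that passes this validation could look like: it appends a `user:`, `org:` or `repo:` qualifier to the search terms before building the URL. The helper name `build_code_search_url` and the repository `octocat/Hello-World` are placeholders, not something gitTio actually uses.

```python
from urllib.parse import urlencode

SEARCH_CODE_URL = "https://api.github.com/search/code"


def build_code_search_url(terms, user=None, org=None, repo=None):
    """Build a code search URL scoped to a user, org or repo."""
    candidates = (("user", user), ("org", org), ("repo", repo))
    scopes = [f"{name}:{value}" for name, value in candidates if value]
    if not scopes:
        # Mirrors the "Validation Failed" response shown above.
        raise ValueError("need at least one user, org, or repo filter")
    query = " ".join([terms, *scopes])
    return f"{SEARCH_CODE_URL}?{urlencode({'q': query})}"


# Hypothetical repository, used only as an example value.
print(build_code_search_url("addClass in:file language:js",
                            repo="octocat/Hello-World"))
```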
I've updated the user agent I send. Let's see if that fixes it, but I doubt it. It stopped working after August 31st, so I guess they rolled out new security on September 1st.
The only alternative is to let people report orgs/users/repos to search. But that kind of defeats the spider idea behind gitTio.
Okay, so I found the issue. GitHub no longer allows you to search all repos if you are not logged in.
Here's an example of a URL the bot fetches: https://github.com/search?utf8=✓&q=moduleid+AND+guid+AND+minsdk+AND+platform+filename%3Amanifest+in%3Afile
Try it logged out and you'll see you are required to specify an org/user/repo.
I guess I'll have to see if I can log in.
Did you try authenticating to test the search API requests?
It isn't explicitly stated at https://developer.github.com/v3/search/#search-code but last time I tried, even authenticated requests required either a repo or an owner filter.
I've tried again and I have good news, bad news and then good news.
The good news is that I (now) can search code in all repositories through the API.
The bad news is that a result does not include the date it was indexed, nor can I limit the search to files that changed since I last searched.
But I can use another API to get the last commit of the file in each result and use that date instead.
So... currently I'm indexing all new sources since August 31!
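As a sketch of that workaround, assuming the "other API" is the repository commits listing filtered by `path` (newest commit first, with the date under `commit.committer.date`): the helpers below only build the URL and interpret an already-decoded JSON response, and all names and the sample payload are made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def last_commit_url(owner, repo, path):
    """URL that asks for only the newest commit touching one file."""
    query = urlencode({"path": path, "per_page": 1})
    return (f"https://api.github.com/repos/{quote(owner)}/{quote(repo)}"
            f"/commits?{query}")


def last_commit_date(commits):
    """Date of the newest commit in a decoded response, or None if empty."""
    if not commits:
        return None
    stamp = commits[0]["commit"]["committer"]["date"]
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def changed_since(commits, last_run):
    """True when the file has a commit newer than the previous crawl."""
    date = last_commit_date(commits)
    return date is not None and date > last_run


# Made-up payload in the assumed response shape.
payload = [{"commit": {"committer": {"date": "2015-09-02T10:00:00Z"}}}]
last_run = datetime(2015, 8, 31, tzinfo=timezone.utc)
print(last_commit_url("octocat", "Hello-World", "manifest"))
print(changed_since(payload, last_run))
```

This costs one extra request per search result, so it counts against the same rate limit as the search itself.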
Since the end of August, gitTio hasn't been indexing anymore. This seems to be because GitHub doesn't like that we crawl the browser search. Since it broke I get these:
/cc @Topener, @jasonkneen