jvcalderon / gist-client

A client to consume Gist API with JS
MIT License

Search fails with 502 #2

Closed mindrones closed 5 years ago

mindrones commented 5 years ago

Related to #1, in order to retrieve public gists containing the word "svelte", I'm using (this time with a TOKEN):

const GistClient = require("gist-client");

const gistClient = new GistClient();
const GITHUB_TOKEN = "GITHUB_TOKEN";

gistClient
    .setToken(GITHUB_TOKEN)
    .getAll({
        rawContent: true,
        filterBy: [
            {public: true},
            {content: "svelte"},
            {since: "2018-11-01T00:00:01Z"}
        ]
    })
    .then(gistList => {
        console.log(JSON.stringify(gistList));
    })
    .catch(err => {
        console.log(err);
    });

but I get this error:

{ StatusCodeError: 502 - {"message":"Server Error"}
    at new StatusCodeError (/path/to/node_modules/request-promise-core/lib/errors.js:32:15)
    at /path/to/node_modules/request-promise-core/lib/plumbing.js:97:41
    at process.internalTickCallback (internal/process/next_tick.js:77:7)
  name: 'StatusCodeError',
  statusCode: 502,
  message: '502 - {"message":"Server Error"}',
  error: { message: 'Server Error' },
  options:
   { url:
      'https://api.github.com/gists/public?per_page=100&since=2018-11-01T00%3A00%3A01Z&page=25',
     headers:
      { Authorization: 'token TOKEN',
        'User-Agent': 'GistClient' },
     json: true,
     transform: [Function: _includeHeadersTransformer],
     callback: [Function: RP$callback],
     simple: true,
     resolveWithFullResponse: false,
     transform2xxOnly: false },
  response:
   { headers:
      { server: 'GitHub.com',
        date: 'Sun, 02 Dec 2018 16:39:17 GMT',
        'content-type': 'application/json',
        'content-length': '32',
        connection: 'close',
        etag: '"tag"',
        'x-github-request-id': 'id' },
     data: { message: 'Server Error' } } }

Seems to fail at the 25th page. Am I doing something wrong? Thanks!

jvcalderon commented 5 years ago

Hi @mindrones. With these filters (public, content, since) you are sending this request to the Gist API: GET https://api.github.com/gists/public?per_page=100&since=2018-11-01T00:00:01Z. Each call returns 100 items, but the library iterates over every page (30 pages for this request right now). Sometimes it returns 502 (Bad Gateway) on page 25, sometimes on page 21 or 19... It seems to be a limitation of the Gist API to prevent crawling or misuse.
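To make the paging behaviour concrete, here is a minimal sketch of a page-by-page walk that retries a page when it comes back as 502. It is not gist-client's actual implementation: `buildPageUrl`, `fetchAllPages`, the injected `fetchPage(url)` function and its `{ status, items }` result shape are all assumptions made for illustration. If the 502 is a deliberate limit on GitHub's side, retrying may not help at all.

```javascript
// Hypothetical helper (not part of gist-client): builds the URL for one page
// of the public gists listing, in the same shape as the URL in the error above.
function buildPageUrl(since, page, perPage = 100) {
  const params = new URLSearchParams({
    per_page: String(perPage),
    since,
    page: String(page),
  });
  return `https://api.github.com/gists/public?${params}`;
}

// Resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Walks pages 1..maxPages. `fetchPage(url)` is an injected function (an
// assumption) that resolves to { status, items }. A page answering 502 is
// retried up to `maxRetries` times, waiting `delayMs` between attempts.
async function fetchAllPages(fetchPage, since, maxPages, options = {}) {
  const { maxRetries = 3, delayMs = 1000 } = options;
  const gists = [];

  for (let page = 1; page <= maxPages; page++) {
    const url = buildPageUrl(since, page);
    let res = await fetchPage(url);

    for (let attempt = 0; res.status === 502 && attempt < maxRetries; attempt++) {
      await sleep(delayMs);
      res = await fetchPage(url);
    }

    if (res.status !== 200) {
      // Give up, but keep what was collected before the failing page.
      return { gists, failedPage: page };
    }
    if (res.items.length === 0) {
      break; // no more results
    }
    gists.push(...res.items);
  }

  return { gists, failedPage: null };
}
```

Returning the partial list together with `failedPage` lets the caller decide whether to resume later from that page instead of losing everything to a single failed request.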

mindrones commented 5 years ago

Hi, eh, I suspected that :/ Any suggestions on how to retrieve those gists some other way? Searching by hand in the UI (https://gist.github.com/search?q=svelte) returns 232 gists, which is not many, so it seems to be possible. Maybe they use a private API for search? Thanks!

jvcalderon commented 5 years ago

I don't think GistClient is a good solution for handling a large volume of gists. It was developed to make it easier to manage well-delimited lists (the gists owned by one user, for example). Keep in mind that the previous filter produces ≈30 requests (1 per page) plus one more request for each gist on each page (because of the 'rawContent' flag), so about 30*100 more. That could easily exceed the API limits, and you would probably receive a 403 ("abuse detection"). Sadly, we can't avoid it.
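The arithmetic above can be written out as a small estimate. This is only a sketch: `estimateRequests` is a hypothetical helper, and it assumes, as described in this comment, one listing request per page plus one extra request per gist when raw content is requested.

```javascript
// Hypothetical helper: estimates how many HTTP requests a full listing costs.
// Assumes one request per page, plus one request per gist when rawContent is on.
function estimateRequests(pages, perPage, rawContent) {
  const listRequests = pages;
  const contentRequests = rawContent ? pages * perPage : 0;
  return listRequests + contentRequests;
}

console.log(estimateRequests(30, 100, true));  // 3030
console.log(estimateRequests(30, 100, false)); // 30
```

With the numbers from this thread, the 'rawContent' flag accounts for almost the entire cost, so dropping it is the cheapest way to reduce the request count.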

The UI builds its response through a private method. Maybe you can consume this endpoint (https://gist.github.com/search?q=svelte) by writing a crawler in your backend, but it doesn't seem like a clean solution.

mindrones commented 5 years ago

Eh, then scraping https://gist.github.com/search?q=svelte may indeed be a one-time solution, if we end up not needing to search for gists regularly.

I'll close this one, thanks for taking the time to reply! :)