sindresorhus / got

🌐 Human-friendly and powerful HTTP request library for Node.js
MIT License
14.27k stars 935 forks source link

HTTPError: Response code 422 (Unprocessable Entity) upon second execution of pagination.paginate #1221

Closed PopGoesTheWza closed 4 years ago

PopGoesTheWza commented 4 years ago

Describe the bug

Iteration records from SugarCRM rest api with Got.paginate()

Actual behavior

After processing the first 20 records (default page size in the api), an HTTP 422 error is thrown:

✖ [ERROR] HTTPError: Response code 422 (Unprocessable Entity)
    at PromisableRequest.<anonymous> (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/as-promise/index.js:124:28)
    at processTicksAndRejections (internal/process/task_queues.js:97:5) {
  code: undefined,
  timings: {
    start: 1588609061815,
    socket: 1588609061815,
    lookup: 1588609061815,
    connect: 1588609061815,
    secureConnect: 1588609061815,
    upload: 1588609061816,
    response: 1588609062031,
    end: 1588609062032,
    error: undefined,
    abort: undefined,
    phases: {
      wait: 0,
      dns: 0,
      tcp: 0,
      tls: 0,
      request: 1,
      firstByte: 215,
      download: 1,
      total: 217
    }
  }
}

I have inspected the normalizedOptions at node_modules/got/dist/source/create.js line 148 and saw nothing wrong. I also validated that the headers & json payload were fine by shooting Postman. Last but not least, agent's keepAlive being true or false has no impact.

Expected behavior

Got.paginate() should process more than one page.

Code to reproduce

...

Checklist

PopGoesTheWza commented 4 years ago

The HTTP status hint at a malformed request, despite the newOptions looking good and tests in Postman working fine. Can one hint me at a reasonably simple way to check what the actual request Got is sending as well as the response returned by the server?

PopGoesTheWza commented 4 years ago

I was able to validate that the 422 is indeed the response from the remote rest api server.

Now given that all is fine when using Postman and that the normalizedOptions in Got (dist/source/create.js@172) look good when about to send the (crashing) request for the second page, I don't know where and how to look further.

szmarczak commented 4 years ago

What's your error.response.request.options?

PopGoesTheWza commented 4 years ago

I may have something to start looking into.

First let's consider this "I will cause a 422 on second page" snippet:

    const iterator = client.paginate<Account>('Accounts', {
      json: {
        fields: 'name,siret_c',
        filter: [{ $and: [{ siret_c: { $not_null: null } }, { siret_c: { $not_equals: '' } }] }],
      },
    });

    for await (const account of iterator) {
      // Code in here is not relevant to the issue
    }

Now let's change the json payload with a stringified body. In this use case, json and body should be interchangeable (as long as types are correct.

    const iterator = client.paginate<Account>('Accounts', {
      body: JSON.stringify({
        fields: 'name,siret_c',
        filter: [{ $and: [{ siret_c: { $not_null: null } }, { siret_c: { $not_equals: '' } }] }],
      }),
    });

    for await (const account of iterator) {
      // Code in here is not relevant to the issue
    }

And I now get a different error (after correctly processing the first page)

✖ [ERROR] RequestError: The `body`, `json` and `form` options are mutually exclusive
    at PromisableRequest._destroy (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/core/index.js:1048:21)
    at PromisableRequest.destroy (internal/streams/destroy.js:38:8)
    at /Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/core/index.js:281:22
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at PromisableRequest._finalizeBody (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/core/index.js:563:23)
    at /Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/core/index.js:264:28
    at new Request (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/core/index.js:283:11)
    at new PromisableRequest (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/as-promise/core.js:32:1)
    at makeRequest (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/as-promise/index.js:37:29)
    at /Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/as-promise/index.js:193:9
    at /Users/guillaumec/rc-dev/rcsf/node_modules/p-cancelable/index.js:61:11
    at new Promise (<anonymous>)
    at new PCancelable (/Users/guillaumec/rc-dev/rcsf/node_modules/p-cancelable/index.js:31:19)
    at Object.asPromise [as default] (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/as-promise/index.js:26:21) {
  code: undefined,
  timings: undefined
}

Or is it me missing something obvious? :/

PopGoesTheWza commented 4 years ago

@szmarczak Here is the error.response.request.options on the 422 status

{
  method: 'GET',
  retry: {
    calculateDelay: [Function: calculateDelay],
    limit: 2,
    methods: [ 'GET', 'PUT', 'HEAD', 'DELETE', 'OPTIONS', 'TRACE' ],
    statusCodes: [
      408, 413, 429, 500,
      502, 503, 504, 521,
      522, 524
    ],
    errorCodes: [
      'ETIMEDOUT',
      'ECONNRESET',
      'EADDRINUSE',
      'ECONNREFUSED',
      'EPIPE',
      'ENOTFOUND',
      'ENETUNREACH',
      'EAI_AGAIN'
    ],
    maxRetryAfter: Infinity
  },
  headers: {
    'user-agent': 'got (https://github.com/sindresorhus/got)',
    'oauth-token': 'd5179e7e-b9f5-49ed-9330-ab4b9e0d1f27',
    'content-type': 'application/json',
    'content-length': '109',
    'accept-encoding': 'gzip, deflate, br'
  },
  cache: undefined,
  dnsCache: CacheableLookup {
    maxTtl: Infinity,
    fallbackTtl: 1,
    errorTtl: 0.15,
    _lockTime: 150,
    _cache: Map { 'myHostedInstance.myHostingProvider.com' => [Array] },
    _resolver: Resolver { _handle: ChannelWrap {} },
    _lookup: [Function: lookup],
    _resolve4: [Function: bound queryA],
    _resolve6: [Function: bound queryAaaa],
    _iface: { has4: true, has6: true },
    _hostsResolver: HostsResolver {
      _hostsPath: '/etc/hosts',
      _error: null,
      _hosts: [Object],
      _promise: null
    },
    _tickLocked: false,
    _pending: {},
    lookup: [Function: bound lookup],
    lookupAsync: [Function: bound lookupAsync] AsyncFunction
  },
  decompress: true,
  throwHttpErrors: true,
  followRedirect: true,
  isStream: false,
  responseType: 'text',
  resolveBodyOnly: false,
  maxRedirects: 10,
  prefixUrl: [Getter/Setter],
  methodRewriting: true,
  ignoreInvalidCookies: false,
  http2: false,
  allowGetBody: true,
  rejectUnauthorized: true,
  pagination: {
    transform: [Function: transform],
    paginate: [Function: paginate],
    filter: [Function: filter],
    shouldContinue: [Function: shouldContinue],
    countLimit: Infinity,
    requestLimit: 10000,
    stackAllItems: true
  },
  username: '',
  password: '',
  agent: {
    http: Agent {
      _events: [Object: null prototype],
      _eventsCount: 2,
      _maxListeners: undefined,
      defaultPort: 80,
      protocol: 'http:',
      options: [Object],
      requests: {},
      sockets: {},
      freeSockets: {},
      keepAliveMsecs: 1000,
      keepAlive: false,
      maxSockets: Infinity,
      maxFreeSockets: 256,
      [Symbol(kCapture)]: false
    },
    https: Agent {
      _events: [Object: null prototype],
      _eventsCount: 2,
      _maxListeners: undefined,
      defaultPort: 443,
      protocol: 'https:',
      options: [Object],
      requests: {},
      sockets: {},
      freeSockets: {},
      keepAliveMsecs: 1000,
      keepAlive: false,
      maxSockets: Infinity,
      maxFreeSockets: 256,
      maxCachedSessions: 100,
      _sessionCache: [Object],
      [Symbol(kCapture)]: false
    }
  },
  url: URL {
    href: 'https://myHostedInstance..com/rest/v11_5/Accounts',
    origin: 'https://myHostedInstance.myHostingProvider.com',
    protocol: 'https:',
    username: '',
    password: '',
    host: 'myHostedInstance.myHostingProvider.com',
    hostname: 'myHostedInstance.myHostingProvider.com',
    port: '',
    pathname: '/rest/v11_5/Accounts',
    search: '',
    searchParams: URLSearchParams {},
    hash: ''
  },
  mutableDefaults: true,
  hooks: {
    init: [],
    beforeRequest: [],
    beforeRedirect: [],
    beforeError: [],
    beforeRetry: [],
    afterResponse: [ [AsyncFunction], [AsyncFunction] ]
  },
  lookup: [Function: bound lookup],
  searchParams: undefined,
  request: undefined,
  timeout: {},
  [Symbol(isNormalizedAlready)]: true,
  [Symbol(request)]: [Function: request]
}
szmarczak commented 4 years ago

Can you try the following:

  1. set decompress to false
  2. set methodRewriting to false

If any of these are don't fix the issue, please post your pagination function and after response hooks (there are two, right?)

And I now get a different error (after correctly processing the first page)

That definitely seems to be a bug. It shouldn't throw the mutually exclusive error if you use the body option only. I'll look at the stack trace tomorrow.

PopGoesTheWza commented 4 years ago

Can you try the following:

  1. set decompress to false
  2. set methodRewriting to false

I tried each and both without any improveemnt.

If any of these are don't fix the issue, please post your pagination function and after response hooks (there are two, right?)

Those might seem familiar for I made you sweat on it last week ;)

  pagination: {
    // eslint-disable-next-line @typescript-eslint/prefer-readonly-parameter-types
    paginate: (response: Response) => {
      const body: List = JSON.parse((response as Response<string>).body);
      if (is.plainObject(body) && is.number(body.next_offset) && body.next_offset >= 0) {
        const { options } = response.request;
        const { json, ...otherOptions }: any = options;
        const result: Options = {
          json: { ...json, offset: body.next_offset },
          ...otherOptions,
        };

        return result;
      }

      return false;
    },
    // eslint-disable-next-line @typescript-eslint/prefer-readonly-parameter-types
    transform: (response: Response<any>) => {
      const { body } = response;
      const parsed = is.string(body) ? JSON.parse(body) : body;
      if (is.plainObject(parsed)) {
        const { records } = parsed;
        if (is.array(records)) {
          return records as unknown[];
        }

        throw new Error(typeof records);
      }

      throw new Error(typeof parsed);
    },
  },
    afterResponse: [
      // eslint-disable-next-line @typescript-eslint/prefer-readonly-parameter-types
      async (response, retryWithMergedOptions) => {
        // Unauthorized
        if (response.statusCode === 401) {
          const updatedOptions: ExtendOptions = { headers: { 'oauth-token': await getNewAuthToken() } };

          // Save for further requests
          authenticatedInstance.defaults.options = got.mergeOptions(
            authenticatedInstance.defaults.options,
            updatedOptions
          );

          return retryWithMergedOptions(updatedOptions);
        }

        // No changes otherwise
        return response;
      },
    ],

(On my TODO list is a cleaner, more elegant and efficient management of OAuth2 token, including effective renewal)

That definitely seems to be a bug. It shouldn't throw the mutually exclusive error if you use the body option only. I'll look at the stack trace tomorrow.

I feel like I am being a pain here... ;)

szmarczak commented 4 years ago

Those might seem familiar for I made you sweat on it last week ;)

I haven't been able to work on Got that much today. But will do ASAP when I have the time.

I see that you're passing a json option in the paginate function. Did you forget to change this when you were passing a body option to the instance? (maybe that's why you got the are mutually exclusive error)

Also try logging error.response.request.options.json and see if you can successfully recreate the request using Postman.

I feel like I am being a pain here... ;)

No, you just find very interesting bugs, which is awesome. By contributing to Got you make it even better.

    afterResponse: [ [AsyncFunction], [AsyncFunction] ]

Here you can see two after response hooks. But in the code you provided there's only one. Another bug here?

PopGoesTheWza commented 4 years ago

I see that you're passing a json option in the paginate function. Did you forget to change this when you were passing a body option to the instance? (maybe that's why you got the are mutually exclusive error)

Yeah... I forgot... Hit me :/

Also try logging error.response.request.options.json and see if you can successfully recreate the request using Postman.

The value of error.response.request.options.json is correct and matches when used with postman

{
  "fields": "name,siret_c",
  "filter": [
    {
      "$and": [
        {
          "siret_c": {
            "$not_null": null
          }
        },
        {
          "siret_c": {
            "$not_equals": ""
          }
        }
      ]
    }
  ],
  "offset": 20
}
PopGoesTheWza commented 4 years ago

@szmarczak I have an idea of what could be causing the issue.

Between first and second requests, the size of body/jsonpayload increases. Could it be that something gets truncated? Maybe Content-Size not getting updated?

Anyhow, in Postman, truncating the json payload (hence sending invalid json) is one sure way to get a 422 response.

PopGoesTheWza commented 4 years ago

@szmarczak My idea seems to be correct.

I switched to using body in place of json and added a .padEnd(1024, ' ') to the stringified payload... no more crashes!

Thus I deduce that somehow the payload does indeed ends up being truncated upon requesting the second page.

I hope this will help.

szmarczak commented 4 years ago

Here you can see two after response hooks. But in the code you provided there's only one. Another bug here?

I need your input on this too :)

PopGoesTheWza commented 4 years ago

Here you can see two after response hooks. But in the code you provided there's only one. Another bug here?

I need your input on this too :)

Guess what? I have one single afterResponse hook.

Now if I inspect normalizedOptions.hooks.afterResponse in node_modules/got/dist/source/create.js@94, its length double at each iteration (i.e. 2 at page 2, 4 at page 3, 8 at page 4, etc.)

Funny for I was about to add my share to #1220 .

PopGoesTheWza commented 4 years ago

@szmarczak Here is a test case for this bug

// Nicer test case in PR #1231 

The relevant part is json: {...json, page} which alters the size of the json payload after page 1. If you change it to json: {...json} which leaves the payload unmodified and same size, it works fine.

szmarczak commented 4 years ago

Thanks! I will definitely check this.

szmarczak commented 4 years ago

Don't do return {json: {...json, page}, ...otherOptions};. The options are merged automatically, and by doing that you just duplicate them. I should've documented that :man_facepalming:

szmarczak commented 4 years ago

I fixed your test 47c1afedc77b27a2eb3edd85064d8532da835045 and it's not failing...

szmarczak commented 4 years ago

Can you try this:

    paginate: (response: Response) => {
      const body: List = JSON.parse((response as Response<string>).body);
      if (is.plainObject(body) && is.number(body.next_offset) && body.next_offset >= 0) {
        const { options } = response.request;
        const { json } = options;
        const result: Options = {
          json: { ...json, offset: body.next_offset }
        };

        return result;
      }

      return false;
    },
szmarczak commented 4 years ago

Between first and second requests, the size of body/jsonpayload increases. Could it be that something gets truncated? Maybe Content-Size not getting updated?

I think that's it. Let me check.

szmarczak commented 4 years ago

I'm unable to reproduce even with 5131dc2778a9b584acda062635a9458d5b22df0c I'll make a release now.

PopGoesTheWza commented 4 years ago

Using Got@11.1.3 without the 'fixed sized body' but json to hold the payload:

HTTPError: Response code 422 (Unprocessable Entity)
    at PromisableRequest.<anonymous> (/Users/guillaumec/rc-dev/rcsf/node_modules/got/dist/source/as-promise/index.js:124:28)
    at processTicksAndRejections (internal/process/task_queues.js:97:5) {
  code: undefined,
  timings: {
    start: 1589118254963,
    socket: 1589118254963,
    lookup: 1589118254964,
    connect: 1589118254972,
    secureConnect: 1589118254980,
    upload: 1589118254980,
    response: 1589118255036,
    end: 1589118255037,
    error: undefined,
    abort: undefined,
    phases: {
      wait: 0,
      dns: 1,
      tcp: 8,
      tls: 8,
      request: 0,
      firstByte: 56,
      download: 1,
      total: 74
    }
  }
}
szmarczak commented 4 years ago

https://github.com/sindresorhus/got/issues/1221#issuecomment-626322729 Did you try it?

PopGoesTheWza commented 4 years ago

So... this is in fact another symptom of #1223 root cause.

As soon as having pagination.paginate to return only the options update, the issue is fixed.

Thanks for your patience and time @szmarczak :)