sindresorhus / got

🌐 Human-friendly and powerful HTTP request library for Node.js
MIT License
14.05k stars 920 forks source link

Broken zlib compressed response #388

Closed wong2 closed 5 years ago

wong2 commented 6 years ago

Minimal code to reproduce the issue:

const newrelic = require('newrelic')
const got = require('got')
const Koa = require('koa')
const app = new Koa()

app.use(async ctx => {
  await got('https://wx3.llss.fun/8c007b5cly1fi19hv6n4ej21kw0w0doc.jpg')
  ctx.body = 'ok'
})

app.listen(3000)

The error:

/Users/wong2/codes/demo/node_modules/newrelic/lib/transaction/tracer/index.js:191
      throw err
      ^

Error: invalid distance too far back
    at Unzip.zlibOnError (zlib.js:152:15)

This only occurs with newrelic + koa2 + got>=7 + some specific images, if one of the condition doesn't match, the issue disappears.

node v8.2.1

puzrin commented 6 years ago

AFAIK, zlib was changed in ~ node 8.3.x or later. Try with node 8.2.1 for sure.

@sindresorhus possible regression from got 6 -> 7. I got this report https://github.com/nodeca/probe-image-size/issues/16 and the only change was got upgrate to 7.

sindresorhus commented 6 years ago

Hard to say what it could be. There were a lot of changes between v6 and v7: https://github.com/sindresorhus/got/compare/v6.7.1...v7.0.0

If I were to guess, it might be caused by: https://github.com/sindresorhus/got/commit/578f38d5276d78be33bf55eff2d38b79ed6482d6#diff-168726dbe96b3ce427e7fedce31bb0bcR15 (Which upgraded the dependency) (This commit was the main change: https://github.com/sindresorhus/decompress-response/commit/79ee2525958ebf30e33f08d4ff97e9bd9bc4c086)

Could also be New Relic hooking into something Got depends on and being naughty. It wouldn't be the first time: https://github.com/sindresorhus/p-cancelable/pull/4

Also might be an issue in Node.js itself: https://stackoverflow.com/questions/34396566/error-invalid-distance-too-far-back-when-inflate-html-gzip-content

puzrin commented 6 years ago

That's a minimal sample to reproduce problem, thanks to @wong2 for url address:

let stream = require('got').stream('https://mp.weixin.qq.com/mp/qrcode?scene=10000004&size=102&__biz=MjM5MzA0MzczMg==');

stream.on('error', err => console.log('===', err));

=>

 node test.js 
events.js:182
      throw er; // Unhandled 'error' event
      ^

Error: incorrect header check
    at Unzip.zlibOnError (zlib.js:152:15)

There are 2 problems:

  1. Unhandled 'error' event should not be thrown.
  2. Browsers can open that url and show something (json with error info)

Is my sample correct? Seems enougth to prohibit global exceptions from stream.

puzrin commented 6 years ago

That's without stream:

let r = require('got')('https://mp.weixin.qq.com/mp/qrcode?scene=10000004&size=102&__biz=MjM5MzA0MzczMg==');

r.then(data => console.log('+++', data)).catch(err => console.log('---', err));

=>

node test.js 
--- { ReadError: incorrect header check
    at stream.catch.err (/home/vitaly/Кодинг/probe-image-size/node_modules/got/index.js:164:26)
    at <anonymous>
  name: 'ReadError',
  code: 'Z_DATA_ERROR',
  host: 'mp.weixin.qq.com',
  hostname: 'mp.weixin.qq.com',
  method: 'GET',
  path: '/mp/qrcode?scene=10000004&size=102&__biz=MjM5MzA0MzczMg==',
  protocol: 'https:',
  url: 'https://mp.weixin.qq.com/mp/qrcode?scene=10000004&size=102&__biz=MjM5MzA0MzczMg==' }

Still error, but no global exception.

puzrin commented 6 years ago

@sindresorhus both problems (zlib error + global throw) still exist in 8.0.0. That's regression after 6.x and (IMHO) critical thing when you do massive scans of arbitrary addresses.

sindresorhus commented 6 years ago

@puzrin I tracked down the unhandled error event: https://github.com/sindresorhus/got/pull/424

As for why it's throwing in the first place, I think it's because browsers are more lenient to incorrect responses than Node.js:

Turns out many browsers over the years implemented an incorrect deflate algorithm. Instead of expecting the zlib header in RFC 1950 they simply expected the compressed payload. Similarly various web servers made the same mistake.

So, over the years browsers started implementing a fuzzy logic deflate implementation, they try for zlib header and adler checksum, if that fails they try for payload.

-https://stackoverflow.com/questions/388595/why-use-deflate-instead-of-gzip-for-text-files-served-by-apache#9856879

puzrin commented 6 years ago

Thanks! Fix for unhandled errors should solve the most unpleasant problem.

About zlib header error - can't say the right behaviour. At least, request processes it without error. Don't remember exactly about got v6:

require('request')({
  uri: 'https://mp.weixin.qq.com/mp/qrcode?scene=10000004&size=102&__biz=MjM5MzA0MzczMg=='
}, function (err, res, body) {
  if (err) {
    console.log('---', err);
    return;
  }

  console.log(body);
});
max-degterev commented 3 years ago

Is there a workaround for this? This problem is still pretty much relevant, I see it in my app logs often enough:

https://runkit.com/suprmax/5fa03e5ac3502d001a4e3b7e

UPD for people coming from google this library is a good drop in replacement: https://github.com/tomas/needle

szmarczak commented 3 years ago

@suprMax They just supress the errors. The error is correct.

puzrin commented 3 years ago

@suprMax They just supress the errors. The error is correct.

IMO this "correct error" does not makes happy anyone in real world :).

max-degterev commented 3 years ago

It's a moot point, I need my website to work, not to adhere to some arbitrary standard. The decision to fail these requests might look good on paper, it isn't though.