newrelic / node-newrelic

New Relic Node.js agent code base. Developers are welcome to create pull requests here, please see our contributing guidelines. For New Relic technical support, please go to http://support.newrelic.com.
Apache License 2.0
970 stars 399 forks source link

agent crashes app having malformed request urls #26

Closed jondot closed 11 years ago

jondot commented 11 years ago

in newrelic/lib/transaction.js,

  this.url        = url.parse(requestURL, true).pathname.split(';')[0];

When the requestURL isn't really well formed, for example:

http://my-service.com/?track_url=http://another.site.com

Parsing becomes difficult since the track_url param isn't encoded. Bottom line is the agent crashes the application with a cannot call split on null, since the pathname seems to be null.

I would have submitted a pull request but since this is an unfamiliar license I leave it to you guys.

Thanks.

othiym23 commented 11 years ago

Can you provide me an example of how this causes your application to crash? The line you cite records the URL onto a New Relic transaction object, and doesn't affect how your application uses the URL. Furthermore, I don't really get what's wrong with the following:

% node
> var url = require('url')
undefined
> var uri = 'http://my-service.com/?track_url=http://another.site.com'
undefined
> var parsed = url.parse(uri, true)
undefined
> var resolved = parsed.pathname.split(';')[0]
undefined
> resolved
'/'

This uses Node's built-in URL parser, and produces the correct result according to New Relic's specifications.

othiym23 commented 11 years ago

Also, what version of Node are you encountering this problem with?

othiym23 commented 11 years ago

I see what you're getting at now! The part about track_url being ill-formed is a red herring. The problem is that pathname is null if what gets passed in is ?track_url=http://another.site.com instead of /?track_url=http://another.site.com. I'll need to do some testing to figure out how this can happen (and knowing how you're constructing your web stack would be helpful here), but that code can certainly be written more defensively. Good catch! Thanks!

jondot commented 11 years ago

Sure, Node v0.10.3

The stack is simple connect based http service. I'm using a raw handler, and then mounting like so:

connect()
  .use(connect.query())
  .use('/v1/api', handler)
  .listen(3000)

I think perhaps the connect mounting may be a potential blame too?

a printout of requestURL is

?track_url=http://edition.cnn.com/2013/04/21/travel/earth-day-best-wildlife-sites/index.html?hpt=hp_bn5

within that url I think both the double 'http' and double ? are offending.

and a trace is

/Users/dotan/projects/spxx/spxx-nexus/node_modules/newrelic/lib/transaction.js:117
  this.url        = url.parse(requestURL, true).pathname.split(';')[0];
                                                         ^
TypeError: Cannot call method 'split' of null
  at Transaction.measureWeb (/Users/dotan/projects/spxx/spxx-nexus/node_modules/newrelic/lib/transaction.js:117:58)
  at ServerResponse.instrumentedFinish (/Users/dotan/projects/spxx/spxx-nexus/node_modules/newrelic/lib/instrumentation/core/http.js:29:23)
  at ServerResponse.g (events.js:175:14)
  at ServerResponse.EventEmitter.emit (events.js:117:20)
  at ServerResponse.OutgoingMessage._finish (http.js:963:8)
  at ServerResponse.OutgoingMessage.end (http.js:946:10)
othiym23 commented 11 years ago

Node doesn't care about the two ?s or the embedded URL:

% node -e "console.log(require('url').parse('?track_url=http://edition.cnn.com/2013/04/21/travel/earth-day-best-wildlife-sites/index.html?hpt=hp_bn5', true))"
{ protocol: null,
  slashes: null,
  auth: null,
  host: null,
  port: null,
  hostname: null,
  hash: null,
  search: '?track_url=http://edition.cnn.com/2013/04/21/travel/earth-day-best-wildlife-sites/index.html?hpt=hp_bn5',
  query: { track_url: 'http://edition.cnn.com/2013/04/21/travel/earth-day-best-wildlife-sites/index.html?hpt=hp_bn5' },
  pathname: null,
  path: '?track_url=http://edition.cnn.com/2013/04/21/travel/earth-day-best-wildlife-sites/index.html?hpt=hp_bn5',
  href: '?track_url=http://edition.cnn.com/2013/04/21/travel/earth-day-best-wildlife-sites/index.html?hpt=hp_bn5' }

So really, all the agent needs to do is make sure that pathname gets set to something sensible if it comes back undefined. I'll make this change and incorporate it into the next build of the agent. If you want a patch to apply in the interim, something like this should work in the place of the line you copied:

var parsed = url.parse(requestURL, true);
var pathname = parsed.pathname || '/';
this.url = pathname.split(';')[0];
jondot commented 11 years ago

Yup, already patched it locally here before opening the issue just so everything could keep on running. Just didn't know how pull request work here with the current license - so next best thing is trying to describe the problem as an issue :)

Thanks for your attention :+1: