microsoft / ghcrawler-cli

A simple command line app for controlling a GitHub crawler
MIT License

Adding tokens from cli fails with parsing error #8

Closed: stuartlangridge closed this issue 7 years ago

stuartlangridge commented 7 years ago

Running node bin/cc tokens 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa#private' makes the crawler server log this error:

crawler_1    | { SyntaxError: Unexpected token "
crawler_1    |     at parse (/opt/ghcrawler/node_modules/body-parser/lib/types/json.js:83:15)
crawler_1    |     at /opt/ghcrawler/node_modules/body-parser/lib/read.js:116:18
crawler_1    |     at invokeCallback (/opt/ghcrawler/node_modules/raw-body/index.js:262:16)
crawler_1    |     at done (/opt/ghcrawler/node_modules/raw-body/index.js:251:7)
crawler_1    |     at IncomingMessage.onEnd (/opt/ghcrawler/node_modules/raw-body/index.js:307:7)
crawler_1    |     at emitNone (events.js:86:13)
crawler_1    |     at IncomingMessage.emit (events.js:185:7)
crawler_1    |     at endReadableNT (_stream_readable.js:974:12)
crawler_1    |     at _combinedTickCallback (internal/process/next_tick.js:80:11)
crawler_1    |     at process._tickCallback (internal/process/next_tick.js:104:9)
crawler_1    |   body: '"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa#private"',
crawler_1    |   status: 400,
crawler_1    |   statusCode: 400 }

This seems to be a quoting issue. bin/cc joins all the tokens together with ; and sends the result as a single JSON-encoded string in the request body. So the body is not aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa#private but "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa#private", quotes included, and the receiver doesn't appear to expect the quotes. In other words, the sender is sending JSON, but the receiver seems to expect a body that isn't JSON. I'm not sure whether the bug is in how cc sends the body or in what the server expects.
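The stack trace points at body-parser/lib/types/json.js. Assuming the crawler mounts body-parser's json() middleware with its default strict setting, which accepts only a top-level object or array, a body consisting of just a JSON-encoded string would likely be rejected with exactly this SyntaxError before the /config/tokens handler ever sees it. The sketch below reproduces that mismatch in isolation. buildTokensBody and strictJsonParse are hypothetical helpers written for illustration, not code from ghcrawler-cli or ghcrawler, and the way buildTokensBody encodes the tokens is an assumption based on the description above.

```javascript
// Sketch of the sender/receiver mismatch. Both helpers are hypothetical
// stand-ins, not code from ghcrawler-cli or ghcrawler.

// Assumption: bin/cc joins the tokens with ';' and JSON-encodes the
// resulting single string as the request body.
function buildTokensBody(tokens) {
  return JSON.stringify(tokens.join(';'));
}

// Mimics the check body-parser's json() makes when its `strict` option
// is true (the default): only a top-level object or array is accepted.
function strictJsonParse(body) {
  // Skip JSON whitespace (space, tab, LF, CR) and inspect the first real character.
  const match = /^[\x20\x09\x0a\x0d]*(.)/.exec(body);
  const first = match ? match[1] : '';
  if (first !== '{' && first !== '[') {
    throw new SyntaxError(`Unexpected token ${first}`);
  }
  return JSON.parse(body);
}

const body = buildTokensBody(['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa#private']);
console.log(body); // "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa#private"

// A bare JSON string is valid JSON, so a plain JSON.parse succeeds...
console.log(JSON.parse(body)); // aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa#private

// ...but the strict check rejects it, matching the error in the log.
try {
  strictJsonParse(body);
} catch (err) {
  console.log(err.message); // Unexpected token "
}
```

By contrast, a body whose top level is an array or object passes the strict check. Whether the server's tokens handler expects such a shape is not established in this thread, so this only narrows down where the rejection happens.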

Notes: I'm using crawler-in-a-box. I've also tried calling the server's config API directly with curl, but that doesn't help: sending the quoted string produces the same error, and sending the string unquoted in the body makes the /config/tokens endpoint return 404.

jeremy-lq commented 7 years ago

Can confirm this behavior, and am able to replicate with the most current code.

stuartlangridge commented 7 years ago

PR sent as https://github.com/Microsoft/ghcrawler/pull/113.