jbmusso / gremlin-javascript

JavaScript tools for graph processing in Node.js and the browser inspired by the Apache TinkerPop API
MIT License
214 stars 63 forks source link

Silently throws away data from the response if it contains UTF-8 characters #93

Open StephenWeatherford opened 6 years ago

StephenWeatherford commented 6 years ago

When reading data from an Azure CosmosDB graph, where the data contains non-ansi characters, e.g.:

{ "type": "vertex", "properties": { "text": [ { "id": "a0a0eb9e-8679-4c80-a0e4-ea2e00a96b71", "value": "RT @colisscom: [JS]このスクリプト一つで、さまざまな用途に合わせたスライダーが実装できて便利 -Tiny Slider\n\nhttps://t.co/ObL6zbEAN4 https://t.co/J0KtrMAgor" } ], ...

I'm seeing a lot of the data getting silently lost. The problem occurs here in GremlinClient.js:

handleProtocolMessage(message) { let rawMessage, requestId, statusCode, statusMessage; try { const { data } = message; const buffer = new Buffer(data, 'binary'); <<<< message.data is a string, and message.binary = false rawMessage = JSON.parse(buffer.toString('utf-8')); requestId = rawMessage.requestId; statusCode = rawMessage.status.code; statusMessage = rawMessage.status.message; } catch (e) { this.warn('MalformedResponse', 'Received malformed response message'); <<<< HITS HERE return; }

In this case the message is already a UTF-decoded string (message.binary = false). Also, wouldn't it be better to bubble up the exception, rather than silently cutting it off?

Seems like the code should be something like:

  const { data } = message;
  if (message.binary) {
      const buffer = new Buffer(data, 'binary');
      rawMessage = JSON.parse(buffer.toString('utf-8'));
   } else {
      rawMessage = data;
   }
jbmusso commented 6 years ago

Thanks for reporting this. I agree there’s some tinkering that needs to be done down here!

Will look into it.

On Thu 9 Nov 2017 at 21:05, Stephen Weatherford notifications@github.com wrote:

When reading data from an Azure CosmosDB graph, where the data contains non-ansi characters, e.g.:

{ "type": "vertex", "properties": { "text": [ { "id": "a0a0eb9e-8679-4c80-a0e4-ea2e00a96b71", "value": "RT @colisscom: [JS]このスクリプト一つで、さまざまな用途に合わせたスライダーが実装できて便利 -Tiny Slider\n\nhttps://t.co/ObL6zbEAN4 https://t.co/J0KtrMAgor" } ], ...

I'm seeing a lot of the data getting silently lost. The problem occurs here in GremlinClient.js:

handleProtocolMessage(message) { let rawMessage, requestId, statusCode, statusMessage; try { const { data } = message; const buffer = new Buffer(data, 'binary'); <<<< message.data is a string, and message.binary = false rawMessage = JSON.parse(buffer.toString('utf-8')); requestId = rawMessage.requestId; statusCode = rawMessage.status.code; statusMessage = rawMessage.status.message; } catch (e) { this.warn('MalformedResponse', 'Received malformed response message'); <<<< HITS HERE return; }

In this case the message is already a UTF-decoded string (message.binary = false). Also, wouldn't it be better to bubble up the exception, rather than silently cutting it off?

Seems like the code should be something like:

const { data } = message; if (message.binary) { const buffer = new Buffer(data, 'binary'); rawMessage = JSON.parse(buffer.toString('utf-8')); } else { rawMessage = data; }

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jbmusso/gremlin-javascript/issues/93, or mute the thread https://github.com/notifications/unsubscribe-auth/AAMa1HqolAR38VKG52CgikXLDVL3kkz9ks5s01r5gaJpZM4QYknq .

-- Jean-Baptiste

StephenWeatherford commented 6 years ago

Thanks. I'll submit a PR if I can get some time. I'm temporarily working around it with this code:

// Patch up handleProtocolMessage as a temporary work-around for https://github.com/jbmusso/gremlin-javascript/issues/93
var originalHandleProtocolMessage = client.handleProtocolMessage;
client.handleProtocolMessage = function handleProtocolMessage(message) {
  if (!message.binary) {
    // originalHandleProtocolMessage isn't handling non-binary messages, so convert this one back to binary
    message.data = new Buffer(message.data);
    message.binary = true;
  }

  originalHandleProtocolMessage.call(this, message);
};