watson-developer-cloud / node-sdk

Node.js library to access IBM Watson services.
https://www.npmjs.com/package/ibm-watson
Apache License 2.0

[discovery] document additions fail #369

Closed by kognate 7 years ago

kognate commented 7 years ago

This was reported internally and on Stack Overflow: https://stackoverflow.com/questions/41473388/create-new-document-in-ibm-watson-discovery-service-with-watson-developer-cloud

There is a bug in the discovery add document code.

manyike commented 7 years ago

Hi @kognate, thanks for the fix, but I just ran into a similar issue. Here is my code snippet:

discovery.addDocument({
    environment_id: 'env-id-here',
    collection_id: 'coll-id-here',
    configuration_id: 'config-id-here',
    metadata:'{"Content-Type":"application/json"}',
    file:Buffer.from("HERE IS MY TEST TEXT", 'utf8')
}, function(err, data) {
    if (err) {
        console.error(err);
    } else {
        console.log(JSON.stringify(data, null, 2));
    }
});

and I get the following error:

{ Error: The Media Type [application/octet-stream] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .
    at Request._callback (/Users/emmanuelmanyike/manalto/socialnetworkdatacollector/node_modules/watson-developer-cloud/lib/requestwrapper.js:74:15)
    at Request.self.callback (/Users/emmanuelmanyike/manalto/socialnetworkdatacollector/node_modules/request/request.js:186:22)
    at emitTwo (events.js:106:13)
    at Request.emit (events.js:191:7)
    at Request.<anonymous> (/Users/emmanuelmanyike/manalto/socialnetworkdatacollector/node_modules/request/request.js:1081:10)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
    at Gunzip.<anonymous> (/Users/emmanuelmanyike/manalto/socialnetworkdatacollector/node_modules/request/request.js:1001:12)
    at Gunzip.g (events.js:291:16)
    at emitNone (events.js:91:20)
  code: 415,
  error: 'The Media Type [application/octet-stream] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .' }

I checked the SDK and can see that the fix merged above is there. Any ideas why this is still happening?

publu commented 7 years ago

bumpppppp

This issue isn't fixed. Not sure why it's closed.

Twanawebtech commented 7 years ago

Any update on this, guys? I am still getting the same error.

Why is this issue closed? The issue is still there.

nfriedly commented 7 years ago

Hey, sorry this got dropped, we had some internal shuffling around.

I think https://github.com/watson-developer-cloud/node-sdk/issues/474 was actually a duplicate of this. As of v2.34.0, JSON documents should work in either of two ways: by specifying a .json filename (Discovery looks at the file extension, not the content-type - example here), or by using the new addJsonDocument method, which was added specifically to make this use case easier:

var document_obj = {
  environment_id: environment,
  collection_id: collection,
  file: {"foo": "bar"}
};

discovery.addJsonDocument(document_obj, function (err, response) {
  if (err) {
    console.error(err);
  } else {
    console.log(JSON.stringify(response, null, 2));
  }
});

Can you test and confirm?

There's also an open ticket to add a matching updateJsonDocument method - https://github.com/watson-developer-cloud/node-sdk/issues/477 - that will hopefully be implemented soon. In the meantime, you can specify a whatever.json filename, similar to the above example.

nfriedly commented 7 years ago

Oh, and to respond to @manyike's code sample, I just noticed that it says application/json in the metadata, but the content is actually text, so the error message from the service was correct in that case.

As mentioned above, the service ignores content-type headers & metadata, and instead checks only the file extension and the content itself. The filename defaults to _ (no extension) when uploading a buffer, which forces the service into content-sniffing mode. In this case, it correctly identified it as text/plain and gave an accurate error stating that text/plain isn't supported.
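
Given that behaviour, a rough client-side pre-check before uploading might look something like the sketch below. Note that hasSupportedExtension is a hypothetical helper, not part of the SDK, and the extension list is only an assumption inferred from the media types named in the 415 error message; the service itself may accept or reject differently.

```javascript
const path = require('path');

// Assumed mapping: extensions inferred from the media types listed in the
// 415 error message. Illustrative only, not taken from the service itself.
const SUPPORTED_EXTENSIONS = ['.json', '.doc', '.docx', '.pdf', '.html', '.xhtml'];

// Hypothetical helper (not part of the SDK): returns true when the filename
// carries an extension from the assumed list above.
function hasSupportedExtension(filename) {
  const ext = path.extname(filename).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(ext) !== -1;
}

console.log(hasSupportedExtension('whatever.json')); // true
console.log(hasSupportedExtension('_'));             // false (no extension)
```

A false result here would suggest the upload falls back to content sniffing, which is what happened with the bare Buffer.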

To specify the filename with a Buffer, set the file param to an object like so:

discovery.addDocument({
    environment_id: 'env-id-here',
    collection_id: 'coll-id-here',
    configuration_id: 'config-id-here',
    file: {
      value: Buffer.from("JSON goes here", 'utf8'),
      options: {
        filename: 'whatever.json'
      }
    }
}, function(err, data) {
    if (err) {
        console.error(err);
    } else {
        console.log(JSON.stringify(data, null, 2));
    }
});
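
If you are starting from a plain object rather than a ready-made Buffer, the same file shape can be built with a small wrapper. This is only a sketch: jsonFileParam and the default name document.json are made up for illustration and are not part of the SDK; the value/options.filename structure is the one from the example above.

```javascript
// Hypothetical helper (not part of the SDK): serializes an object to JSON and
// wraps it in the { value, options: { filename } } shape used above, so the
// upload carries a .json filename.
function jsonFileParam(obj, filename) {
  const name = filename || 'document.json'; // assumed default name
  if (!/\.json$/i.test(name)) {
    throw new Error('filename should end in .json, got: ' + name);
  }
  return {
    value: Buffer.from(JSON.stringify(obj), 'utf8'),
    options: { filename: name }
  };
}

const fileParam = jsonFileParam({ foo: 'bar' }, 'whatever.json');
console.log(fileParam.options.filename);       // whatever.json
console.log(fileParam.value.toString('utf8')); // {"foo":"bar"}

// Usage with the call shown above (needs real ids and credentials):
// discovery.addDocument({
//   environment_id: 'env-id-here',
//   collection_id: 'coll-id-here',
//   file: fileParam
// }, function(err, data) { /* handle as above */ });
```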