watson-developer-cloud / node-red-node-watson

A collection of nodes for the IBM Watson services
Apache License 2.0
82 stars 86 forks

Discovery Document Loader loads text as 'value' attribute, not 'text' attribute #434

Open jaumemir opened 5 years ago

jaumemir commented 5 years ago

The "Discovery Document Loader" node stores the document content (msg.payload) under an attribute named "value", whereas Discovery's standard expectation is to find it under an attribute named "text". As a result, enrichments have to be applied to the "value" field, which produces "enriched_value" instead of "enriched_text". This is non-standard, and a client application may fail if the enrichments are not where it expects them.
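
To illustrate the mismatch, here is a minimal sketch of the defensive lookup a client would need while both field names are in circulation. The function name `getEnrichments`, the sample documents, and the `entities` sub-field are made up for illustration; only the field names `text`, `value`, `enriched_text` and `enriched_value` come from this issue.

```javascript
// Return the enrichment block of a Discovery result, whichever field it landed in.
// Prefers the standard "enriched_text" and falls back to "enriched_value".
function getEnrichments(doc) {
  if (doc && doc.enriched_text !== undefined) return doc.enriched_text;
  if (doc && doc.enriched_value !== undefined) return doc.enriched_value;
  return null; // no enrichments found under either name
}

// Hypothetical sample documents: one in the standard shape, one as the loader stores it.
const standardDoc = { text: 'hi', enriched_text: { entities: [] } };
const loaderDoc = { value: 'hi', enriched_value: { entities: [] } };

console.log(getEnrichments(standardDoc), getEnrichments(loaderDoc));
```

A client that only reads `enriched_text` would get nothing back for the second document, which is the failure described above.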

chughts commented 5 years ago

Implemented in 0.7.5

chughts commented 5 years ago

Had to back this out during testing: the stream input processing requires the attribute 'value', otherwise the open stream from a document file causes node-red to crash.

Lotti commented 5 years ago

And it doesn't upload JSON documents as they are... the best I achieved was having the stringified JSON structure inside the value field: not useful at all.

chughts commented 5 years ago

How are you injecting the json? It has been tested with HTTP request, HTTP multipart and File Inject, and PDF Hummus.

[{"id":"6a0eaf1a.5ae95","type":"fileinject","z":"e0b750ac.1c66c","name":"Inject a doc or json","x":134,"y":803,"wires":[["af11a3c2.83d6b"]]},{"id":"f6eb0f41.7a89b","type":"fileinject","z":"e0b750ac.1c66c","name":"Inject a pdf","x":104,"y":823,"wires":[["d52e8d8b.7f484"]]},{"id":"d52e8d8b.7f484","type":"pdf-hummus","z":"e0b750ac.1c66c","name":"","filename":"TempDoscV1.pdf","split":true,"mode":{"value":"asBuffer"},"x":282,"y":806,"wires":[["af11a3c2.83d6b"]]},{"id":"af11a3c2.83d6b","type":"watson-discovery-v1-document-loader","z":"e0b750ac.1c66c","name":"","environment_id":"6a02b192-b8ed-4839-b108-e02a842d26e5","collection_id":"06e8c1b0-9858-4a38-861f-d35cd96d70f6","filename":"Temp Docs","default-endpoint":true,"service-endpoint":"https://gateway.watsonplatform.net/discovery/api","x":518,"y":774,"wires":[["9d8fc50a.4acec8"]]},{"id":"9d8fc50a.4acec8","type":"debug","z":"e0b750ac.1c66c","name":"","active":true,"console":"false","complete":"true","x":540,"y":677,"wires":[]}]

Lotti commented 5 years ago

I have pure json, not stored in a file...

I modified the old discovery-insert node provided by community for my purposes: https://github.com/Lotti/node-red-contrib-discovery-insert

I will find time to fix and test the official node and propose a pull request... I have already set up the environment, but I have a deadline today and preferred to go with something easier to modify than the official nodes.

Anyway... this is the solution I found for adding JSON as a document to Discovery with the latest SDK:

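                // Per-message IDs override the values configured on the node.
                var env = (msg.hasOwnProperty('environment_id')) ? msg.environment_id : environment;
                var col = (msg.hasOwnProperty('collection_id')) ? msg.collection_id : collection;

                // Serialize the JSON content and wrap it in a Buffer so it is uploaded as a file.
                const string = JSON.stringify(msg.payload.content);
                const file = Buffer.from(string, 'utf8');
                // getSHA1 is a helper not shown here; its hash provides a default filename.
                const sha1 = getSHA1(string);
                const filename = msg.payload.filename || `${sha1}.json`;

                var document_obj = {
                    environment_id: env,
                    collection_id: col,
                    file: file,
                    filename: filename,
                    file_content_type: 'application/json',
                };

                // resolve and reject come from the enclosing Promise (not shown).
                discovery.addDocument(document_obj, function (err, response) {
                    if (err) {
                        if (err.code === 429) {
                            // Rate limited (HTTP 429): resolve with the code instead of rejecting.
                            resolve(429);
                        } else {
                            reject(err);
                        }
                    } else {
                        resolve(response);
                    }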
                });
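
For anyone trying the snippet above: `getSHA1` is not defined in it. Below is a minimal sketch, assuming the helper simply returns the hex SHA-1 of the string via Node's built-in `crypto` module; the real helper in the package may differ. `buildDocumentParams` is a hypothetical name that only gathers the parameter-building step so it can be checked without a Discovery instance.

```javascript
const crypto = require('crypto');

// Assumed stand-in for the getSHA1 helper the snippet calls but does not show.
function getSHA1(text) {
  return crypto.createHash('sha1').update(text, 'utf8').digest('hex');
}

// Build the parameter object that the snippet passes to discovery.addDocument.
// (Hypothetical helper; the field names are taken from the snippet.)
function buildDocumentParams(payload, environmentId, collectionId) {
  const json = JSON.stringify(payload.content);
  return {
    environment_id: environmentId,
    collection_id: collectionId,
    file: Buffer.from(json, 'utf8'),
    // Use the caller's filename if given, otherwise derive one from the content hash.
    filename: payload.filename || `${getSHA1(json)}.json`,
    file_content_type: 'application/json',
  };
}

// Placeholder IDs and content, for illustration only.
const params = buildDocumentParams({ content: { text: 'hello' } }, 'env-id', 'col-id');
console.log(params.filename, params.file.toString('utf8'));
```

Because the default filename is derived from the content, uploading the same JSON twice yields the same filename.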

prismboy commented 4 years ago

NLU analysis is not performed unless the text data is stored in the 'text' property. Please change the loader to use the 'text' property instead of the 'value' property.

rcorig commented 4 years ago

Hey Lotti, thanks for the code. It is exactly what I need. One thing, though: when I try to run it, it gives an error stating "Error: Missing required parameters: apikey", even though the apikey is there.

As I am a newbie to all this, can you please help me? Thanks in advance!

Lotti commented 4 years ago

The Watson SDK changes frequently... maybe the problem you are encountering with the API key is related to that. I'll spend some time on this later today... I suggest using my package, because I don't have rights to merge code into this repo or to publish the official npm module.

rcorig commented 4 years ago

Sure thing! Thanks for the fast reply. As an ex-IBMer, I really appreciate it. You are a lifesaver!!

Lotti commented 4 years ago

Try using this module; install it from the Node-RED palette. Please remove the original before installing mine. https://www.npmjs.com/package/node-red-contrib-discovery-insert-temp

I didn't have time to test it on a real Discovery instance... I will try tomorrow if it doesn't work for you.

I'll also try to send a pull request here... but I think the owner of the repo doesn't maintain these nodes anymore.

chughts commented 4 years ago

@Lotti @rcorig Please take this discussion to the GitHub repo that you are referring to https://github.com/GwilymNewton/node-red-contrib-discovery-insert

Perhaps if you report your issues to the correct repo, the owner might actually accept your pull request.

Lotti commented 4 years ago

@chughts ok for me. Any news on implementing this feature inside the official Watson Discovery nodes? If not, should I invest time and do a pull request for it?

rcorig commented 4 years ago

@chughts ok for me as well. @Lotti Thanks man! It worked perfectly. With https://www.npmjs.com/package/node-red-contrib-discovery-insert-temp, the upload to Discovery went through the correct way! You are the man!!! Thank you very much!

chughts commented 4 years ago

@Lotti If you can do it and verify that there is no regression in all current working cases, then I will happily accept a pull request.