watson-developer-cloud / node-red-node-watson

A collection of nodes for the IBM Watson services
Apache License 2.0

Classify collection function on Natural Language Classifier node. #406

Closed: alpha-netzilla closed 6 years ago

alpha-netzilla commented 6 years ago

It seems that the NLC node has implemented a new "classify collection" feature using the code below.

let collection = msg.payload.match( /\(?[^\.\?\!]+[\.!\?$]\)?/g );

However, I often use these characters in one sentence. e.g.

I can think of other possible problems.

These characters are not necessarily the end of a sentence. e.g.

Could you add an opt-out setting, or a way to choose which characters mark the end of a sentence?

chughts commented 6 years ago

Agreed, a sentence like

msg.payload = 'This is a domain name google.com so should not get split.';

should not get split. I will see if I can use a regular expression that excludes a '.' followed by a character. If not, I will add an opt-out clause.
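For reference, the split described above can be reproduced with the expression quoted at the top of this issue. The second half is only a sketch of one possible rule (split when a terminator is followed by whitespace). The helper name `splitOnTerminatorAndSpace` is hypothetical and this is not necessarily the expression that shipped in the node.

```javascript
const text = 'This is a domain name google.com so should not get split.';

// Expression quoted at the top of this issue: any '.', '?' or '!' ends a match.
const original = /\(?[^\.\?\!]+[\.!\?$]\)?/g;
const before = text.match(original);
// before: ['This is a domain name google.', 'com so should not get split.']

// Hypothetical alternative (an assumption, not the node's code):
// split only where a terminator is followed by whitespace.
function splitOnTerminatorAndSpace(input) {
  return input
    .split(/(?<=[.!?])\s+/) // lookbehind keeps the terminator with its sentence
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const after = splitOnTerminatorAndSpace(text);
// after: the whole sentence as a single element

console.log(before, after);
```

With this rule the '.' inside google.com no longer ends a sentence, because it is followed by a letter rather than whitespace.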

The Japanese end-of-sentence character will not be recognised as a sentence separator, so the only way to use collections would be for the flow to split the sentences out itself.
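As a rough sketch of what splitting in the flow could look like, the helper below breaks text on the Japanese full stop and the full-width '！' and '？'. The function name `splitJapaneseSentences` is hypothetical, and how the resulting array should be handed to the NLC node is not covered in this thread.

```javascript
// Hypothetical helper (not part of the node): split Japanese text into
// sentences on '。', '！' and '？', keeping the terminator with each sentence.
function splitJapaneseSentences(text) {
  const matches = text.match(/[^。！？]+[。！？]?/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

const sentences = splitJapaneseSentences('今日は晴れです。明日は雨ですか？');
console.log(sentences);
```

Text without any terminator comes back as a single-element array, so nothing is dropped.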

chughts commented 6 years ago

Regular expression has been fixed. Will be released in 0.6.12

alpha-netzilla commented 6 years ago

Thank you for your prompt response, but I use these sentences, too.

msg.payload = "Mr. Chughts"
msg.payload = "May. 19, 2018"
msg.payload = "IBM Co., Ltd."
msg.payload = "I went to the U.S. last month!"
msg.payload = "IBM, Google, Apple"

I found still more exceptions, even in English. Languages other than English could have even more.

I feel that it is difficult to detect the end of a sentence mechanically with this logic.
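To illustrate the difficulty, here is a minimal sketch that assumes a rule of "terminator followed by whitespace ends a sentence". The actual expression released in 0.6.12 is not shown in this thread, so `naiveSplit` is only a stand-in for that kind of logic.

```javascript
// Assumed rule for illustration only: '.', '!' or '?' followed by whitespace
// ends a sentence. This is not the node's actual expression.
const naiveSplit = (s) => s.split(/(?<=[.!?])\s+/).filter((p) => p.length > 0);

const samples = ['Mr. Chughts', 'I went to the U.S. last month!'];
const results = samples.map(naiveSplit);

// Abbreviations still end up as separate "sentences":
// results[0] is ['Mr.', 'Chughts']
// results[1] is ['I went to the U.S.', 'last month!']
console.log(results);
```

Telling an abbreviation apart from a real sentence end needs knowledge of the language, which a single regular expression does not have.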

chughts commented 6 years ago

Good point. I will implement an opt-out option.
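A minimal sketch of how such an opt-out could behave, assuming a boolean setting. The names `buildCollection` and `splitSentences` are hypothetical; the real configuration property in the node is not given in this thread.

```javascript
// Hypothetical sketch of an opt-out. Option and function names are
// assumptions for illustration, not the node's actual API.
function buildCollection(payload, options = {}) {
  // Splitting stays on by default; it is skipped only when explicitly disabled.
  const splitSentences = options.splitSentences !== false;
  if (!splitSentences) {
    return [payload];
  }
  // Expression quoted at the top of this issue; fall back to the whole text
  // when nothing matches (for example, no terminator at all).
  return payload.match(/\(?[^\.\?\!]+[\.!\?$]\)?/g) || [payload];
}

const optedOut = buildCollection('Mr. Chughts', { splitSentences: false });
const byDefault = buildCollection('Mr. Chughts');
console.log(optedOut, byDefault);
```

With the opt-out set, the text is passed through as one item, and a flow that wants finer control can do its own splitting first.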

chughts commented 6 years ago

The opt-out option will be released in 0.6.13.