spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

[Issue]: Gov Rule & Possibly Others Need Improvement. #1095

Open MarketingPip opened 4 months ago

MarketingPip commented 4 months ago

The government rule I suggested earlier (I saw you made a PR for it a while back) is causing an issue.

Rule can be found here.

When using the code below:

let doc = nlp("government of canada")
console.log(doc.match('government of #Country'))
/* Outputs:
{
  "ptrs": []
}
*/ 

Results for Canada are tagged as:

 "tags": [
        "Noun",
        "Singular",
        "ProperNoun",
        "Organization"
 ]

Though I know Canada is used as a proper noun in this example, shouldn't the tags still include ["Place", "Country"] so the matcher can find them?

I'm assuming this rule, and a few others, need frozen tags or something similar, so that some tags remain available when using .match().

Correct me if I'm wrong.

spencermountain commented 4 months ago

yeah, that's a tricky one.

Country is a #Place and a #Place is not an #Organization

you're welcome to remove the rule, as a PR, if you'd like. cheers

MarketingPip commented 4 months ago

@spencermountain - hmmm... I saw in the post tagger / rule set that "Canada" is a place.

 { match: 'government of the? [#Place+]', tag: 'Organization', reason: 'government-of-x' },

Just to confirm we are on the same page:

The rule is supposed to tag the whole string as #Organization, since "government of Canada" is a valid organization. (Correct?)

That is why I am not sure there should be a capture group around the place in the rule. If I am correct, it should just be this:

 { match: 'government of the? #Place+', tag: 'Organization', reason: 'government-of-x' },

And this should match the whole string and tag the entire match as an organization, instead of what I am assuming was just an improper group in the rule. Correct?

(I just saw that your rule uses a group in the match, which I am assuming is just a mistake?)

Hoping you can help do some digging and confirm some info here.

PS: I will be honest, I have no professional or real-world use case for Compromise. I am just constantly playing with it, or trying to build you some datasets and little snippets (I have countless ones I still need to give you access to) lol. So I hope you don't think I am opening issues to ask for free premium support. 😆

MarketingPip commented 4 months ago

@spencermountain sorry for another comment, but won't this work correctly? (Assuming this is what you meant to do lol)

{ match: 'government of the? [#Place+]', tag: 'Place', reason: 'government-of-place' },
{ match: 'government of the? #Place+', tag: 'Organization', reason: 'government-of-place-org' },

spencermountain commented 4 months ago

hey Jared, no that doesn't work. The tags are developed as a tree and something can be a Place or an Org, but not both. cheers
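
For readers following along, the "tree" point above can be illustrated with a toy model. This is only a sketch and not compromise's internals: the `parents` table, the simplified hierarchy, and the helper names `lineage`, `compatible`, and `addTag` are all assumptions made up for illustration. It shows why tagging a term as Organization would drop Place and Country when the two sit on different branches.

```javascript
// Toy model only: NOT compromise's implementation. The hierarchy below is a
// simplified, hypothetical tag tree where each tag names its parent.
const parents = {
  Noun: null,
  ProperNoun: 'Noun',
  Place: 'ProperNoun',
  Country: 'Place',
  Organization: 'ProperNoun',
}

// Walk from a tag up to the root, e.g. Country -> Place -> ProperNoun -> Noun
function lineage(tag) {
  const chain = []
  for (let t = tag; t; t = parents[t]) chain.push(t)
  return chain
}

// Two tags can coexist only when one is an ancestor of the other
function compatible(a, b) {
  return lineage(a).includes(b) || lineage(b).includes(a)
}

// Adding a tag drops every tag on a conflicting branch,
// then adds the new tag together with its ancestors
function addTag(tags, tag) {
  const kept = [...tags].filter((t) => compatible(t, tag))
  return new Set([...kept, ...lineage(tag)])
}

const canada = addTag(new Set(), 'Country')
const afterRule = addTag(canada, 'Organization')

console.log([...canada])    // includes Place and Country
console.log([...afterRule]) // Place and Country are gone, Organization is present
```

In this toy version a term cannot hold both Place and Organization at once, which is why the two-rule idea above would leave only whichever tag was applied last.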

MarketingPip commented 4 months ago

@spencermountain - Hmmmm, just realized this....

It works as expected when using a misspelling of "government".

let doc = nlp("goverment of canada")

console.log(doc.match("Goverment of #Country").text())

console.log(doc.match("#Place").text())//
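
One plausible reading of this behaviour is that the rule keys on the literal word "government", so the misspelled text never triggers it and "canada" keeps its original tags. The sketch below is a toy stand-in, not compromise's matcher: the `lexicon`, `tagWords`, `applyGovRule`, and `hasCountry` names are hypothetical and exist only to illustrate that assumption.

```javascript
// Toy model only: a hand-rolled stand-in for a 'government of #Place' rule.
// The tiny lexicon is hypothetical and is not compromise's data.
const lexicon = { canada: ['Noun', 'Place', 'Country'] }

// Split text into terms and attach any tags found in the toy lexicon
function tagWords(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((word) => ({ word, tags: new Set(lexicon[word] || []) }))
}

// Fires only on the exact words "government of" followed by a Place,
// and retags that place as an Organization (losing Place and Country)
function applyGovRule(terms) {
  for (let i = 0; i + 2 < terms.length; i++) {
    const [a, b, c] = terms.slice(i, i + 3)
    if (a.word === 'government' && b.word === 'of' && c.tags.has('Place')) {
      c.tags = new Set(['Noun', 'Organization'])
    }
  }
  return terms
}

// True when some term still carries the Country tag
const hasCountry = (terms) => terms.some((t) => t.tags.has('Country'))

const correct = applyGovRule(tagWords('government of canada'))
const misspelled = applyGovRule(tagWords('goverment of canada'))

console.log(hasCountry(correct))    // false: the rule fired and replaced the tags
console.log(hasCountry(misspelled)) // true: the rule never matched
```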