Closed tamagokun closed 6 years ago
hey mike, yeah it's a good question. the short answer is that there are a few fleabag ways to do it, but nothing as good as the solution you've suggested.
you can make any tag you want, but to get the Doctor->Person stuff, for now you can do this:
const lexicon={
jones:['Doctor', 'Person']
}
or alternatively,
var doc=nlp(myText, {jones:'Doctor'})
doc.match('#Doctor').tagAs('Person')
... but we should really support a clever way to extend the native tag stuff. I'm happy to work on that.
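For illustration only, here is a rough sketch of what "Doctor implies Person" could look like as plain data plus a small helper. This is not how compromise resolves tags internally; `tagParents` and `expandTags` are made-up names, and the sketch assumes each tag has at most one parent.

```javascript
// Hypothetical child -> parent map (an assumption, not compromise's tagset).
const tagParents = new Map([
  ['Doctor', 'Person'],
  ['Nurse', 'Person'],
  ['Person', 'Noun'],
]);

// Hypothetical helper: return the given tags plus every ancestor tag.
function expandTags(tags, parentOf) {
  const out = new Set();
  for (let tag of tags) {
    // Walk up the chain until there is no parent, or we reach a tag already collected.
    while (tag !== undefined && !out.has(tag)) {
      out.add(tag);
      tag = parentOf.get(tag);
    }
  }
  return [...out];
}

const jonesTags = expandTags(['Doctor'], tagParents);
console.log(jonesTags);
```

With a map like this, the lexicon entry for jones would only need to say 'Doctor', and the parent tags would follow from the tagset.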
seems like providing some kind of API for working with the lexicon/tagset is in order. Right now compromise seems super slow because every time I run it, it "loads" the exact same lexicon that I am supplying.
I'd love to be able to set up my lexicon, tagsets, once, and then set those for compromise.
Thanks for providing insight into how to work around the tagset thing right now, it seems to work well!
yeah! thanks. your timing is very good for this feature. we can look at including it in v11, which will hopefully be ready sometime this week.
how would this be?
var nlp=require('compromise') //does background init work
nlp.addWords(myLexicon) //your lexicon (persistent)
nlp.addTags({Person: ['Doctor', 'Nurse', 'Plumber']}) //plug these into the tagging logic
//now this is fast-path
nlp(text1)
nlp(text2)
i got stuck on this just cause i was trying to make a nlp.clone() method, that somehow would let you have two different functions. I still haven't figured out how to do that
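One possible shape for the "two different functions" idea is a factory whose closure owns the lexicon, so each function carries its own state. This is only a sketch under that assumption: `createTagger` is a hypothetical name, the whitespace tokenizer is a toy, and none of it reflects compromise's actual internals.

```javascript
// Hypothetical factory: every tagger it returns owns a private lexicon.
function createTagger(baseLexicon = {}) {
  const lexicon = new Map();

  // The returned function does a toy whitespace split and a plain lookup.
  function tagger(text) {
    return text
      .toLowerCase()
      .split(/\s+/)
      .filter(Boolean)
      .map((word) => ({ word, tags: lexicon.get(word) || [] }));
  }

  // Add words once; later calls to tagger() reuse the same lexicon.
  tagger.addWords = (words) => {
    for (const [word, tags] of Object.entries(words)) {
      lexicon.set(word.toLowerCase(), [].concat(tags)); // copy, accept string or array
    }
    return tagger;
  };

  // Clone by building a new tagger from a copy of the current lexicon.
  tagger.clone = () => createTagger(Object.fromEntries(lexicon));

  return tagger.addWords(baseLexicon);
}

const taggerA = createTagger({ jones: ['Doctor', 'Person'] });
const taggerB = taggerA.clone().addWords({ smith: 'Plumber' });

console.log(taggerA('smith')[0].tags); // taggerA never learned this word
console.log(taggerB('smith')[0].tags);
```

The trade-off is that every clone copies the whole lexicon up front, which could be expensive for a large one.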
that's exactly what I need :+1:
Yes, this really helps. :+1:
Not sure if this is a digression, but how do we best handle polysemy (?) when we create new lexicons?
First determine the tag (pos) before checking the custom lexicon?
"I doctored the photograph"
sorry for the delay,
yeah, for more context-sensitive tagging, i recommend doing it afterwards with .match().tagAs()
var doc=nlp("Have You Met Life Today?")
doc.match('#QuestionWord #Noun met')....
doc.match('met life #Verb')...
or whatever..
i think of it as the lexicon is for not-smart tagging, and the smarter stuff's gotta come afterwards.
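To make the two-stage idea concrete, here is a self-contained sketch that does not use compromise at all: a context-free lookup pass, followed by a pass that retags a known word sequence. The names `dumbLexicon`, `lookupPass` and `tagSequence` are hypothetical, and the 'Organization' tag for "met life" is just an example choice.

```javascript
// Pass 1 data: the "not-smart" lexicon, one fixed tag list per word.
const dumbLexicon = new Map([
  ['met', ['Verb']],
  ['life', ['Noun']],
]);

// Pass 1: strip punctuation, split on whitespace, look each word up in isolation.
function lookupPass(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z\s]/g, '')
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => ({ word, tags: [...(dumbLexicon.get(word) || [])] }));
}

// Pass 2: wherever the given words appear consecutively, overwrite their tags.
function tagSequence(terms, words, tag) {
  for (let i = 0; i + words.length <= terms.length; i++) {
    if (words.every((w, j) => terms[i + j].word === w)) {
      for (let j = 0; j < words.length; j++) {
        terms[i + j].tags = [tag];
      }
    }
  }
  return terms;
}

const metLifeTerms = tagSequence(
  lookupPass('Have You Met Life Today?'),
  ['met', 'life'],
  'Organization'
);
console.log(metLifeTerms);
```

The lexicon still tags "met" as a verb everywhere else; only the contextual pass overrides it when "life" follows.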
Just tried out v11, totally amazing. Going to close this issue. Thanks for a great library!
Wondering if there is a way to extend the tagset logic for custom tags? I've been poking around the source and don't see anything to indicate that it can be done.
Example: