Open ghost opened 10 years ago
To add onto this, is there a way for the perceptron tagger to tag all punctuation? For example, can it tag periods, question marks, quotes, and every other token it comes across? That would be a really big help. Currently the second-best tagger that handles punctuation is the NLTK tagger, but yours is much better.
It should tag all punctuation, but it'll have trouble with Unicode characters.
Unfortunately, I'm no longer supporting this code; I'm working full time on spaCy, which is now under the MIT license too ( http://spacy.io ). spaCy handles non-ASCII characters appropriately, and is both faster and more accurate.
NLTK have recently agreed to use this tagger. However, I don't know how well they support Unicode punctuation.
On Thursday, October 15, 2015, LeavesBreathe notifications@github.com wrote:
To add onto this, is there a way for the perceptron tagger to tag all punctuation? For example, can it tag periods, question marks, quotes, and every other token it comes across? That would be a really big help. Currently the second-best tagger that handles punctuation is the NLTK tagger, but yours is much better.
Overall, I am getting good results with your tagger.
However, when the input contains quotation marks, such as inch symbols or quoted text, the tagger ignores them completely, which makes these cases difficult to work with.
Is there a way to make sure that quotes are tagged as quotes, as the Stanford Parser does?
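One possible workaround, sketched here under the assumption that the quotes are being dropped before tagging rather than by the model itself, is to tokenize the text so that every punctuation character survives as its own token, then assign Penn Treebank punctuation tags with a simple lookup. The names `tokenize_keep_punct` and `tag_punctuation` are hypothetical helpers, not part of this tagger, NLTK, or spaCy. Note that a straight `"` used as an inch mark would still be treated as a quote by this sketch.

```python
import re

# Penn Treebank tags for common punctuation tokens.
PUNCT_TAGS = {
    ".": ".", "?": ".", "!": ".",
    ",": ",", ":": ":", ";": ":",
    "(": "-LRB-", ")": "-RRB-",
    "\u201c": "``", "\u201d": "''",  # curly opening / closing quotes
}

# A word (optionally with inner hyphens or apostrophes), or a single
# non-space, non-word character such as a quote or a period.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize_keep_punct(text):
    """Split text into word tokens and single punctuation characters."""
    return TOKEN_RE.findall(text)


def tag_punctuation(tokens):
    """Return (token, tag) pairs; tag is None for tokens not in the lookup."""
    tagged = []
    quote_open = False
    for tok in tokens:
        if tok == '"':
            # Straight quotes carry no direction, so alternate open/close.
            tag = "''" if quote_open else "``"
            quote_open = not quote_open
        else:
            tag = PUNCT_TAGS.get(tok)
        tagged.append((tok, tag))
    return tagged


if __name__ == "__main__":
    print(tag_punctuation(tokenize_keep_punct('He said "hello" to me.')))
```

Tokens that come back with a `None` tag would still need to be passed to the statistical tagger; the lookup only guarantees that quotation marks and sentence punctuation are never silently discarded.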