Open AbeHandler opened 5 years ago
Yes, that's always been the case. I thought I wrote a mapping system for phrasemachine that maps both the ARK and PTB tagsets to a standardized coarse tagset, which would solve this problem. If it's not in the Python version, maybe it's in the R version or one of our earlier research versions?
Oh sorry, I'm misunderstanding the question; never mind.
This is a weird corner case, but worth noting and perhaps fixing. If you use the library with the ARK tagger, pronouns are tagged "O", which is the ARK tagset's pronoun tag.
phrasemachine also marks every token whose tag is not in the coarsemap with "O" (i.e. other), so the two meanings collide: a custom regex that involves pronoun tags will also match any token that fell through to "other".
I think an easy fix is to change the internal "O" to another, rarer character. Another option (better?) is simply not to fix this.
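
To make the collision concrete, here is a minimal sketch. It does not call phrasemachine itself: `COARSEMAP`, `coarsen`, and the `other` parameter are hypothetical stand-ins for the internal mapping, and the table is a simplified assumption rather than the library's actual one. The only thing taken from the ARK tagset is that "O" means pronoun.

```python
import re

# Hypothetical, simplified coarse map (NOT phrasemachine's actual table).
COARSEMAP = {
    "A": ["A", "JJ"],        # adjectives (ARK "A", PTB "JJ")
    "N": ["N", "^", "NN"],   # nouns (ARK "N" and "^", PTB "NN")
    "P": ["P", "IN"],        # adpositions (ARK "P", PTB "IN")
}
TAG2COARSE = {tag: coarse for coarse, tags in COARSEMAP.items() for tag in tags}

def coarsen(tags, tag2coarse, other):
    """Map each tag to a coarse symbol; unmapped tags get the `other` symbol."""
    return "".join(tag2coarse.get(tag, other) for tag in tags)

# ARK tags for something like "she drinks green tea": pronoun, verb, adjective, noun.
ark_tags = ["O", "V", "A", "N"]

# A custom setup that wants to see pronouns, keeping the ARK "O" symbol for them.
custom_map = {**TAG2COARSE, "O": "O"}
pronoun_regex = re.compile(r"O")

# With "O" as the fallback, the verb is indistinguishable from the pronoun.
collided = coarsen(ark_tags, custom_map, other="O")

# With a different (assumed) fallback symbol, only the pronoun matches.
separated = coarsen(ark_tags, custom_map, other="_")

print(collided, len(pronoun_regex.findall(collided)))
print(separated, len(pronoun_regex.findall(separated)))
```

In this sketch the only change needed is the fallback symbol, which is why renaming the internal "O" looks like the cheap fix. Any pattern that already relies on "O" meaning "other" would need updating too.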