We have some sentence-level features, such as the sentence's token count, that could also be useful in token-level models; however, every token in a sequence would obviously share the same value for such a feature. What is the motivation for using or not using them?
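For concreteness, one common way to include such a feature anyway is to broadcast it to every token as an extra input column; a minimal sketch (the function name and shapes here are hypothetical, not from any particular library):

```python
import numpy as np

def add_sentence_feature(token_embeddings: np.ndarray) -> np.ndarray:
    """Append a sentence-level feature (here: token count) as an
    extra column that is identical for every token in the sentence.

    token_embeddings: (num_tokens, dim) array for one sentence.
    Returns: (num_tokens, dim + 1) array.
    """
    num_tokens = token_embeddings.shape[0]
    # Broadcast the single sentence-level value to all tokens.
    sentence_col = np.full((num_tokens, 1), float(num_tokens))
    return np.concatenate([token_embeddings, sentence_col], axis=1)

emb = np.random.rand(5, 8)        # 5 tokens, 8-dim token features
out = add_sentence_feature(emb)
print(out.shape)                  # (5, 9)
```

Since the added column is constant within a sequence, it carries no information for distinguishing tokens *within* that sequence, but it can still shift predictions *across* sequences (e.g. a model may behave differently on very short vs. very long sentences), which is one possible motivation for keeping it.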