quanteda / quanteda.textstats

Textual statistics for quanteda
GNU General Public License v3.0

Feature Request: Include an option in textstat_summary to retrieve normalised counts for URLs etc. #38

Open dshgna opened 3 years ago

dshgna commented 3 years ago

I find textstat_summary() very useful to compare the textual features between two or more groups.

However, the counts of punctuation marks, URLs, numbers, symbols, tags, and emojis depend heavily on text length (the number of characters, tokens, or types): longer texts will tend to contain more URLs, for example. To compare texts fairly, these counts need to be normalized by length.

I usually end up writing a custom function for this, and I think it would be very useful to have it as an option in textstat_summary().
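For illustration, here is a minimal sketch of the kind of custom function I mean. `normalize_summary()` and its `by` and `per` arguments are hypothetical names, not part of quanteda.textstats. The sketch assumes the summary data frame has the columns `chars`, `tokens`, `types`, `puncts`, `numbers`, `symbols`, `urls`, `tags`, and `emojis`; the example uses a hand-built data frame as a stand-in so that it runs without a corpus.

```r
# Hypothetical helper (not part of quanteda.textstats): divide the feature
# counts in a summary data frame by a length measure.
# Assumes column names like those described above.
normalize_summary <- function(summ, by = c("tokens", "chars", "types"), per = 1) {
  by <- match.arg(by)
  # Only touch the count columns that are actually present
  count_cols <- intersect(
    c("puncts", "numbers", "symbols", "urls", "tags", "emojis"),
    names(summ)
  )
  denom <- summ[[by]]
  out <- summ
  for (col in count_cols) {
    # Rate per `per` units of length; NA where the document is empty
    out[[col]] <- ifelse(denom > 0, summ[[col]] / denom * per, NA_real_)
  }
  out
}

# Hand-built stand-in for a summary table (made-up numbers for illustration).
# With a real corpus this would be something like:
#   summ <- quanteda.textstats::textstat_summary(corp)
summ <- data.frame(
  document = c("text1", "text2"),
  chars    = c(100L, 400L),
  sents    = c(2L, 8L),
  tokens   = c(20L, 80L),
  types    = c(15L, 50L),
  puncts   = c(4L, 8L),
  numbers  = c(0L, 4L),
  symbols  = c(0L, 0L),
  urls     = c(1L, 2L),
  tags     = c(2L, 2L),
  emojis   = c(0L, 1L)
)

# URLs, tags, etc. per 100 tokens
norm <- normalize_summary(summ, by = "tokens", per = 100)
print(norm[, c("document", "urls", "tags")])
```

In this made-up example, text2 has twice as many URLs as text1 in raw counts but half the rate per 100 tokens, which is the comparison I actually care about. A built-in option could expose the same choice of denominator (characters, tokens, or types).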