Notebook: co-occurrence – PoS-classes to include in co-occurrences

The upcoming release will have two new features that independently address the request:

Tokens belonging to PoS-classes that are to be excluded from co-occurrence, but still counted in windows during computation of co-occurrence, can now be specified. These tokens will be replaced by the same "padding token" that is currently used for windows at a document's start and end. The padding token is a marker in the token stream whose sole purpose is to keep word distances intact.
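As a rough illustration of the padding idea, here is a minimal Python sketch. It is not the project's implementation: the padding symbol `*`, the function names `mask_tokens` and `window_cooccurrences`, the symmetric window, and the choice to count each pair once per window are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

PAD = "*"  # hypothetical padding symbol; the real one may differ


def mask_tokens(tagged, excluded_pos, pad=PAD):
    """Replace tokens whose PoS tag is excluded with the padding token."""
    return [pad if pos in excluded_pos else tok for tok, pos in tagged]


def window_cooccurrences(tokens, context_width, pad=PAD):
    """Count unordered word pairs per sliding window, ignoring padding.

    Assumption: a window is the focus token plus `context_width` tokens
    on each side, and each distinct pair is counted once per window.
    """
    # Pad the document's start and end so every token gets a full window.
    padded = [pad] * context_width + list(tokens) + [pad] * context_width
    counts = Counter()
    for i in range(context_width, len(padded) - context_width):
        window = padded[i - context_width : i + context_width + 1]
        words = sorted({t for t in window if t != pad})
        counts.update(combinations(words, 2))
    return counts


tagged = [("information", "NN"), ("och", "KN"), ("kunskap", "NN"), ("sprids", "VB")]
masked = mask_tokens(tagged, excluded_pos={"KN"})
counts = window_cooccurrences(masked, context_width=1)
```

Because "och" is replaced rather than dropped, "information" and "sprids" remain three positions apart and never share a window of this width. Had the token simply been removed, they would have moved one step closer and ended up in the same window.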
PoS-tags can now also be appended to tokens, e.g. "information@NN". This enables filtering on both token and PoS-class in the co-occurrence explorer GUI. Note that the PoS-class is currently shown for every word. This feature also enables the system to distinguish between e.g. "händer@VB" and "händer@NN", which was not possible before unless one of the PoS-classes was filtered out.
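The tagging scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `append_pos` and `split_pos` are hypothetical, and only the "@" separator and the example tokens come from the description above.

```python
from collections import Counter


def append_pos(tagged, sep="@"):
    """Turn (token, PoS) pairs into 'token@PoS' strings."""
    return [f"{tok}{sep}{pos}" for tok, pos in tagged]


def split_pos(token, sep="@"):
    """Split 'token@PoS' back into (token, PoS); PoS is '' if untagged."""
    if sep not in token:
        return token, ""
    word, _, pos = token.rpartition(sep)
    return word, pos


tagged = [("händer", "VB"), ("händer", "NN"), ("information", "NN")]
tokens = append_pos(tagged)

# The two homographs are now separate vocabulary entries.
vocabulary = Counter(tokens)

# Filtering on PoS-class becomes a simple check on the suffix.
nouns = [t for t in tokens if split_pos(t)[1] == "NN"]
```

Keeping the tag inside the token string means downstream counting needs no changes, at the cost of the tag being visible wherever the token is displayed.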
