fredrik1984 closed 3 years ago
The upcoming release will have two new features that independently address the request:
Tokens belonging to PoS-classes that are to be excluded from co-occurrence, but still included in windows during computation of co-occurrence, can now be specified. These tokens will be replaced by the same "padding token" that is currently used for windows at a document's start and end. The padding token is a marker in the token stream whose sole purpose is to keep word distances intact.
PoS-tags can now also be appended to tokens, e.g. "information@NN". This enables both token and PoS-class to be filtered in the co-occurrence explorer GUI. Note that currently the PoS-class will be shown for each word. This feature also enables the system to distinguish between e.g. "händer@VB" and "händer@NN", which was not possible before unless one of the PoS-classes was filtered out.
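The two features can be illustrated with a minimal sketch. This is not the project's implementation: the function names `mask_pos` and `append_pos`, the padding symbol `"*"`, and the sample tags are assumptions chosen for illustration only.

```python
# Hypothetical padding marker; the real system's padding token may differ.
PAD = "*"

def mask_pos(tagged, masked_pos, pad=PAD):
    """Replace tokens whose PoS tag is in `masked_pos` with `pad`.

    The list keeps its length, so distances between the remaining
    tokens are unchanged.
    """
    return [pad if pos in masked_pos else tok for tok, pos in tagged]

def append_pos(tagged, sep="@"):
    """Append the PoS tag to each token, e.g. ("information", "NN") -> "information@NN"."""
    return [f"{tok}{sep}{pos}" for tok, pos in tagged]

# Illustrative (token, PoS) pairs; the tags are assumed for this example.
tagged = [("vad", "HP"), ("händer", "VB"), ("med", "PP"),
          ("dina", "PS"), ("händer", "NN")]

masked = mask_pos(tagged, {"PP", "PS"})
suffixed = append_pos(tagged)

print(masked)    # ['vad', 'händer', '*', '*', 'händer']
print(suffixed)  # ['vad@HP', 'händer@VB', 'med@PP', 'dina@PS', 'händer@NN']
```

With masking, the two occurrences of "händer" are still three positions apart, so window membership is unaffected. With suffixing, the verb and noun readings become distinct tokens.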
When we use the co-occurrence page in Jupyter, we want to be able to run the co-occurrence analysis based on all parts of speech. If we are interested in studying verbs that co-occur with the word "information" using a word window of 5, we want windows that contain no verb to be ignored.
We need to (1) define what the corpus that the co-occurrence computation is based on should look like (lemmas, stopwords, etc.), and (2) specify which part or parts of speech should be taken into account in the co-occurrence computation.
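The requested behaviour can be sketched roughly as follows. The sketch assumes that a "window of 5" means a sliding window of five consecutive tokens and that each partner word is counted once per window; the function names and the sample sentence are hypothetical and not taken from the project.

```python
from collections import Counter

def windows(tokens, size):
    """Yield every contiguous window of `size` tokens."""
    for i in range(len(tokens) - size + 1):
        yield tokens[i:i + size]

def cooccurring_with(tagged, target, keep_pos, size=5):
    """Count words of the PoS classes in `keep_pos` that share a window with `target`.

    Windows that do not contain `target`, or that contain no token of the
    requested classes, contribute nothing to the counts.
    """
    counts = Counter()
    for win in windows(tagged, size):
        if target not in (word for word, _ in win):
            continue
        partners = {word for word, pos in win if pos in keep_pos and word != target}
        counts.update(partners)  # an empty set leaves the counts unchanged
    return counts

# Illustrative (token, PoS) pairs; the tags are assumed for this example.
tagged = [("vi", "PN"), ("söker", "VB"), ("ny", "JJ"), ("information", "NN"),
          ("om", "PP"), ("saken", "NN"), ("i", "PP"), ("dag", "NN")]

counts = cooccurring_with(tagged, "information", {"VB"}, size=5)
print(counts)  # Counter({'söker': 2})
```

Of the four windows here, all contain "information", but only the first two also contain a verb, so the last two are effectively ignored.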