Retrieve original (and most representative ?) segments of a lemmatized corpus after clustering

juba / rainette

R implementation of the Reinert text clustering method

https://juba.github.io/rainette/


Retrieve original (and most representative ?) segments of a lemmatized corpus after clustering #30

Closed JacquesAntoine closed 9 months ago

JacquesAntoine commented 9 months ago

Hello,

Is there any way (with rainette) to retrieve the original segments of a lemmatized corpus after clustering?

I would like to illustrate the content of clusters with quotes from respondents (using e.g. the Cluster documents of rainette_explore).

That is, with "natural" language rather than with the sequence of lemmas.

Thank you very much, Jacques-Antoine

juba commented 9 months ago

Hi,

If you apply your lemmatization (for example with tokens_wordstem()) after splitting your corpus into segments with split_segments(), you can easily retrieve the cluster membership of your original segments with cutree(). The example in the introduction should cover that, I think (but maybe I didn't understand your request correctly?).
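A minimal sketch of that order of operations, loosely following the introduction vignette. The sample corpus (`data_corpus_inaugural`), the parameter values, and the names `clusters` and `quotes` are only illustrative assumptions, and `tokens_wordstem()` does stemming, so a real lemmatizer would take its place at that step:

```r
library(quanteda)
library(rainette)

# 1. Split the raw corpus into segments first, while the text is still
#    natural language. `corpus` is never modified afterwards.
corpus <- split_segments(data_corpus_inaugural, segment_size = 40)

# 2. Tokenize and stem a separate tokens object derived from the segments.
tok <- tokens(corpus, remove_punct = TRUE)
tok <- tokens_tolower(tok)
tok <- tokens_remove(tok, stopwords("en"))
tok <- tokens_wordstem(tok)

# 3. Build the document-feature matrix and run the clustering.
dtm <- dfm(tok)
dtm <- dfm_trim(dtm, min_docfreq = 10)
res <- rainette(dtm, k = 5, min_segment_size = 15)

# 4. cutree() returns one cluster label per segment, in segment order,
#    so it can be attached to the untouched corpus (NA = unclassified).
clusters <- cutree(res, k = 5)
docvars(corpus, "cluster") <- clusters

# 5. Original, non-stemmed text of the segments assigned to cluster 1.
quotes <- as.character(corpus)[which(clusters == 1)]
head(quotes, 3)
```

This works because the stemmed tokens and the original segments share the same document order, which holds as long as no segments are dropped between `split_segments()` and `rainette()`.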

JacquesAntoine commented 9 months ago

Hello,

Great, thank you very much! (I confess I did the splitting after the lemmatization...)

Best regards, Jacques-Antoine