juba / rainette

R implementation of the Reinert text clustering method
https://juba.github.io/rainette/

Choice of the Reinert method, validation of the quality of the partitions, their number — search for information or references #21

Closed by gabrielparriaux 1 year ago

gabrielparriaux commented 1 year ago

Hello,

I am conducting a research project in which I wish to investigate the discourse of teachers in the classroom. My corpus consists of transcriptions of recordings of about twenty lessons, possibly more, given by different teachers.

Having discovered Reinert's hierarchical top-down clustering, first in Iramuteq, then in R with the rainette package, I plan to use this clustering method to explore the discourse of teachers in class.

Not being a statistician myself, I think I understand the basics of the algorithm, but I'm trying to be more relevant on some aspects.

My questions concern the following topics:

I am glad to receive any possible answer or reference to literature that could help me on these questions!

Thanks in advance for your help and best regards,

Gabriel Parriaux

juba commented 1 year ago

I'm not an expert in this field, but here are some takes based on my not-so-long experience as a practitioner with these methods:

On a side note, I think that modern dimension reduction algorithms such as t-SNE or UMAP applied to a document-form matrix could also give interesting results.

Sorry not to have real expert knowledge to share or definitive answers to give. I hope this is helpful anyway.

gabrielparriaux commented 1 year ago

Hello @juba,

Thanks a lot for your very informed advice on the topic!

I had never heard of t-SNE and UMAP. If I understand correctly, they are alternatives to Correspondence Analysis for reducing dimensions. I saw very nice simulations online using those algorithms, and they seem interesting!

Again, thanks a lot for your expertise and for your time to answer these questions!

Best,

Gabriel