juba / rainette

R implementation of the Reinert text clustering method
https://juba.github.io/rainette/

Time series analysis #24

Closed: gabrielparriaux closed this issue 1 year ago

gabrielparriaux commented 1 year ago

Hello,

Is it possible to do time series analysis with the data coming from rainette?

I have seen graphs like the ones below, which are very useful when you want to see how clusters evolve over time.

[Two screenshots of the graphs, taken 2023-01-30]

(source: Manchaiah, V., Ratinaud, P., & Andersson, G. (2018). Representation of tinnitus in the US newspaper media and in Facebook pages: Cross-sectional analysis of secondary data. Interactive Journal of Medical Research, 7(1), e9065.)

Two questions about that:

- To obtain a similar graph, do you need to run separate clusterings, splitting the corpus by year and performing a new rainette clustering for each year? Or can you cluster the whole corpus at once and then split the result by year?

- Do you have any idea how to obtain something similar from the data provided by rainette?

Thanks a lot for helping!

juba commented 1 year ago

I think you would have to run the clustering on the whole dataset. If you run it year by year, you won't necessarily obtain comparable clusters.

To produce this type of analysis, you first need date information for your documents in your corpus metadata. After clustering the documents with rainette, you can use cutree to add a new variable to your metadata containing each document's cluster. Then you can easily use both the cluster and date variables to generate plots or other analyses, for example of how clusters evolve over time.
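As a rough illustration of the last step, here is a minimal base R sketch. It does not call rainette at all: the `meta` data frame, its column names and its values are made up for the example. In a real workflow, `cluster` would presumably hold the result of cutree on the clustering and `date` would come from the corpus metadata.

```r
# Hypothetical document metadata: one row per document (or segment).
# Both columns are invented for illustration; in practice `cluster` would be
# the cutree result and `date` would come from the corpus metadata.
meta <- data.frame(
  date = as.Date(c("2015-03-01", "2015-07-15", "2015-11-02",
                   "2016-01-20", "2016-05-05", "2016-09-30",
                   "2017-02-11", "2017-06-25")),
  cluster = c(1, 1, 2, 1, 2, 2, 2, 3)
)

# Derive the year from the date
meta$year <- format(meta$date, "%Y")

# Number of documents per year (rows) and cluster (columns)
counts <- table(year = meta$year, cluster = meta$cluster)

# Share of each cluster within each year (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

The `shares` table is the kind of year-by-cluster summary that can then be plotted. Aggregating by month or quarter instead of year would only change the `format()` call.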

gabrielparriaux commented 1 year ago

Thanks a lot for your advice on this topic! Any hint about where to start with the code to plot such a graph?

juba commented 1 year ago

It really depends on what you want to plot... Maybe you could take a look at mosaicplot or, if you use ggplot2, you can use something like geom_col with position = "fill".
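A small sketch of both suggestions, assuming you already have counts of documents per year and cluster. The `counts` data frame and its numbers are purely illustrative, and the ggplot2 part only runs if that package is installed.

```r
# Hypothetical year-by-cluster counts (illustrative numbers only)
counts <- data.frame(
  year = rep(c("2015", "2016", "2017"), each = 3),
  cluster = factor(rep(1:3, times = 3)),
  n = c(2, 1, 0, 1, 2, 0, 0, 1, 1)
)

# Base R: mosaicplot works on a contingency table
tab <- xtabs(n ~ year + cluster, data = counts)
mosaicplot(tab, main = "Clusters by year", color = TRUE)

# ggplot2 alternative: stacked bars rescaled to proportions within each year
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = year, y = n, fill = cluster)) +
    geom_col(position = "fill") +
    labs(x = "Year", y = "Share of documents", fill = "Cluster")
  print(p)
}
```

With `position = "fill"`, each bar is normalised to a height of 1, so the plot shows the relative weight of each cluster per year rather than raw counts.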

gabrielparriaux commented 1 year ago

Sounds great! Thanks a lot for pointing to this! I will check it out!