vincentsarbachpulicani / Corsican-Stylometry

Stylometry analysis and topic modeling on a Corsican historical corpus

Stylometry and topic modeling in the Corsican language

Organisation of the repository

data -> location of the XML files used for this research

previous_works -> folder with an earlier thesis written by Vincent Sarbach-Pulicani on the subject

ressources -> folders with scripts written for the study (only the most important ones have been kept)

results -> results and visualizations of the topic modeling and stylometry

Description

With the emergence of nationalism in the 19th century came regionalist movements that asserted and claimed cultural particularities. Corsica fitted this dynamic well and even proved a favourable setting for the development of such ideas. The centralisation of the State around a strong capital and the policies of assimilation aimed at local populations in France's border regions led certain actors to defend these particularisms. It was in this context that the Corsican autonomist newspaper A Muvra was founded in May 1920 in Paris, on the initiative of Petru and Matteu Rocca. For almost 19 years, hundreds of authors contributed to this massive body of dialectal writing.

The aim of this dissertation is to carry out author profiling, i.e. to determine the style of an author and the subjects they cover. To do this, we apply authorship attribution stylometry to texts published under pseudonyms, then complement these analyses with topic modelling, i.e. the indexing of latent topics in a corpus of texts. The goal is to gain a better understanding of the complex sociology behind this rich and varied newspaper through computational methods.
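To make the idea of authorship attribution more concrete, here is a minimal sketch of Burrows' Delta, a distance measure commonly used in stylometry. This README does not say which measure the study relies on, so the choice of Delta is an assumption made for illustration. The function names, the `n_features` parameter and the toy English texts are all hypothetical placeholders, not taken from the repository's scripts or corpus.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word in a whitespace-tokenised text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty texts
    return [counts[word] / total for word in vocabulary]


def burrows_delta(candidates, disputed, n_features=5):
    """Return {author: Delta} between the disputed text and each candidate.

    candidates: dict mapping an author label to a reference text (hypothetical input).
    disputed: text whose author is unknown, e.g. one signed with a pseudonym.
    """
    # Features: the most frequent words across all reference texts.
    all_tokens = " ".join(candidates.values()).lower().split()
    vocabulary = [word for word, _ in Counter(all_tokens).most_common(n_features)]

    profiles = {a: relative_frequencies(t, vocabulary) for a, t in candidates.items()}
    target = relative_frequencies(disputed, vocabulary)

    # Standardise each feature with the mean and standard deviation
    # observed across the candidate profiles (z-scores).
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero

    def zscores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, stdevs)]

    target_z = zscores(target)
    # Delta is the mean absolute difference between z-score vectors.
    return {
        author: mean(abs(a - b) for a, b in zip(zscores(freqs), target_z))
        for author, freqs in profiles.items()
    }


if __name__ == "__main__":
    # Toy placeholder texts, not drawn from the A Muvra corpus.
    known = {
        "author_a": "the sea and the hills and the sea and the wind",
        "author_b": "of stone of bread of salt a house of stone",
    }
    scores = burrows_delta(known, "the wind and the sea and the hills")
    print(min(scores, key=scores.get), scores)
```

The candidate with the smallest Delta is the stylistically closest one. On a real corpus, the features would typically be a few hundred of the most frequent words, and each candidate profile would be built from many texts rather than a single one.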