open-discourse / opendiscouRse

Package containing functions to obtain descriptive statistics and analyses of the Open Discourse database.
https://open-discourse.github.io/opendiscouRse/

Implement a "Table 1" functionality #30

Closed FloLimebit closed 2 years ago

FloLimebit commented 2 years ago

For the data paper we need a main summary table for Open Discourse.

This table needs the following details broken down by each electoral term:

The final Table should look like this (fake data):

| Electoral Term | Earliest Date | Latest Date | Sessions Count | Cumulated Sessions Count | Count Speeches | Cumulated Count Speeches | Count Tokens | Cumulated Count Tokens | Count Contributions | Cumulated Count Contributions | Count Tokens of Contributions | Cumulated Count of Tokens of Contributions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 01.01.70 | 01.01.75 | 134 | 134 | 245 | 245 | 1.000.000 | 1.000.000 | 567 | 567 | 500.000 | 500.000 |
| 2 | 02.01.75 | 01.01.80 | 150 | 284 | 546 | 791 | 2.000.000 | 3.000.000 | 367 | 924 | 400.000 | 900.000 |
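
The cumulated columns in the table above are running totals over electoral terms. A minimal base R sketch of that step follows; the input data frame and its column names (`electoral_term`, `session`, `n_tokens`) are made up for illustration and are not the actual Open Discourse schema.

```r
# Hypothetical per-speech input; the column names are assumptions.
speeches <- data.frame(
  electoral_term = c(1, 1, 2, 2, 2),
  session        = c(1, 2, 1, 1, 2),
  n_tokens       = c(120, 80, 200, 50, 150)
)

# Per-term counts: distinct sessions, number of speeches, total tokens.
per_term <- do.call(rbind, lapply(
  split(speeches, speeches$electoral_term),
  function(d) {
    data.frame(
      electoral_term = d$electoral_term[1],
      sessions_count = length(unique(d$session)),
      count_speeches = nrow(d),
      count_tokens   = sum(d$n_tokens)
    )
  }
))
per_term <- per_term[order(per_term$electoral_term), ]
rownames(per_term) <- NULL

# Cumulated columns: running totals in electoral-term order.
per_term$cum_sessions_count <- cumsum(per_term$sessions_count)
per_term$cum_count_speeches <- cumsum(per_term$count_speeches)
per_term$cum_count_tokens   <- cumsum(per_term$count_tokens)
```

The ordering step matters: `cumsum()` only gives meaningful running totals if the rows are already sorted by electoral term.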

The function needs to return the table as a tibble/data frame but also in LaTeX format so we can use it in the data paper. Check out the packages knitr::kable and kableExtra to create the LaTeX table.
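
One way the two output formats could be offered from a single function is sketched below. The function name `table_1` and its `output` argument are hypothetical, not the package's actual interface; only `knitr::kable()` with `format = "latex"` is an existing call.

```r
library(knitr)

# Hypothetical helper: `summary_tbl` is any data frame shaped like the
# summary table; `output` selects the return format.
table_1 <- function(summary_tbl, output = c("data.frame", "latex")) {
  output <- match.arg(output)
  if (output == "data.frame") {
    return(summary_tbl)
  }
  # kable() returns the LaTeX source as a character-like object.
  knitr::kable(summary_tbl, format = "latex", booktabs = TRUE)
}

# Fake data, as in the example table.
demo <- data.frame(electoral_term = 1:2, sessions_count = c(134, 150))
df_out    <- table_1(demo)
latex_out <- table_1(demo, "latex")
```

With `booktabs = TRUE` the generated table uses `\toprule`/`\midrule`, so the paper's preamble would need the `booktabs` LaTeX package.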

Add a decimal point to the output numbers (e.g. by setting it globally with options(OutDec = ".")).

lwarode commented 2 years ago

  • what algorithm should be used for tokenization? tidytext::unnest_tokens uses "words" as default (which e.g. drops punctuation)

FloLimebit commented 2 years ago

Yes that's perfect
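
For reference, a small sketch of what word-level tokenization with `tidytext::unnest_tokens` looks like. The data frame and the column names `id` and `speech_content` are assumptions made for this example.

```r
library(tidytext)

# Hypothetical input; column names are assumptions.
speeches <- data.frame(
  id = 1:2,
  speech_content = c("Guten Tag, meine Damen und Herren!", "Vielen Dank."),
  stringsAsFactors = FALSE
)

# token = "words" is the default: one row per word, punctuation dropped,
# text lowercased.
tokens <- unnest_tokens(speeches, output = word, input = speech_content,
                        token = "words")

# Token count per speech.
token_counts <- table(tokens$id)
```

Because punctuation is dropped, the token counts reflect words only, which is what the summary table would then report.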

lwarode commented 2 years ago

we should use , instead of . (publication standard)
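
A sketch of the comma formatting, assuming the separator in question is the thousands mark shown in the example table. Base R's `format()` handles this through `big.mark`; the numbers are the fake values from above.

```r
# Fake counts from the example table.
x <- c(1000000, 500000, 567)

# big.mark sets the thousands separator; scientific = FALSE avoids 1e+06,
# trim = TRUE drops the padding spaces.
formatted <- format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
```

The same setting can be passed to `knitr::kable()` via `format.args = list(big.mark = ",")`, which leaves the underlying data frame numeric. Note that `options(OutDec = ...)` changes the decimal mark, not the thousands separator.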

lwarode commented 2 years ago

maybe we should include an option to add line breaks to column names, otherwise the LaTeX table is too small (some columns take up too much space)
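
A possible approach, sketched with `kableExtra::linebreak()`, which turns `\n` in a header into a LaTeX `\makecell`. The data and header texts are illustrative only.

```r
library(knitr)
library(kableExtra)

# Fake data, two columns only to keep the example short.
demo <- data.frame(term = 1:2, cum_sessions = c(134, 284))

# "\n" marks where the header should wrap.
headers <- linebreak(c("Electoral\nTerm", "Cumulated\nSessions Count"),
                     align = "c")

# escape = FALSE keeps the generated LaTeX commands intact.
latex_out <- kable(demo, format = "latex", booktabs = TRUE,
                   col.names = headers, escape = FALSE)
```

The resulting table needs the `makecell` LaTeX package in the preamble. Since `escape = FALSE` disables escaping for the whole table, any special characters in the cells would have to be escaped by hand.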