Implement a "Table 1" functionality

FloLimebit commented 2 years ago

For the data paper we need a main summary table for Open Discourse.

This table needs the following details broken down by each electoral term:

Earliest Date
Latest Date
Count Sessions
Cumulative Count Sessions
Count Speeches
Cumulative Count Speeches
Count Tokens
Cumulative Count Tokens
Count Contributions
Cumulative Count Contributions
Count Tokens of Contributions
Cumulative Count Tokens of Contributions

The final table should look like this (fake data):

Electoral Term	Earliest Date	Latest Date	Sessions Count	Cumulated Sessions Count	Count Speeches	Cumulated Count Speeches	Count Tokens	Cumulated Count Tokens	Count Contributions	Cumulated Count Contributions	Count Tokens of Contributions	Cumulated Count of Tokens of Contributions
1	01.01.70	01.01.75	134	134	245	245	1.000.000	1.000.000	567	567	500.000	500.000
2	02.01.75	01.01.80	150	284	546	791	2.000.000	3.000.000	367	934	400.000	900.000
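
The per-term counts and their running totals could be computed roughly as follows. This is only a minimal sketch: the toy `speeches` data and all column names (`electoral_term`, `session`, `date`, `n_tokens`) are assumptions for illustration, not the actual Open Discourse schema. The contribution columns would follow the same pattern on a contributions table.

```r
library(dplyr)

# Hypothetical toy input: one row per speech (column names are assumptions).
speeches <- tibble(
  electoral_term = c(1, 1, 1, 2, 2),
  session        = c(1, 1, 2, 1, 2),
  date           = as.Date(c("1970-01-01", "1970-01-01", "1970-02-01",
                             "1975-01-02", "1975-03-01")),
  n_tokens       = c(120, 80, 200, 150, 50)
)

table_1 <- speeches %>%
  group_by(electoral_term) %>%
  summarise(
    earliest_date  = min(date),
    latest_date    = max(date),
    count_sessions = n_distinct(session),  # sessions are numbered within a term
    count_speeches = n(),
    count_tokens   = sum(n_tokens),
    .groups = "drop"
  ) %>%
  arrange(electoral_term) %>%              # cumulative sums depend on term order
  mutate(
    cum_sessions = cumsum(count_sessions),
    cum_speeches = cumsum(count_speeches),
    cum_tokens   = cumsum(count_tokens)
  )

print(table_1)
```

Sorting by electoral term before `cumsum()` matters, because each cumulative column is just the running total of the column next to it.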

The function needs to return the table as a tibble/data frame, but also in LaTeX format so we can use it in the data paper. Check out the packages knitr (knitr::kable) and kableExtra to create the LaTeX table.
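
One possible shape for such a function is sketched below. The function name `make_table_1()` and the toy input are hypothetical; only `knitr::kable()` with `format = "latex"` and `booktabs = TRUE` is existing API.

```r
library(knitr)

# Hypothetical helper: returns the summary both as a data frame and as LaTeX.
make_table_1 <- function(summary_tbl, caption = "Summary by electoral term") {
  latex_tbl <- kable(summary_tbl, format = "latex", booktabs = TRUE,
                     caption = caption)
  list(data = summary_tbl, latex = latex_tbl)
}

# Toy input reusing the fake numbers from the example table.
summary_tbl <- data.frame(
  electoral_term = c(1, 2),
  count_sessions = c(134, 150),
  cum_sessions   = c(134, 284)
)

res <- make_table_1(summary_tbl)
cat(res$latex, sep = "\n")  # LaTeX source, ready to paste into the paper
```

With `booktabs = TRUE` the paper's preamble needs `\usepackage{booktabs}`.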

Add a decimal point to the output numbers (e.g. by setting it globally with options(OutDec = ".")).
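
As far as I know, `OutDec` only controls the decimal mark, so the digit grouping shown in the example table (1.000.000) would more likely come from `big.mark`. A minimal base R sketch, where the helper name `fmt_count()` is hypothetical:

```r
# Hypothetical helper: format counts with a grouping separator.
# decimal.mark is set to the "other" character so that R does not warn
# about big.mark and decimal.mark being identical.
fmt_count <- function(x, big_mark = ".") {
  dec_mark <- if (big_mark == ".") "," else "."
  format(x, big.mark = big_mark, decimal.mark = dec_mark,
         scientific = FALSE, trim = TRUE)
}

counts <- c(1000000, 3000000, 500000)
fmt_count(counts)                  # dot as grouping separator
fmt_count(counts, big_mark = ",")  # comma as grouping separator
```

When the table is rendered with `knitr::kable()`, the same effect should be achievable by passing `format.args = list(big.mark = ".", decimal.mark = ",")`, which keeps the underlying columns numeric.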

lwarode commented 2 years ago

what algorithm should be used for tokenization? tidytext::unnest_tokens uses "words" as default (which e.g. drops punctuation)

FloLimebit commented 2 years ago

> what algorithm should be used for tokenization? tidytext::unnest_tokens uses "words" as default (which e.g. drops punctuation)

Yes that's perfect
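
For reference, counting tokens per electoral term with the default word tokenizer could look like the sketch below. The toy data and the column names (`electoral_term`, `speech_content`) are assumptions for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy input: one row per speech.
speeches <- tibble(
  id             = 1:2,
  electoral_term = c(1, 2),
  speech_content = c("Guten Tag, meine Damen und Herren!", "Vielen Dank.")
)

token_counts <- speeches %>%
  # token = "words" is the default; it lowercases and drops punctuation
  unnest_tokens(output = token, input = speech_content, token = "words") %>%
  count(electoral_term, name = "count_tokens")

print(token_counts)
```

Because punctuation is dropped, the counts are word counts rather than counts of all orthographic tokens, which is worth stating in the paper.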

lwarode commented 2 years ago

we should use , instead of . (publication standard)

lwarode commented 2 years ago

maybe we should include an option to add line breaks to column names, otherwise the LaTeX table is too small (some columns are taking too much space)
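
A possible way to do this is `kableExtra::linebreak()`, which turns `\n` in a header into a LaTeX `\makecell`. The sketch below uses toy data; how the option would be exposed in the package function is left open.

```r
library(knitr)
library(kableExtra)

# Toy data; the counts are already formatted as strings here.
summary_tbl <- data.frame(
  term       = c(1, 2),
  cum_tokens = c("1,000,000", "3,000,000")
)

# "\n" marks where the header should wrap; escape = FALSE is needed so the
# generated \makecell commands are not escaped by kable().
header <- linebreak(c("Electoral\nTerm", "Cumulated Count\nof Tokens"),
                    align = "c")

latex_tbl <- kable(summary_tbl, format = "latex", booktabs = TRUE,
                   escape = FALSE, col.names = header)
cat(latex_tbl, sep = "\n")
```

The resulting LaTeX needs `\usepackage{makecell}` (and `booktabs`) in the paper's preamble.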

open-discourse / opendiscouRse

Implement a "Table 1" functionality #30