julienjamme / handbook_sdc_from_doc_to_md

Handbook on Statistical Disclosure Control - to quarto book
https://julienjamme.github.io/handbook_sdc_from_doc_to_md/

Styling of formulas and equation #55

Closed JohannesGuss closed 3 months ago

JohannesGuss commented 1 year ago

The math formulas and equations don't follow the same format. Is there an agreed styling?

Examples are https://github.com/julienjamme/handbook_sdc_from_doc_to_md/blob/main/chapters/05/_05-frequency-tables-06-Software.qmd lines 13 to 21:

$z = u b + r$,

where *ub* is the lower adjacent multiple of *b* (hence *u* is the floor value of *z/b*) and *r* is the remainder. In the zero-restricted solution the rounded value, *a*, can take values:

$\left\{ \begin{array} { l } a = u b \text { if } r = 0 \\ a = \left\{ \begin{array} { c } u b \\ ( u + 1 ) b \end{array} \text { if } r \neq 0 \right. \end{array} \right.$)

If *K* steps are allowed, then *a*, can take values:

$\left\{ \begin{array} { l } a = \max \{ 0 , ( u + j ) \} b , j = - K , \ldots , K , \text { if } r = 0 \\ a = \max \{ 0 , ( u + j ) \} b , j = - K , \ldots , ( K + 1 ) , \text { if } r \neq 0 \end{array} \right.$)

In the first lines we have $u b$ as a LaTeX formula, but then *ub* is written inline in italics (Quarto/Markdown styling, I am guessing here). I think both stylings are fine, but we should stay consistent, at least for variables that appear both in equations and in the text. I don't know how powerful this inline formula tool is, so I imagine sticking with LaTeX for all math-related content would be the easier option.

The second part, with the $\left\{ \begin{array} ...., seems to have a typo: a stray ) at the end, and also the "double" array. I also think that the cases environment in LaTeX is better suited to this situation. At least I believe it is more commonly used...?

a = \begin{cases}
u b & \text{if } r = 0 \\
u b \text{ or } (u + 1) b & \text{if } r \neq 0
\end{cases}
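
The K-step formula quoted above could probably be written with the same environment. This is only a sketch: it keeps the content of the original formula unchanged, drops the outer array and the trailing ), and assumes the math renderer supports amsmath's cases environment.

```latex
$$
a =
\begin{cases}
\max\{0, (u + j)\}\, b, \quad j = -K, \ldots, K, & \text{if } r = 0 \\
\max\{0, (u + j)\}\, b, \quad j = -K, \ldots, (K + 1), & \text{if } r \neq 0
\end{cases}
$$
```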

If you want, and if there is an agreed style for math equations and inline math, I could write a pull request. I could also fix typos in formulas when I find them.

julienjamme commented 1 year ago

This is all inherited from the original document, which is why we haven't modified it until now. But I agree with you that it would be better to harmonize all the mathematical notation and use LaTeX style throughout.

And, of course, if you do this, it would be very helpful! But it may be heavy work, especially in chapters 3, 4 and 5. Don't hesitate to let me know if you need some help.

JohannesGuss commented 1 year ago

Thanks @julienjamme for the clarification. I am already submitting some PRs here and there if I get the time. I also want to use this issue to document some math-styling which needs an agreed upon solution. Some of them might be minor but if the document is "re-styled" anyway we might as well fix those too.

$x^{'}$ and $x'$

Usage of a ' (prime) when denoting a modified or estimated variable/matrix. When the document talks about a variable $x$ and some variable $x'$ (x prime), the notation sometimes differs between x' and x^{'}:

x' $\rightarrow$ $x'$

x^{'} $\rightarrow$ $x^{'}$

I would generally prefer x', but x^{'} can be a little easier to read, for instance if there is a $\sim$ on top of $x$.
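
To make the comparison concrete, here is a small snippet one could paste into a .qmd file to compare the renderings side by side. It is only an illustration; the use of \tilde for the accent is my assumption of what "a $\sim$ on top" would look like in the book.

```latex
% In LaTeX math mode, x' is shorthand for x^{\prime}.
$x'$ \quad $x^{\prime}$

% The two candidate spellings with an accent on top of x:
$\tilde{x}'$ \quad $\tilde{x}^{'}$
```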

$P$ or $\mathbb{P}$

Usage of $P$ or $\mathbb{P}$ when writing the probability operator, e.g. the probability that $X$ is less than or equal to $y$ $\rightarrow P(X\leq y)$. In the document $P$ is usually used instead of $\mathbb{P}$, but in some chapters (Chapter 3, Section 4, Subsection 6, the PRAM matrix) $P$ denotes a variable/matrix. So using $\mathbb{P}$ for the operator might be better overall. In that case one can also use bold math styling for the PRAM matrix to make the distinction: \mathbf{P} $\rightarrow\mathbf{P}$
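
Side by side, the distinction would look roughly like this. It is a sketch only: the entry notation $p_{ij}$ for the matrix is an assumption for illustration, and in plain LaTeX \mathbb needs the amssymb (or amsfonts) package.

```latex
% Probability operator in blackboard bold:
$\mathbb{P}(X \leq y)$

% PRAM matrix in bold, so it is not confused with the operator:
$\mathbf{P} = (p_{ij})$
```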

julienjamme commented 1 year ago

I’ve already merged two of your pull requests. Great and very useful work!

As far as I am concerned, I completely agree with what you suggest about the notation. There was a real lack of harmonization in the original document. Now that we can write all the mathematical content in LaTeX, this effort will be easier. And why not discuss the different choices here with the others?

For the prime symbol, I tend to prefer the easiest way to write it: x'.

For the probability sign, I have to admit that I use both $\mathbb{P}$ and $\mathbf{P}$, depending on the context. Why not use what you suggest (\mathbf) for the probability, expectation and variance symbols, and keep \mathbb for matrices?

JohannesGuss commented 1 year ago

Thx @julienjamme for accepting PRs: 56, 57, 58, 59

Math equations and inline math should now be in LaTeX for the whole book, if I have not missed anything.

The choices between $x^{'}$ and $x'$ and between $P$ and $\mathbb{P}$ are not yet harmonised, because I believe these should be discussed in a larger round and not only between @julienjamme and myself.

Leaving this issue open for other notation-related issues/decisions as well.

ppdewolf commented 1 year ago

Personally I would object to using \mathbb for matrices.

My suggestion would be to use \mathbb{P}, \mathbb{E} etc. for probability, expectation-operator etc. and \mathbf for vectors and matrices.

Easiest would be to define some commands for probability etc. at the beginning of the document. Then it is easily changed afterwards if needed. E.g. something like

\newcommand{\Prob}{\mathbb{P}}
\newcommand{\Ex}{\mathbb{E}}
\newcommand{\Var}{\textrm{Var}}

and then use something like

\[
\Var(X) = \Ex(X^2) - (\Ex X)^2
\]
\[
p_{ij} = \Prob(X^*=i\mid X=j)
\]
\[
\mathbf{P} = \left( \begin{array}{rr}
                       0.9 & 0.1 \\
                       0.1 & 0.9
                    \end{array}\right)
\]

julienjamme commented 1 year ago

Indeed, \mathbf seems more suitable for vectors and matrices, so I agree with your suggestion. I am going to check how to add new LaTeX commands in the Quarto environment.