metamath / set.mm

Metamath source file for logic and set theory

LaTeX CI Checks #3079

Closed tirix closed 1 year ago

tirix commented 1 year ago

@david-a-wheeler expressed his wish for LaTeX checks in the CI.

@GinoGiotto suggested using a tool to generate a TeX file listing all definitions. We can't visually check the result, but we could then run a checker such as ChkTeX to ensure the generated TeX file is semantically valid.

david-a-wheeler commented 1 year ago

@tirix - ChkTeX is an interesting idea. My intended idea was to generate some TeX files, then ask TeX to generate PDF files and see if that works. We could do both.
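For instance, a minimal CI-step sketch (assuming a generated list.tex and that chktex and pdflatex are installed in the CI image; both are only assumptions, not the actual setup):

import subprocess
import sys

def run(cmd):
    # Run a command and fail the CI job if it exits with a nonzero status.
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
        sys.exit(result.returncode)

# Static check of the generated TeX file (the ChkTeX idea)...
run(["chktex", "-q", "list.tex"])
# ...and a full compilation, to catch errors ChkTeX does not see (the pdflatex idea).
run(["pdflatex", "-interaction=nonstopmode", "-halt-on-error", "list.tex"])

Whether the static check alone is enough, or a full compile is also worth the extra CI time, could be decided once the generator exists.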

GinoGiotto commented 1 year ago

I can provide the code I wrote if it's useful; there are some issues, though:

  1. It's written in Matlab, which isn't free, so it probably needs to be translated.
  2. It includes some commented latexdefs. This isn't an issue for my purposes, but I guess you would prefer them to be excluded.
  3. It doesn't include latexdefs that aren't placed at the beginning of the line. That's the reason I changed some indentations in https://github.com/metamath/set.mm/pull/3062

In any case I believe this approach is better than writing only a few TeX proofs/statements, since with the latter method you wouldn't check the functionality of all latex definitions.

david-a-wheeler commented 1 year ago

It's written in Matlab

Does it also run in GNU Octave? If not, could it be easily converted? GNU Octave is pretty compatible for such things, I've used it for some machine learning work.

I agree, a single test that "checks all definitions" makes sense, as long as it doesn't take TOO long.

GinoGiotto commented 1 year ago

Does it also run in GNU Octave? If not, could it be easily converted? GNU Octave is pretty compatible for such things, I've used it for some machine learning work.

I don't use Octave, so I'm not sure. I know they are generally similar, but there might be some functions that Matlab has and Octave doesn't, as well as some minor language differences.

I agree, a single test that "checks all definitions" makes sense, as long as it doesn't take TOO long.

My code takes about 10 seconds to generate the list, which isn't super fast; I guess someone with more experience than me could make it more efficient.

Let me know if you are interested. It's only 80 lines of code, so I guess I can just share it here.

david-a-wheeler commented 1 year ago

10 seconds is fine. Yes, interested.

If its only purpose is to test the generated TeX code, the easy solution is to add it to set.mm (say in scripts/), which is also where our CI test code for set.mm is. I don't remember if our base image includes LaTeX.

GinoGiotto commented 1 year ago

Ok, so I ran it again just now with Matlab R2021a 64-bit and it works fine.

The following file (collector.txt) is the code. I changed the extension to .txt because GitHub doesn't let me upload Matlab files; originally it was 'collector.m'. I don't know if it's the same for Octave.

collector.txt

The following is the first line of the code; here you write which Metamath database you want to read (you need to have that file in the same folder as "collector.m"). I set it to read 'set.mm':

fileL = fopen('set.mm'); % Reading file

I downloaded the most recent version of set.mm and ran the program; this should be the output:

list.txt

The program generates the file in .tex format; I changed it to .txt for the same reason described above.

Each line corresponds to one latex substitution. I observed that only one of them (latexdef "log" as "log";) cannot 'stand alone', meaning that it will produce an error if there isn't anything in front of it. As long as all the theorems that use it are well formed, this shouldn't be an issue, but it should then be added as an exception for the checker.

Note: even if you remove log_ in 'list.tex', you will still find that the file produces errors. That's because there are a few malfunctioning latexdefs left to fix (indeed, if I remove them it compiles without problems; I will report them soon).

I already added \usepackage{mathrsfs} to solve the problem mentioned here: https://github.com/metamath/set.mm/pull/3066#discussion_r1118161762. This package is not included in the Metamath preamble yet, but it will be once this pull request https://github.com/metamath/metamath-exe/pull/130/commits/f5b8c11c10ee090c178183b36eb2a2888ecd4cc4 is merged.

david-a-wheeler commented 1 year ago

Here is the Matlab code from @GinoGiotto - are you releasing this to the public domain, releasing it under the MIT license, or something else?

Code:

fileL = fopen('set.mm'); % Reading file
C = textscan(fileL,'%s','delimiter','\n');
fclose(fileL);
C = strtrim(string(C{:})); % This vector contains the whole original Metamath database
n = size(C,1);
i = 1;
t = 1;
new = string.empty; % This vector contains the latex substitutions

% Filling the vector "new"
while i <= n
    if ~isempty(regexp(C(i),'latexdef') == 1) && regexp(C(i),'latexdef') == 1 % Line starts with 'latexdef'
        stringa = string(regexp(C(i),'"','split'));
        if (length(stringa) == 5) % Token and TeX part both in double quotes on one line
            new(t) = stringa(4);
            new(t) = insertBefore(new(t),"\","\"); % Double the backslashes so fprintf prints them literally
        elseif (length(stringa) == 3)
            if stringa(3) == " as" % TeX part continues on the following line(s)
                j = 1;
                B = convertStringsToChars(C(i+j));
                tot = "";
                while B(end) ~= ';'
                    B = B(2:end-3);
                    tot = tot + convertCharsToStrings(B);
                    j = j + 1;
                    B = convertStringsToChars(C(i+j));
                end
                B = B(2:end-2);
                tot = tot + convertCharsToStrings(B);
                new(t) = tot;
                new(t) = insertBefore(new(t),"\","\");
            else
                tex = string(regexp(stringa(3),"'",'split'));
                new(t) = tex(2);
                new(t) = insertBefore(new(t),"\","\");
            end
        elseif (length(stringa) == 4)
            clear stringa
            stringa = string(regexp(C(i),' ','split'));
            B2 = convertStringsToChars(stringa(4));
            B2 = B2(2:end-2);
            new(t) = convertCharsToStrings(B2);
            new(t) = insertBefore(new(t),"\","\");
            clear B1
            clear B2
        end
        t = t + 1;
    end
    i = i + 1;
end

% Creation of the tex file

fileID = fopen("list.tex",'w'); % Output file

fprintf(fileID,"\\documentclass{article}\n");
fprintf(fileID,"\\usepackage{graphicx}\n");
fprintf(fileID,"\\usepackage{amssymb}\n");
fprintf(fileID,"\\usepackage{amsmath}\n");
fprintf(fileID,"\\usepackage{amsthm}\n");
fprintf(fileID,"\\usepackage{mathrsfs}\n");
fprintf(fileID,"\\theoremstyle{plain}\n");
fprintf(fileID,"\\newtheorem{theorem}{Theorem}[section]\n");
fprintf(fileID,"\\newtheorem{definition}[theorem]{Definition}\n");
fprintf(fileID,"\\newtheorem{lemma}[theorem]{Lemma}\n");
fprintf(fileID,"\\newtheorem{axiom}{Axiom}\n");
fprintf(fileID,"\\allowdisplaybreaks[1]\n");
fprintf(fileID,"\\usepackage[plainpages=false,pdfpagelabels]{hyperref}\n");
fprintf(fileID,"\\hypersetup{colorlinks}\n");
fprintf(fileID,"\\begin{document}\n");
fprintf(fileID,"\n");
fprintf(fileID,"\\begin{proof}\n");
fprintf(fileID,"\\begin{align}\n");
for i = 1:length(new)
    fprintf(fileID, "\\\\" + i + " &&  & " + new(i) + "\n" );
end
fprintf(fileID,"\\end{align}\n");
fprintf(fileID,"\\end{proof}\n");
fprintf(fileID,"\n");
fprintf(fileID,"\\end{document}");
fclose(fileID);
clear
benjub commented 1 year ago

I would replace amsmath with mathtools in this code (mathtools loads amsmath anyway).

@GinoGiotto : for log_, use "\operatorname{\underline{log}}" (and for "log", I hope the TeX is already "\log").

GinoGiotto commented 1 year ago

are you releasing this to the public domain, releasing it under the MIT license, or something else?

To be honest I'm not educated in this field, feel free to do whatever you want with it, I won't complain.

GinoGiotto commented 1 year ago

for log_, use "\operatorname{\underline{log}}" (and for "log", I hope the TeX is already "\log").

Ok, I noted it in my list.

I would replace amsmath with mathtools in this code (mathtools load amsmath anyway).

Done: collector.txt

GinoGiotto commented 1 year ago

I made a small optimization. Originally 'collector.m' cycled through the entire Metamath database, but that's inefficient. I edited it so that it cycles through the $t comment only. Now it takes about 3 seconds to run.

collector.txt

tirix commented 1 year ago

Could it be that a simple sed script would get us the same result? Maybe something like:

sed -n -E 's/^ *latexdef "([^"]+)" as "([^"]+)";/\\begin{verbatim}\1\\end{verbatim} \& \2/p' set.mm

Which outputs something like:

\begin{verbatim}|-\end{verbatim} & \vdash
\begin{verbatim}ph\end{verbatim} & \varphi
\begin{verbatim}ps\end{verbatim} & \psi

Then we only have to add the prelude/header/footer.

benjub commented 1 year ago

There are some latexdefs using single quotes, but this should be manageable.

Add \\ on each line of the resulting file.

GinoGiotto commented 1 year ago

@tirix Your approach looks much better than mine. Does it work for latexdefs spanning multiple lines? I'm very interested in more efficient TeX collectors.

david-a-wheeler commented 1 year ago

To be honest I'm not educated in this field, feel free to do whatever you want with it, I won't complain.

Okay, I'm interpreting "do whatever you want with it" as a CC0 dedication: https://creativecommons.org/publicdomain/zero/1.0/

david-a-wheeler commented 1 year ago

I think sed would have trouble with the definitions crossing multiple lines. I'd use awk or python.

GinoGiotto commented 1 year ago

@david-a-wheeler I updated the comments with the CC0-licensed code files.

tirix commented 1 year ago

I think sed would have trouble with the definitions crossing multiple lines. I'd use awk or python.

The simple pattern above does not work with multiple lines; as suggested by @david-a-wheeler, we either have to use another tool or more advanced sed options. Anyway, my point was rather that we probably don't need the power of Matlab or Octave for this generation.

benjub commented 1 year ago

I already added \usepackage{mathrsfs} to solve the problem mentioned here: #3066 (comment). This package is not included in the Metamath preamble yet, but it will be once this pull request metamath/metamath-exe@f5b8c11 is merged.

That PR was moved to https://github.com/metamath/metamath-exe/pull/131 and it has been merged. The preamble now has \usepackage{mathrsfs} (and also phonetic and mathtools).

GinoGiotto commented 1 year ago

I think I made some progress. I wrote a shorter version of "collector" in Python, ran it with the Python IDLE, and it seems to work. I used regular expressions, which aren't really my strong suit, but I think it looks promising. Specifically:

  1. It should work for latexdefs surrounded by either double or single quotes.
  2. It should work regardless of the placement of latexdef within the line.
  3. It should work if there are double or single quotes inside the latex substitution (I found that something like "([^']+)" creates some problems for these cases).
  4. It should work for latexdefs crossing multiple lines.
  5. It should be fast; I measured a running time of less than a second.

The following code prints LaTeX substitutions in the Python shell:

# Creative Commons Zero v1.0 Universal

# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted.

# This code is licensed under CC0 - https://creativecommons.org/publicdomain/zero/1.0/

# Author: Gino Giotto

import re

with open('set.mm', 'r') as file:
    text = file.read()

pattern = r"latexdef\s+['\"]\S+['\"]\s+as(?P<tex>(?:\s*\n?\s*['\"].+['\"]\s*\+?)+);"  # quoted token, then the (possibly multi-line) quoted TeX part
matches = re.finditer(pattern, text)

i = 0
for match in matches:
    i = i + 1
    tex_part = (match.group("tex")).strip()
    trimmed = tex_part[1:-1].strip()  # drop the surrounding quote characters
    final = re.sub(r"['\"]\s*\+\n\s*['\"]"," ", trimmed)  # join pieces of definitions split across lines
    print("\\\\" + str(i) + " &&  & " + " " + final)

And the following file is the complete code that generates the TeX list in a Metamath proof format: TeX_collector.txt

This is the output file: list.txt

The only issue is that it probably picks up some commented-out LaTeX substitutions, so unless this is improved, bad latex in the comments would be checked too.
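If excluding those commented-out definitions becomes necessary, one possible refinement (only a sketch, assuming all latexdefs sit inside a single $t comment and that commented-out ones are inside /* ... */ blocks) would be to narrow the search before matching:

import re

with open('set.mm', 'r') as file:
    text = file.read()

# Keep only the typesetting ($t) comment, where the latexdefs live.
t_comment = re.search(r'\$\(\s*\$t(.*?)\$\)', text, re.DOTALL)
scope = t_comment.group(1) if t_comment else text

# Drop /* ... */ comments so commented-out latexdefs are not collected.
scope = re.sub(r'/\*.*?\*/', '', scope, flags=re.DOTALL)

pattern = r"latexdef\s+['\"]\S+['\"]\s+as(?P<tex>(?:\s*\n?\s*['\"].+['\"]\s*\+?)+);"
for i, match in enumerate(re.finditer(pattern, scope), start=1):
    tex_part = match.group("tex").strip()
    trimmed = tex_part[1:-1].strip()
    final = re.sub(r"['\"]\s*\+\n\s*['\"]", " ", trimmed)
    print("\\\\" + str(i) + " &&  & " + " " + final)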

tirix commented 1 year ago

Looks good! Could you open a PR adding that python script to the scripts directory? There are already other python scripts there. In my opinion it would be more useful to have the original metamath ASCII token as a reference (one could use \begin{verbatim}...\end{verbatim} in LaTeX) rather than the index, but for just a CI check it actually does not matter, and we can discuss that in the PR, or add it later.

GinoGiotto commented 1 year ago

Like this? list.txt I'm not so sure about it; I would prefer to keep the file "pure" to be confident that errors come from TeX substitutions only, but if it's useful for some specific reason I can add it. In any case I would keep the index: I think it's useful to know how many LaTeX definitions it collects and to check whether this information is consistent with other LaTeX counters (e.g. I noticed that my previous version collected one latexdef fewer).

I'll make the pull request soon.

tirix commented 1 year ago

Like this? list.txt

Yes, this way we can also see the original token, so this can act like the mmascii page, but for LaTeX.

benjub commented 1 year ago

Also, you can name your file list.tex!

GinoGiotto commented 1 year ago

@tirix Nope, indeed I tested it and TeX programs complain about verbatim being inside align. Try this simple example: list.txt

Could you please provide another format for adding the tokens?

benjub commented 1 year ago

The "tabular" environment is more appropriate than the "align" environment.

david-a-wheeler commented 1 year ago

Well, I guess the good news is that we are finding & fixing the problems systematically. Once we fix it, and add tests to keep it fixed, it should stay fixed :-).

GinoGiotto commented 1 year ago

The "tabular" environment is more appropriate than the "align" environment.

It doesn't work either: it still gives errors for verbatim, and it doesn't recognize math formulas. Also, in general I would prefer not to change the basic Metamath proof style, so that the testing is more reliable.

tirix commented 1 year ago

I have not done any LaTeX in more years than I can count, so I'm afraid I'm not of much help here...

This StackOverflow answer says you need to declare the column as a paragraph column, have you tried that?

The ASCII math token is a better reference than a running index, but if it does not work you can drop the verbatim idea; it is not necessary for a CI check on LaTeX definitions.

GinoGiotto commented 1 year ago

This StackOverflow answer says you need to declare the column as a paragraph column, have you tried that?

I missed that, thanks. Yes, this solution works for placing verbatim inside tabular. However, there is still the problem that it doesn't compile math formulas (it gives errors like "\mathrm is allowed only in math mode"), so I tried wrapping each and every latexdef in a math mode environment. That compiled, but then I noticed that the output file had the table cut off, meaning that if the table is too long, it doesn't continue onto further pages. So the tabular environment can't be used as-is. Then I tried some alternatives to verbatim inside align, such as lstlisting, but they still don't work. I also tried some workarounds like \texttt{token}, but they incorrectly compile some special characters like \.

I was able to make it work using the enumerate environment, which also has built-in indexing:

\documentclass{article}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{mathrsfs}
\begin{document}

\begin{enumerate}
\item \verb!(! \quad $($
\item \verb!)! \quad $)$
\item \verb!->! \quad $\rightarrow$
\item \verb!&! \quad $\mathrel{\&}$
\item \verb!=>! \quad $\Rightarrow$
\item \verb!-.! \quad $\lnot$
\item \verb!wff! \quad $\mathrm{wff}$
\item \verb!|-! \quad $\vdash$
\item \verb!ph! \quad $\varphi$
\end{enumerate}

\end{document}

Which produces the output:

[screenshot of the rendered enumerate output]

The only problem is that no token should have a ! inside its text. There are also slight differences between the math mode of align and the math mode of $: specifically, $ is a bit more restrictive than align and complains about something like \text{\large\boldmath$\times$}, while align doesn't.

I didn't use \begin{verbatim} and \begin{align} here because they aren't inline environments; technically they could be made to work, but the result was quite messy.

This solution has an advantage, though: it's easier to spot malfunctioning latexdefs, because in this case TeX editors report where the errors are.

If you like this proposal we could work from here; otherwise I will just make a pull request with my original solution, I guess.

benjub commented 1 year ago

Yes, tabular is merely an environment to write tables, so it does not switch to math mode. You have to write equations as usual by surrounding them with $...$. I'm not sure what your "table cut" problem is, but since the positioning is "floating", pdflatex may have a problem positioning it if it spans more than a page. You can tweak the python code so that every 20 lines, for instance, it adds something like \end{tabular}\begin{tabular}.
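A minimal sketch of that tweak in the emitting loop (the defs list here is only a placeholder for the collected TeX strings, not the script's actual data):

defs = [r"\varphi", r"\psi", r"\chi"]  # placeholder; in practice filled by the collector

CHUNK = 20  # start a new tabular every 20 rows, as suggested above
with open("list.tex", "w") as out:
    out.write("\\begin{tabular}{|l|}\n")
    for i, tex in enumerate(defs, start=1):
        if i > 1 and (i - 1) % CHUNK == 0:
            out.write("\\end{tabular}\n\n\\begin{tabular}{|l|}\n")
        out.write(str(i) + ". \\quad $" + tex + "$ \\\\\n")
    out.write("\\end{tabular}\n")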

david-a-wheeler commented 1 year ago

I remember having to fight long tables for the Metamath book (which uses TeX).

I ended up switching to "longtabu" for big tables. This was a big improvement; the big demo table in appendix A could have words flow in a cell AND nicely split across pages. Headers reappear on each page. Hyperlinks work, and cells can have forced line breaks. This works at both normal size and narrow size. Your mileage may vary, but it may be worth investigating. Check out metamath-book if you want to see an example.

GinoGiotto commented 1 year ago

My TeX editors had some trouble with \usepackage{tabu}, but I was able to find a solution with \usepackage{longtable} and the \begin{longtable}...\end{longtable} environment. Now the table is displayed across multiple pages correctly.

LaTeX code:

\documentclass{article}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{mathrsfs}
\usepackage{longtable}
\begin{document}

\begin{longtable}{|l|}
    1. \quad \verb!(! \quad $($ \\
    2. \quad \verb!)! \quad $)$ \\
    3. \quad \verb!->! \quad $\rightarrow$ \\
    4. \quad \verb!&! \quad $\mathrel{\&}$ \\
    5. \quad \verb!=>! \quad $\Rightarrow$ \\
    6. \quad \verb!-.! \quad $\lnot$ \\
    7. \quad \verb!wff! \quad $\mathrm{wff}$ \\
    8. \quad \verb!|-! \quad $\vdash$ \\
    9. \quad \verb!ph! \quad $\varphi$ \\
    10. \quad \verb!ps! \quad $\psi$ \\
    11. \quad \verb!ch! \quad $\chi$ \\
\end{longtable}

\end{document}

Output:

[screenshot of the rendered longtable output]

The width of the table is automatically adjusted depending on the longest latex substitution present in the list. For set.mm it is as wide as in the figure above.

benjub commented 1 year ago

Now you can make a real table: use \begin{longtable}{|l|l|}, replace \quad with & and add \hline at the beginning and end. Maybe add two column titles in a new first row: \verb!ascii! & \LaTeX\ \\. Row numbering is probably not needed.
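A sketch of what the emitting loop could produce under that suggestion (the pairs data and the verb helper are only illustrative, not the script's actual code); choosing the \verb delimiter per token also sidesteps the earlier concern about tokens containing !:

# Illustrative sample data; in practice these come from the collector.
pairs = [("(", "("), ("->", r"\rightarrow"), ("ph", r"\varphi")]

def verb(token):
    # \verb needs a delimiter that does not occur in the token itself;
    # use '!' by default and fall back to '@' when the token contains '!'.
    delim = "!" if "!" not in token else "@"
    return "\\verb" + delim + token + delim

with open("list.tex", "w") as out:
    out.write("\\begin{longtable}{|l|l|}\n\\hline\n")
    out.write("\\verb!ascii! & \\LaTeX\\ \\\\\n\\hline\n")
    for token, tex in pairs:
        out.write(verb(token) + " & $" + tex + "$ \\\\\n")
    out.write("\\hline\n\\end{longtable}\n")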

benjub commented 1 year ago

It looks like the LaTeX CI checks are satisfactory. Can we close this issue as fixed by #3090 and #3100, @tirix?

tirix commented 1 year ago

Yes, LaTeX checks are up and running.

GinoGiotto commented 1 year ago

I think @david-a-wheeler expressed his wish for testing some latex proofs as well (https://github.com/metamath/set.mm/pull/3100#issuecomment-1475457078, https://github.com/metamath/set.mm/pull/3074#issuecomment-1450870324), but I guess it can be addressed on a separate occasion, and it would require fixing https://github.com/metamath/metamath-exe/issues/129 first anyway.