Closed tirix closed 1 year ago
@tirix - ChkTeX is an interesting idea. My intended idea was to generate some TeX files, then ask TeX generate pdf files & see if that works. We could do both.
I can provide the code I made if it's useful, there are some issues tho:
In any case I believe this approach is better than writing only a few TeX proofs/statements, since with the latter method you wouldn't check the functionality of all latex definitions.
It's written in Matlab
Does it also run in GNU Octave? If not, could it be easily converted? GNU Octave is pretty compatible for such things, I've used it for some machine learning work.
I agree, a single test that "checks all definitions" makes sense, as long as it doesn't take TOO long.
Does it also run in GNU Octave? If not, could it be easily converted? GNU Octave is pretty compatible for such things, I've used it for some machine learning work.
I don't use Octave so I'm not sure, I know they are generally similar, but there might be some functions that Matlab has and Octave hasn't, as well as some minor language differences.
I agree, a single test that "checks all definitions" makes sense, as long as it doesn't take TOO long.
My code takes about 10 seconds to generate the list, which isn't super fast, I guess someone with more experience than me could make it more efficient.
Let me know if you are interested. It's only 80 lines of code, so I guess I can just share it here.
10 seconds is fine. Yes, interested.
If its only purpose is to test the generated TeX code, the easy solution is to add it to set.mm (say in scripts/), which is also where our CI test code for set.mm is. I don't remember if our base image includes LaTeX.
Ok, so I runned it again right now with Matlab R2021a 64-bit and it works fine.
The following file (collector.txt) is the code, I changed the format to .txt because Github doesn't let me load Matlab files, originally it was 'collector.m', but I don't know if it's the same for Octave.
The following is first line of the code, here you write which Metamath database you want to read, you need to have that file in the same folder as "collector.m". I setted it to read 'set.mm':
fileL = fopen('set.mm'); % Reading file
I downloaded the most recent version of set.mm, and runned the program, this should be the output:
The program generates the file in a .tex format, I changed to .txt for the same reason described above.
Each line corresponds to one latex substitution. I observed that only one of them latexdef "log" as "log";
cannot 'stand alone', meaning that it will produce an error if there isn't anything in front of it. As long as all the theorems that uses it are well formed, this shouldn't be an issue, but it should be appended as an exception for the checker then.
Note: even if you remove log_
in 'list.tex' you will still find that the file produces errors, that's because there are a few malfunctioning latexdefs left to fix (indeed if I remove them it compiles without problems, I will report them soon).
I already added \usepackage{mathrsfs}
to solve the problem mentioned here: https://github.com/metamath/set.mm/pull/3066#discussion_r1118161762, this package is not implemented in Metamath yet, but it will be once this pull request https://github.com/metamath/metamath-exe/pull/130/commits/f5b8c11c10ee090c178183b36eb2a2888ecd4cc4 will be merged.
Here is the Matlab code from @GinoGiotto - are you releasing this to the public domain, releasing under MIT license, something else?
Code:
fileL = fopen('set.mm'); % Reading file
C = textscan(fileL,'%s','delimiter','\n');
fclose(fileL);
C = strtrim(string(C{:})); % This vector contains the whole original Metamath database
n = size(C,1);
i = 1;
t = 1;
new = string.empty; % This vector contains the latex substitutions
% Filling the vector "new"
while i <= n
if ~isempty(regexp(C(i),'latexdef') == 1) && regexp(C(i),'latexdef') == 1
stringa = string(regexp(C(i),'"','split'));
if (length(stringa) == 5)
new(t) = stringa(4);
new(t) = insertBefore(new(t),"\","\");
elseif (length(stringa) == 3)
if stringa(3) == " as"
j = 1;
B = convertStringsToChars(C(i+j));
tot = "";
while B(end) ~= ';'
B = B(2:end-3);
tot = tot + convertCharsToStrings(B);
j = j + 1;
B = convertStringsToChars(C(i+j));
end
B = B(2:end-2);
tot = tot + convertCharsToStrings(B);
new(t) = tot;
new(t) = insertBefore(new(t),"\","\");
else
tex = string(regexp(stringa(3),"'",'split'));
new(t) = tex(2);
new(t) = insertBefore(new(t),"\","\");
end
elseif (length(stringa) == 4)
clear stringa
stringa = string(regexp(C(i),' ','split'));
B2 = convertStringsToChars(stringa(4));
B2 = B2(2:end-2);
new(t) = convertCharsToStrings(B2);
new(t) = insertBefore(new(t),"\","\");
clear B1
clear B2
end
t = t + 1;
end
i = i + 1;
end
% Creation of the tex file
fileID = fopen("list.tex",'w'); % Output file
fprintf(fileID,"\\documentclass{article}\n");
fprintf(fileID,"\\usepackage{graphicx}\n");
fprintf(fileID,"\\usepackage{amssymb}\n");
fprintf(fileID,"\\usepackage{amsmath}\n");
fprintf(fileID,"\\usepackage{amsthm}\n");
fprintf(fileID,"\\usepackage{mathrsfs}\n");
fprintf(fileID,"\\theoremstyle{plain}\n");
fprintf(fileID,"\\newtheorem{theorem}{Theorem}[section]\n");
fprintf(fileID,"\\newtheorem{definition}[theorem]{Definition}\n");
fprintf(fileID,"\\newtheorem{lemma}[theorem]{Lemma}\n");
fprintf(fileID,"\\newtheorem{axiom}{Axiom}\n");
fprintf(fileID,"\\allowdisplaybreaks[1]\n");
fprintf(fileID,"\\usepackage[plainpages=false,pdfpagelabels]{hyperref}\n");
fprintf(fileID,"\\hypersetup{colorlinks}\n");
fprintf(fileID,"\\begin{document}\n");
fprintf(fileID,"\n");
fprintf(fileID,"\\begin{proof}\n");
fprintf(fileID,"\\begin{align}\n");
for i = 1:length(new)
fprintf(fileID, "\\\\" + i + " && & " + new(i) + "\n" );
end
fprintf(fileID,"\\end{align}\n");
fprintf(fileID,"\\end{proof}\n");
fprintf(fileID,"\n");
fprintf(fileID,"\\end{document}");
fclose(fileID);
clear
I would replace amsmath with mathtools in this code (mathtools load amsmath anyway).
@GinoGiotto : for log_, use "\operatorname{\underline{log}}"
(and for "log", I hope the TeX is already "\log").
are you releasing this to the public domain, releasing under MIT license, something else?
To be honest I'm not educated in this field, feel free to do whatever you want with it, I won't complain.
for log_, use
"\operatorname{\underline{log}}"
(and for "log", I hope the TeX is already "\log").
Ok, I noted it in my list.
I would replace amsmath with mathtools in this code (mathtools load amsmath anyway).
Done collector.txt
I made a small optimization. Originally 'collector.m' cycled through the entire Metamath database, but that's inefficient. I edited so that it cycles through the $t comment only. Now it takes about 3 seconds to run.
Could it be that a simple sed
script would get us the same result?
Maybe something like:
sed -n -E 's/^ *latexdef "([^"]+)" as "([^"]+)";/\\begin{verbatim}\1\\end{verbatim} \& \2/p' set.mm
Which outputs something like:
\begin{verbatim}|-\end{verbatim} & \vdash
\begin{verbatim}ph\end{verbatim} & \varphi
\begin{verbatim}ps\end{verbatim} & \psi
Then we only have to add the prelude/header/footer.
There are some latexdefs using single quotes, but this should be manageable.
Add \\
on each line of the resulting file.
@tirix Your approach looks much better than mine. Does it work for latexdefs spanning over multiple lines? I'm very interested in more efficient TeX collectors.
To be honest I'm not educated in this field, feel free to do whatever you want with it, I won't complain.
Okay, I'm interpreting "do whatever you want with it" as a CC0 dedication: https://creativecommons.org/publicdomain/zero/1.0/
I think sed would have trouble with the definitions crossing multiple lines. I'd use awk or python.
@david-a-wheeler I updated the comments with the CC0 licenced code files.
I think sed would have trouble with the definitions crossing multiple lines. I'd use awk or python.
The simple pattern above does not work with multiple line, as suggested by @david-a-wheeler we either have to use another tool, or more advanced sed options. Anyway my point was rather that we probably did not need the power of Matlab or Octave for that generation.
I already added
\usepackage{mathrsfs}
to solve the problem mentioned here: #3066 (comment), this package is not implemented in Metamath yet, but it will be once this pull request metamath/metamath-exe@f5b8c11 will be merged.
That PR was moved to https://github.com/metamath/metamath-exe/pull/131 and it has been merged. The preambule now has \usepackage{mathrsfs}
(and also phonetic
and mathtools
).
I think I made some progress. I wrote a shorter version of "collector" in Python, runned it with the Python IDLE and it seems to work. I used regular expressions, which aren't really my strong suit, but I think it looks promising, specifically:
latexdef
within the line."([^']+)"
creates some problems for these cases).The following code prints LaTeX substitutions in the Python shell:
# Creative Commons Zero v1.0 Universal # Permission to use, copy, modify, and/or distribute this software for any # purpose with or without fee is hereby granted. # This code is licensed under CC0 - https://creativecommons.org/publicdomain/zero/1.0/ # Author: Gino Giotto import re with open('set.mm', 'r') as file: text = file.read() pattern = r"latexdef\s+['\"]\S+['\"]\s+as(?P<tex>(?:\s*\n?\s*['\"].+['\"]\s*\+?)+);" matches = re.finditer(pattern, text) i = 0 for match in matches: i = i + 1 tex_part = (match.group("tex")).strip() trimmed = tex_part[1:-1].strip() final = re.sub(r"['\"]\s*\+\n\s*['\"]"," ", trimmed) print("\\\\" + str(i) + " && & " + " " + final)
And the following file is the complete code that generates the TeX list in a Metamath proof format: TeX_collector.txt
This is the output file: list.txt
The only issue is that it probably picks up some commented LaTeX substitutions, so unless there will be improvements, bad latex would be checked in the comments too.
Looks good!
Could you open a PR adding that python script in the scripts
directory? There are already other python scripts there.
In my opinion it would be more useful to have the original metamath ASCII token as a reference (one could use \begin{verbatim}
...\end{verbatim}
in LaTeX), rather than the index, but actually for just a CI check it does not matter, and we can discuss that in the PR, or add it later.
Like this? list.txt I'm not so sure about it, I would prefer to keep the file "pure" to be confident that errors come from TeX substitutions only, but if it's useful for some specific reason I can add it. In any case I would keep the index, I think it's useful to know how many LaTeX definitions it collects and check if this information is consistent with other LaTeX counters (e.g. I noticed that my previous version collected 1 less latexdef).
I'll make the pull request soon.
Like this? list.txt
Yes, this way we can also see the original token, so this can act like the mmascii page, but for LaTeX.
Also, you can name your file list.tex
!
@tirix Nope, indeed I tested it and TeX programs complain about verbatim
being inside align
. Try this simple example list.txt
Could you please provide another format for adding the tokens?
The "tabular" environment is more appropriate than the "align" environment.
Well, I guess the good news is that we are finding & fixing the problems systematically. Once we fix it, and add tests to keep it fixed, it should stay fixed :-).
The "tabular" environment is more appropriate than the "align" environment.
It doesn't work either, it still gives errors for verbatim and it doesn't recognize math formulas. Also in general I would prefer to not change the basic Metamath proof style to have more reliable testing.
I have not done any LaTeX since too many years to count, so I'm afraid I'm not of good advice here...
This StackOverflow answer says you need to declare the column as a paragraph column, have you tried that?
The ASCII math token is a better reference than a running index, but if it does not work, but you can drop the verbatim
idea, it is not necessary to add a CI check on LaTeX definitions.
This StackOverflow answer says you need to declare the column as a paragraph column, have you tried that?
I missed that, thanks. Yes this solution works for placing verbatim
inside tabular
. However there is still the problem that it doesn't compile math formulas (it gives errors like \mathrm is allowed only in math mode
), so I tried to make a math mode environment for each and every latexdef. That compiled, but then I noticed that the output file had the table cut, meaning that if the table is too long, it doesn't produce more pages of it. So the tabular
environment can't be implemented. Then I tried some alternatives to verbatim
inside align
such as lstlisting
but they still don't work. I also tried some workarounds like \texttt{token}
, but they incorrectly compile some special characters like \
.
I was able to make it work using the enumerate
environment, which also has built-in indexing:
\documentclass{article} \usepackage{graphicx} \usepackage{amssymb} \usepackage{mathtools} \usepackage{amsthm} \usepackage{mathrsfs} \begin{document} \begin{enumerate} \item \verb!(! \quad $($ \item \verb!)! \quad $)$ \item \verb!->! \quad $\rightarrow$ \item \verb!&! \quad $\mathrel{\&}$ \item \verb!=>! \quad $\Rightarrow$ \item \verb!-.! \quad $\lnot$ \item \verb!wff! \quad $\mathrm{wff}$ \item \verb!|-! \quad $\vdash$ \item \verb!ph! \quad $\varphi$ \end{enumerate} \end{document}
Which produces the output:
The only problem is that no token should have a !
inside its text. There are also slight compiling differences between math mode of \align
and math mode of $
, specifically $
is a bit more restrictive than \align
and it complains for something like \text{\large\boldmath$\times$}
, while \align
doesn't.
I didn't use \begin{verbatim}
and \begin{align}
here because they aren't inline environments, technically they could be implemented, but the result was quite messy.
This solution has an advantage tho, it's easier to spot malfunctioning latexdefs because in this case TeX editors tell where the errors are.
If you like this proposal we could work from here, otherwise I will just make a pull request of my original solution I guess.
Yes, tabular is merely an environment to write tables, so it does not switch to math mode. You have to write equations as usual by surrounding them with $...$. I'm not sure what your "table cut" problem is, but since the positioning is "floating", pdflatex may have a problem positioning it if it spans more than a page. You can tweak the python code so that every 20 lines, for instance, it adds something like \end{tabular}\begin{tabular}
.
I remember having to fight long tables for the Metamath book (which uses TeX).
i ended up switching to "longtabu" for big tables. This was a big improvement; the big demo table in appendix A could have words flow in a cell AND nicely split across pages. Headers reappear on each page. Hyperlinks work, and cells can have forced line breaks. This works on both normal size and narrow size. Your mileage may vary, but it may be worrth investigating. Check out metamath-book if you want to see an example.
My TeX editors had some trouble with \usepackage{tabu}
, but I was able to find a solution with \usepackage{longtable}
and the \begin{longtable}...\end{longtable}
environment. Now the table is displayed across multiple pages correctly.
LaTeX code:
\documentclass{article} \usepackage{graphicx} \usepackage{amssymb} \usepackage{mathtools} \usepackage{amsthm} \usepackage{mathrsfs} \usepackage{longtable} \begin{document} \begin{longtable}{|l|} 1. \quad \verb!(! \quad $($ \\ 2. \quad \verb!)! \quad $)$ \\ 3. \quad \verb!->! \quad $\rightarrow$ \\ 4. \quad \verb!&! \quad $\mathrel{\&}$ \\ 5. \quad \verb!=>! \quad $\Rightarrow$ \\ 6. \quad \verb!-.! \quad $\lnot$ \\ 7. \quad \verb!wff! \quad $\mathrm{wff}$ \\ 8. \quad \verb!|-! \quad $\vdash$ \\ 9. \quad \verb!ph! \quad $\varphi$ \\ 10. \quad \verb!ps! \quad $\psi$ \\ 11. \quad \verb!ch! \quad $\chi$ \\ \end{longtable} \end{document}
Output:
The width of the table is automatically adjusted depending on the longest latex substitution present in the list. For set.mm it is as wide as in the figure above.
Now you can make a real table: use \begin{longtable}{|l|l|}
, replace \quad
with &
and add \hline
at the beginning and end. Maybe add two column titles in a new first row: \verb!ascii! & \LaTeX\ \\
. Row numbering is probably not needed.
It looks like the LaTeX CI checks are satisfactory. Can we close this issue as fixed by #3090 and #3100, @tirix ?
Yes, LaTeX checks are up and running.
I think @david-a-wheeler expressed his wishes for testing some latex proofs as well https://github.com/metamath/set.mm/pull/3100#issuecomment-1475457078 https://github.com/metamath/set.mm/pull/3074#issuecomment-1450870324, but I guess it can be addressed on a separate occasion and it would require to fix https://github.com/metamath/metamath-exe/issues/129 first anyway.
@david-a-wheeler expressed his wish for LaTeX checks in the CI.
@GinoGiotto suggested to use a tool to generate a TeX file listing all definitions. We can't visually check the result but we could then run some checker like e.g.
ChkTeX
to ensure the generared TeX file is semantically valid.