metamath / metamath-exe

Metamath program - source code for the Metamath executable
GNU General Public License v2.0
77 stars 25 forks source link

Improve minimize performance #94

Open savask opened 2 years ago

savask commented 2 years ago

This is an implementation of the idea presented in #10 . One precomputes the set of math constants used in the proof of the minimized statement (it is stored in .tmp field of g_MathToken in the code), and for each statement we check if all its constants lie in this set. Statements which pass this validation step are marked with 1 in the boolean array usefulStatements. Then the usual back-and-forth minimize_with loop is run, but now it checks only "useful statements".

If minimize_with is supplied with the /VERBOSE option, it will output the number of potentially useful theorems at the beginning.

This optimization reduces the minimization time twofold. For example, on my computer smfsupmpt requires 94s to minimize with the vanilla version, and 42s with this optimization; ssmapsn requires 32s without and 13s with the optimization.

One downside of my code is the allocation of the usefulStatements array, I would prefer it to be an expandable vector (not a boolean array), but as far as I understand metamath codebase doesn't have a vector implementation.

benjub commented 2 years ago

I'm not sure I understand. It may be the case that a proof is shortened by statements using constants not appearing in the current proof. Maybe you know in advance that such shortenings will not be caught by the current minimize_with program, so you're saying we might as well discard them from the beginning ?

savask commented 2 years ago

Maybe you know in advance that such shortenings will not be caught by the current minimize_with program, so you're saying we might as well discard them from the beginning ?

Yes, minimize_with X tries to use X at one of the steps of the proof, so naturally all constants used in X must be already present in the proof of our statement. I can imagine a multi-step subproof using some new constants yet shortening the proof of the original statement, but minimize_with can't find those, unfortunately.

david-a-wheeler commented 2 years ago

metamath codebase doesn't have a vector implementation.

It's certainly possible to add one. The usual approach is a simple data structure with a count of the "number of elements" and a pointer to the (beginning of the) array elements; if not NULL it needs have been allocated (e.g., with malloc). To resize, call realloc() to resize the array, move things as necessary, and change the count. I did a quick Google search and found this one, implemented as OSS: https://github.com/eteran/c-vector

wlammen commented 2 years ago

https://github.com/eteran/c-vector And it is C89 (ANSI C https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) compatible. I wonder whether we should give up on this and at least advance to C11.

david-a-wheeler commented 2 years ago

I wonder whether we should give up on this and at least advance to C11.

We could move to C11 if that's helpful. The key is compiler support in deployed systems. GCC seems to have good C11 support.

However, I don't think that helps. C11 doesn't have a vector type as far as I know, and https://github.com/eteran/c-vector doesn't require C11. We should write code that is compatible with C11, but I think we should only require C11 if there's advantage to it. I didn't see any advantages in this list of C11 additions

wlammen commented 2 years ago

However, I don't think that helps.

See for example https://en.cppreference.com/w/c/io/fprintf to see how much C has developed since 1989. Functions like snprintf really make sense. Or // style comments.

gcc -O3 -funroll-loops -finline-functions -fomit-frame-pointer -DINLINE=inline -g -Wall -Wextra -o metamath metamath.o mmcmdl.o mmcmds.o mmdata.o mmfatl.o mmhlpa.o mmhlpb.o mminou.o mmpars.o mmpfas.o mmunif.o mmveri.o mmvstr.o mmword.o mmwtex.o

This is the command executed by the build.sh script. It does not check for ANSI C. The option -ansi should be added to enforce this standard. By the way, this can be done by changing line 4567 of configure to new_CFLAGS="-Wall -Wextra -ansi". This reveals a lot of ANSI violations like for (long i = 0; i < len; i++): error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode in mmcmdl.c for example. In short, the current state is not ANSI at all.

david-a-wheeler commented 2 years ago

I agree snprintf and // are useful, but I those are C99 additions, not C11.

I'm not opposed to switching to a later C spec, I just want to make sure that a spec move is justified and that it is widely implemented by commonly-available & distributed compilers.

We should at least move up to C99, no question in my mind!!! C99 is very widely supported & I think we're already using its facilities. The only question is if we should move to C11 or not.

digama0 commented 2 years ago

We are already on C11 C99. The build script uses it and // can be spotted in the source code.

savask commented 2 years ago

It's certainly possible to add one.

It is, but is it worth doing so for one application? Also, metamath-exe uses its own memory management system, for example, in code I used poolMalloc instead of malloc and poolFree instead of free. I was wondering if nmbrString can be used as vector of numbers, but I haven't been able to find a push_back equivalent in mmdata.c.

wlammen commented 2 years ago

We are already on C99.

Indeed: CONTRIBUTIONS.md states this literally. I added the -std=c99 flag to configure as indicated in my previous post, and the compiling performed flawlessly, even no warning. To avoid using the results of a previous build without this flag, run build.sh with option -c

benjub commented 2 years ago

Having @wlammen's approval is already a pretty good guarentee ! If @digama0 can find some time to review and if ok merge this, this would be nice and this would also unblock #93.