Closed GinoGiotto closed 8 months ago
By the way, this proposal from wolf might not be a bad idea:
Given that an unnecessary reference to ax-13 is a magnitude more likely than to, say, ax-11, what do you think of extending the $j annotation to mark a deliberate use of ax-13?
$j need ax-13 $.
for example. And a verifier that checks the simultaneous presence of ax-13 in the axiom list and this annotation.
Only thing to note is that currently about 1300 theorems use ax-13, so currently we would have to add that number of these tags. However I'm confident that I can bring ax-13 usage below 1000, and maybe I can push it really low with extra effort, but even in the best case scenario they will still be hundreds of theorems.
In my opinion the "discouraged" mechanism which already exists can be used for discouraging new usage of theorems that make use of ax-13
.
This does not mean that it won't be possible to use those theorems, it's just that it will not be possible to use them without noticing.
I sense two competing ideas in this topic: One strives for minimizing axiom usage, not at all costs, but at costs, and the other aims at minimizing prerequisites for theorems, at the expense of using more axioms. The axiom at question here is ax-13, which in essence states that a term x β y implies the distinctness of the variables x and y. This axiom is not disputed. Unlike the axiom of choice it has no ring of doubt about it. So what is wrong with it that its usage should be discouraged?
It seems that many contributors to Metamath are fond of introducing distinct variable conditions without further consideration. They are easy to use, you don't need to quote, or even worse, derive, such a condition when it is applied in a proof step, unlike β²π₯
for example. Dummy variables provide even more automation. And it seems that higher levels of mathematics can be formalized sufficiently even if unnecessary distinct variable conditions are introduced.
While I can see these advantages, it is still just a style, and a style that silently depends on side conditions, or prerequisites. And minimizing such conditions is something mathematicians can be fond of, too. I don't like the idea to put them at a disadvantage.
To illustrate this let me point you to a real world example. Let's have a look bnj1018. You can see a plethora of distinct variable conditions listed there. Among these, one (to my knowledge) is dispensable: n and p need not be disjoint. You can simply remove this condition, and see that the proof verifier sees no problem. If you keep this condition, then you can drop ax-13 from its proof, by modifying bnj985. Add the p, n disjointness there, and you can replace sb8e with sb8ev, a theorem not depending on ax-13.
So what is the right way here? ax-13 vs. an extra distinctness condition on n and p? You have the choice, and from my point of view each one has its own right to exist.
The mechanism of discouraging is explained in conventions. It covers ALT theorems, and the introduction of artificial axioms for complex numbers. I don't find a justification for enforcing a specific style.
I sense two competing ideas in this topic: One strives for minimizing axiom usage, not at all costs, but at costs, and the other aims at minimizing prerequisites for theorems, at the expense of using more axioms.
I'm glad you brought up this topic, I have a few words to say about this.
Let's have a look bnj1018. You can see a plethora of distinct variable conditions listed there. Among these, one (to my knowledge) is dispensable: n and p need not be disjoint...
This is the same question I asked myself before starting the series. The example you mention is appropriate, we can currently remove the dv condition between n and p in bnj1018, but if it would depend on a weaker version of bnj985 then we could not anymore. So either we remove a dv condition or we remove an axiom, we can't remove both. Therefore we have a conflict between two goals.
So how do we determine the best direction?
Well, I do have an answer for that, and the result might surprise you. We can take advantage of Jerry James's amazing tool that automatically checks for all unnecessary dv conditions in the datatabase.
Let's look at commit e468a4e5d2a63f128c20b0334bc486875ef4a0e7, which is the start of the PR series about ax-13.
These are the data for that time:
Now let's take a look at the data from the most recent commit 2dc7de4, which dates back to yesterday.
These are the current numbers:
I'm not sure if I'm properly conveying how impressive these data are. Essentially, they indicate that the cost of lowering ax-13 usage by 96% has been negligible so far (the close proximity between the dv numbers is probably luck, I'm guessing they fluctuate by a few units, but still impressive).
So, my answer to your concern is: we can have both. We can have a database with low ax-13 usage and with less dv conditions. Not only that, the conflict between these two goals is surprisingly weak (however a tool is needed for the second goal, because removing 14,406 dv conditions by hand is not practical).
But it gets better: the number 14,406 is actually a lower bound, we can likely remove much more dv conditions than that. Why? Because once you remove a dv condition from a theorem, then Jerry James's tool can detect the improvement on the ones depending on it (the same thing happens in the opposite direction: if you add a dv condition on a theorem then the verifier will complain about the dv conditions of the ones directly dependent on it).
I really appreciate your work, @GinoGiotto . It has clearly shown that ax-13 has the smallest range of effect of all axioms. Far smaller than I had expected a year ago. It even might be that you don't need this axiom at all to express mathematics at higher levels (i.e. apart from some results in predicate logic and basic set theory).
That all said, it does not solve the conflict about bnj1018, that could appear in two variants. I guess you'd vote for the one with more distinct variable (dv) conditions, but this is a question of preference. You state we can have both
. That's my standpoint, exactly. But your argument appears in a sort of statistical context. Applying the right tools considerably reduce dv conditions along with the usage of ax-13. I have a more principal attitude.
When I joined the list of contributors to Metamath in 2012, people were busy developing ideas about distinctors Β¬ βπ₯ π₯ = π¦
that are a possible (but not very effective) means to eliminate dv conditions. So I have seen times where elimination of these was clearly followed. OK times changed, and we have a different situation now. That does not mean there will be never a backlash again. I am not in favor of closing this door, just for the momentary trend.
There are simpler ways to support ax-13 free proofs right out of the box. Appropriate comments to theorems is one such way. Look at chvar and chvarv, for example. Both theorems have a version chvarvv that is not linked to. How is someone unaware of this discussion assumed to find it, if s/he happened to come accross chvar first?
your argument appears in a sort of statistical context.
Yep, my way of thinking is statistical. I believe this is the right approach when we have to deal with huge amount of data and especially when we have to deal with opposing goals (which aren't really opposing here, see below).
That all said, it does not solve the conflict about bnj1018, that could appear in two variants. I guess you'd vote for the one with more distinct variable (dv) conditions, but this is a question of preference. You state we can have both. That's my standpoint, exactly.
The bnj1018 case is not a problem, both versions can be kept (is the dv condition between p and n that important even?), but in any case this would be problematic only if we would have to deal with a lot of these conflicts, but luckily for us that's not the case.
It has clearly shown that ax-13 has the smallest range of effect of all axioms. Far smaller than I had expected a year ago.
Just to really understand how small this is, let's take a look at an extreme example, the branch ax-13-complete in my repository. The base branch of that database contained 32,347 proofs depending on ax-13, while the number of removable dv conditions were 14,406. That number changed to 14,402 after I lowered ax-13 usage to 716 theorems. Note that I'm the only author of the additional commits in there. So the price for eliminating ax-13 almost everywhere was really just 4 dv conditions. 4 out of 14,000. This is very good evidence that the example of bnj1018 is just a sporadic occurrence, not the rule. And I say it again: ax-13-complete is the extreme version of a database leaning towards axiom reduction at the expense of dv eliminations.
Look at chvar and chvarv, for example. Both theorems have a version chvarvv that is not linked to. How is someone unaware of this discussion assumed to find it, if s/he happened to come accross chvar first?
This is already in my to do list, however comments alone are not satisfactory for me because there is lack of a verification system that someone will unawarely produce proofs with them. Ideally, the proof assistant would directly output the reason why the use of a theorem is discouraged. But this is difficult to achieve right now.
An easier alternative could be to change the current mm-exe output:
MM-PA> ASSIGN <step> <label> >>> ?Error: Attempt to assign a statement whose usage is discouraged. >>> Use ASSIGN ... / OVERRIDE if you really want to do this.
To something like this:
MM-PA> ASSIGN <step> <label> >>> ?Error: Attempt to assign a statement whose usage is discouraged. >>> Use ASSIGN ... / OVERRIDE if you really want to do this. >>> Type "SHOW SOURCE <label>" to know why it's discouraged.
The command SHOW SOURCE <label>
in mm-exe shows the comment for the <label>
theorem. Therefore we can write in the source the reason why they are discouraged and make the user informed of the alternative, but this has to pass through a checkpoint, and a discouraged label is one way to build such checkpoint.
In my opinion the "discouraged" mechanism which already exists can be used for discouraging new usage of theorems that make use of
ax-13
.This does not mean that it won't be possible to use those theorems, it's just that it will not be possible to use them without noticing.
I agree with @tirix that the "discouraged" mechanism can and should be used now - it is available and serves the required purpose (to wait for new tools or accordingly modified tool may be a too long time). And it is still accpording to the "Conventions":
If something should only be used in limited ways, it is marked with "(New usage is discouraged.)".
Since there was a kind of agreement to reduce the numer of usages of ax-13 before @GinoGiotto started his series, this agreement should be supported to the best. Therefore, the rule should be to value theorems with more dv conditions over theorems requiring ax-13. Especially since the need for more dv conditions is negligible, as @GinoGiotto has shown.
And maybe it is sufficient to start with a few commonly used theorems using ax-13 to be tagged with "(New usage is discouraged.)". Candidates are at least ~cbvralv (currently ~cbralv is still used 48 times, but ~cbvralvw 301 times) and ~cbvrexv (currently ~cbralv is still used 45 times, but ~cbvralvw 243 times), and @GinoGiotto may be able to propose additional ones. Looking at ~chvar*, the usage is much lower (currently ~chvar, ~chvarv are used 48 times, and ~chvarvv 72 times). Additional tags can be added later (if needed).
After @GinoGiotto has finished his work, it would be helpful to have a list of the theorems still using ax-13 at a central place (maybe a subpage of the metamath homepage).
After @GinoGiotto has finished his work, it would be helpful to have a list of the theorems still using ax-13 at a central place (maybe a subpage of the metamath homepage).
OK, this list can be created by the command sh us ax-13/rec
within metamath.exe (currently 938, including 59 OLD theorems).
Some theorems depending on ax-13 are not used in proofs of other theorems anymore (e.g., ~nfsum, which was replaced by ~nfsumw), so these theorems could be renamed by appending the suffix ALTV (currently (to be) used in mathboxes only, but I see no reason to use it in the main body, too. But this was discussed already elsewhere...) and tagged by "(New usage is discouraged.)".
Furthermore, the comments should be aligned. For example
Current version:
${
$d f m n z k x $. $d f m n z A $. $d f m n z B $.
nfsumw.1 $e |- F/_ x A $.
nfsumw.2 $e |- F/_ x B $.
$( Version of ~ nfsum with a disjoint variable condition, which does not
require ~ ax-13 . (Contributed by Gino Giotto, 24-Feb-2024.) $)
nfsumw $p |- F/_ x sum_ k e. A B $=
... $.
$( $j usage 'nfsumw' avoids 'ax-13'; $)
$}
${
$d f m n z k $. $d f m n z x $. $d f m n z A $. $d f m n z B $.
nfsum.1 $e |- F/_ x A $.
nfsum.2 $e |- F/_ x B $.
$( Bound-variable hypothesis builder for sum: if ` x ` is (effectively) not
free in ` A ` and ` B ` , it is not free in ` sum_ k e. A B ` .
(Contributed by NM, 11-Dec-2005.) (Revised by Mario Carneiro,
13-Jun-2019.) $)
nfsum $p |- F/_ x sum_ k e. A B $=
... $.
$}
New version:
${
$d f m n z k x $. $d f m n z A $. $d f m n z B $.
nfsumw.1 $e |- F/_ x A $.
nfsumw.2 $e |- F/_ x B $.
$( Bound-variable hypothesis builder for sum: if ` x ` is (effectively) not
free in ` A ` and ` B ` , it is not free in ` sum_ k e. A B ` .
(Contributed by NM, 11-Dec-2005.) (Revised by Mario Carneiro,
13-Jun-2019.) Version of ~ nfsum with a disjoint variable condition, which does not
require ~ ax-13 . (Revised by Gino Giotto, 24-Feb-2024.) $)
nfsumw $p |- F/_ x sum_ k e. A B $=
... $.
$( $j usage 'nfsumw' avoids 'ax-13'; $)
$}
${
$d f m n z k $. $d f m n z x $. $d f m n z A $. $d f m n z B $.
nfsumALTV.1 $e |- F/_ x A $.
nfsumALTV.2 $e |- F/_ x B $.
$( Bound-variable hypothesis builder for sum: if ` x ` is (effectively) not
free in ` A ` and ` B ` , it is not free in ` sum_ k e. A B ` .
Version of ~ nfsumw with fewer disjoint variable conditions, but
requiring ~ ax-13 .
(Contributed by NM, 11-Dec-2005.) (Revised by Mario Carneiro,
13-Jun-2019.) (New usage is discouraged.) $)
nfsumALTV $p |- F/_ x sum_ k e. A B $=
... $.
$}
I agree with @tirix that the "discouraged" mechanism can and should be used now
I added discouragement tags for all ax-13 dependent theorems in main https://github.com/metamath/set.mm/pull/3896.
OK, this list can be created by the command sh us ax-13/rec within metamath.exe (currently 938, including 59 OLD theorems).
This is outdated, the most recent commit ade85ef7b9fa916dcbd3ee80d920f9f04345b235 shows 674 theorems using ax-13.
The role of axc11r.
When removing ax-13 rigorously, you have to look at the axiom ax-12 as well. This axiom has two applications: one is ax12v, a weakening of this axiom from which we want to prove theorems based on this axiom. The other is axc11r. axc11r is directly derived from ax-12, and not from ax12v as other theorems are. If we want to derive theorems only from ax12v, the proof of axc11r should be rewritten to use ax12 instead, in the same manner as we do not use ax-6 directly, instead of ax6. So axc11r has an implicit reference to ax-13. Fortunately almost all applications of this theorem use ax-10, ax-12 and ax-13 anyway, so we can base axc11r on ax12 (see my contribution wl-axc11r). There is just one theorem that drags in these axioms although not necessary. Replace its proof with my contribution wl-dral1v, so it becomes merely dependent on ax12v (and hence ax-12) again.
EDIT: ax12 has a proof that uses axc11r, so there is a circle in the proof idea. EDIT: Suggest to discourage the use of axc11r, and base theorems using it on wl-axc11r instead.
Some theorems depending on ax-13 are not used in proofs of other theorems anymore (e.g., ~nfsum, which was replaced by ~nfsumw), so these theorems could be renamed by appending the suffix ALTV
About this, I don't have a strong opinion about whether ALTV should be allowed in main, but I remember that some people opposed it. The old conventions used to say "Alternate (*ALTV) theorems or definitions are usually contained in mathboxes". When this was noticed, a convention change was done by @benjub in October 2023 to explicitly forbid ALTV in main. Since that convention change was merged, it should mean that this decision was agreed by others.
So this proposal would imply to change the conventions again and populate main with the first ALTV versions (currently there are none).
EDIT: Suggest to discourage the use of axc11r, and base theorems using it on wl-axc11r instead.
This suggestion would introduce ax-10 and ax-13 to bj-axc11v as well, whose comment says "Version of ~ axc11 with a disjoint variable condition, which does not require ~ ax-13 nor ~ ax-10 ". The original axc11v in main already doesn't use them, so it would be possible to fix this by changing BJ proof. However his version has a "Proof modification is discouraged" tag, so you have to ask @benjub about this.
Except for dral1v (which you already addressed) and bj-axc11v (which is fixable by using axc11rv instead of axc11r), your suggestion doesn't introduce new axioms to other theorems. So it's fine for me.
Yes, bj-axc11v should obviously use axc11rv, which was made for these cases. Looking at the contribution dates, I probably forgot to update it when I introduced axc11rv. Could you please make that change?
Yes, bj-axc11v should obviously use axc11rv, which was made for these cases. Looking at the contribution dates, I probably forgot to update it when I introduced axc11rv. Could you please make that change?
A system to automatically avoid non-deliberate ax-13 usage is now implemented. Additionally, $j tags, comments and cross references have been added. I think this properly addresses this issue, so I'm closing it. Feel free to open new ones if you think there are other topics to address.
This question arised while reviewing #3877. Checkout the discussion from https://github.com/metamath/set.mm/pull/3877#discussion_r1527569500 for more context.
For this issue description I'm going to simply quote myself since what I wrote already effectively communicates my intentions:
To this I'll add that we also have the tags
$j usage '<theorem>' avoids 'ax-13'
, which can be added to a few early theorems (I'll do this soon). While they help a little if put in strategic places, they don't communicate anything over newly added theorems (unless the newly added theorems come with their own $j tags).