php / doc-base

Tools for the PHP documentation
Other
347 stars 88 forks source link

Avoid marking files as outdated on multiple `[skip-revcheck]` #178

Closed alfsb closed 6 days ago

alfsb commented 1 week ago

Allow me to explain this curious PR title. Previously I tried to show an issue with consecutive use of [skip-revcheck] in terms of code, today I will try to explain in terms of the real case.

If you run git log -5 --oneline -- language/context/ftp.xml on doc-en, you will see something like this:

099db652fe [skip-revcheck] context/ftp.xml: add final stop for end of sentence (#3490)
8bc832a464 [skip-revcheck] Remove stream wrappers and context options from function method listing (#3422)
bad50150f7 [skip-revcheck] Fix whitespace in stream wrapper and context options files (#3432)
ec6e871a47 [skip-revcheck] Replace role="noversion" attribute usage with the new annotations one
86e6094e86 Use canonical type names

That is, the output will show a run of [skip-revcheck] commits. Runs like this may occur in big, coordinated projects in all translations, and may occur in piecewise independent commits. In the case above, both hypotheses occur.

For updates on doc-en that generates changes that do not need the attention of translations, a commit message can be annotated with [skip-revcheck], and so translators are undisturbed. But there is a catch. It only works if there are no runs like above.

The current [skip-revcheck] code compares at most two commits hashes to determine if a file is synced or outdated on translations. If the last commit has no [skip-revcheck], only the last hash is used; if it contains [skip-revcheck], then compares with the last two hashes. So if there are more than one [skip-revcheck] in recent history, files are marked outdated. Let's examine a detailed example of why this occurs.

Suppose a file has the last normal commit n0, and is synced in all translations (revtags filled with n0). A small en typo commit arrives, that is committed with [skip-revcheck]. That is ok and will work. This file will not be marked outdated in all translations. This is what the code inspect:

$ git log -- file
s1   [skip-revcheck]      // hash one, compare and fail, continue.
n0   Normal message.      // hast two, compare and succeeds -> status: TranslatedOK

Now another commit is pushed, and by coincidence, it is also marked [skip-revcheck].. Now, this file will be marked outdated in all translations.

$ git log -- file
s2   [skip-revcheck]      // hash one, compare and fail, continue.
s1   [skip-revcheck]      // hast two, compare and fail -> status: TranslatedOLD
n0   Normal message.      // ignored

This PR changes the behavior of [skip-revcheck]. Instead only at most two hashes, it will consider more hashes, or more specifically, it will check the last hash as usual, and all hashes of topmost uninterrupted [skip-revcheck] runs, no matter how long they are.


This PR does not cause any regressions of TranslatedOk to TranslatedOld in all languages tested, but will change some status from TranslatedOld to TranslatedOk on doc-es. (this is expected, in translations with big backlogs).


This PR also speeds up revcheck.php and genrevdb.php five fold. Local tests on running genrevdb.php on de,en,es,fr,it,ja,pl,pt_BR,ro,ru,tr,uk,zh completes in 5m25s and 1m02s, on current and PR code, respectively.

Most local revcheck.php now runs below 4 seconds.


This is a "fix" for the FIXED_SKIP_REVCHECK backport. Reviews and comments welcome. I plan to merge this PR in about two weeks.

Girgias commented 1 week ago

The logic makes sense to me, but I have no idea how the revcheck script works to be frank :)

alfsb commented 1 week ago

The logic makes sense to me, but I have no idea how the revcheck script works to be frank :)

Unfortunately, I think few people do. It's not even the first time I'm rewriting revcheck.php (first one was CVS to SVN repo change). But I hope the new code may change this a bit. This entire PR is about one of the effects of last rewriting, trying to refactor the code into separated concepts, to facilitate the understanding and interaction of these separate concepts.

There was [skip-revcheck], isolated evaluated, now are [skip-revcheck] runs.

But for my surprise, this changes has no negative impacts. I would do like more tests, but as it stands now, I will merge this after waiting a cool down from last merge, to avoid compounding unexpected problems.

alfsb commented 6 days ago

Merged. Let me know if are any issues. This fixes the second issue at https://github.com/php/doc-base/pull/132, and now two or more sequential [skip-revcheck] have the same effect as an unique [skip-revcheck].