php / doc-base

Tools for the PHP documentation
Other
347 stars 88 forks source link

Single source revcheck code/data #174

Closed alfsb closed 1 week ago

alfsb commented 3 weeks ago

This PR is a work in progress to concentrate revcheck code and generation of revcheck data into a single place.

In doc-base (this PR):

In web-doc (other PRs)

This PR does not try to fix the issues reported on https://github.com/php/doc-base/issues/133 or https://github.com/php/doc-base/pull/132, for now. The priority is removing code duplication, and after that to possibly change revcheck semantics of [skip-revcheck] (and other possible issues)

Reviews are welcome. Cosmetic changes delayed in favor of resolving issues found.

jimwins commented 2 weeks ago

@jimwins as you have been working on doc-web, mind keeping an eye on this?

Yes, I'm ready to take a look when it is past the draft stage, and move over the systems side of doc.php.net to stop running web-doc/scripts/rev.php and switch it over to doc-base/scripts/revcheck.php (or configure.php) with the necessary configuration to generate the SQLite database or whatever that web-docs/www/rev.php will read.

alfsb commented 2 weeks ago

Yes, I'm ready to take a look when it is past the draft stage, and move over the systems side of doc.php.net to stop running web-doc/scripts/rev.php and switch it over to doc-base/scripts/revcheck.php

If switching over to doc-base/scripts/revcheck.php is possible, it could avoid the last step, of serializing data in a secure format, and directly use more code from this PR, that I rewrote using the popos instead arrays (to facilitate debug).

alfsb commented 2 weeks ago

I finished the rewriting of doc-base/scripts/revcheck.php, with back ported counting and listing behaviors to match actual code, and so facilitate migration. After all revcheck code (or data) becoming single sourced, I will then try open separated GH issues for some cases I found while rewriting, including PR 132.

There are two cosmetic modifications that i would like to make, namely:

This won't change any data or structure, but will make the generated HTML differ, and so I may wait for web-doc code sync or the merging of this PR, before making these changes.

jimwins commented 2 weeks ago

Seems fine to me. What I'd like to see is a way to pass a list of languages to check to doc-base/scripts/revcheck.php, not just one, and an argument to output an SQLite database with the information as currently done by web-doc/scripts/rev.php (and not output any HTML, of course).

alfsb commented 2 weeks ago

That would be a partial rewrite of web-doc/scripts/rev.php, to both eliminate the code duplication and to generate the same database output as currently implemented. The base data generating code can be as minimal as:

require_once "path/to/doc-base/scripts/translation/lib/all.php";
foreach( $langs as $lang ) {
    $revcheck = new RevcheckRun( $lang );
    generateSql( $revcheck->revData );
}

By the way, the database output appears to me to be overly complicated, using 8 tables (and some indexes) where is only really necessary 3 tables: languages, translations and files. I propose to write and submit a PR for an alternative web-doc/scripts/revdb.php that does the rewrite and simplification.

After this alternative is installed, running and generating a simplified db, it would only be necessary to modify the SELECTs on HTML generating pages to use the new db/tables, with no joining necessary. For this last part I ask someone else to volunteer, before starting this alternative revdb.php.

jimwins commented 2 weeks ago

The script for generating the database should live in the doc-base repo because we will not do cross-repo code dependencies. I have no problem with simplifying the schema, and will handle updating the display code in web-doc.

alfsb commented 2 weeks ago

So, instead of cross-repo code including, I can write an doc-base/scripts/translations/genrevdb.php to generate an Sqlite file from a language list.  That would be acceptable?

The proposed syntax would be: genrevdb.php [relative/path/file.db] [langdir1,...,langdirN]

Invoked from a directory above $LANG checkouts, that is afaik the same convention of doc-base/configure.php.

jimwins commented 2 weeks ago

That sounds good. But the path to the SQLite database may be an absolute path, too.

alfsb commented 2 weeks ago

Pushed with a new script named doc-base/scripts/translation/genrevdb.php. Tested in one repository, and in the following days I will test the new revcheck.php and genrevdb.php in all translations listed CI for automatic testing.

Reviews and tests are welcome. Runs in about 28s on 14y old CPU and SSD for pt_BR repo.

jimwins commented 2 weeks ago

Haven't played around with the resulting database and might not have time to do today, but it looks like it works for me, at least.

It would be preferable to specify languages as individual arguments instead of one argument that you explode on ,.

One thing I did notice is that the intros in the languages table are wrapped in the <intro> tag.

$ php doc-base/scripts/translation/genrevdb.php out.sqlite de,es,fr,it,ja,pl,pt_br,ro,ru,tr,uk,zh
[2024-11-07 18:54] Creating revdata database out.sqlite for languages de,es,fr,it,ja,pl,pt_br,ro,ru,tr,uk,zh.
[2024-11-07 18:54] Language de run
[2024-11-07 18:55] Language de done (elapsed 19.00s)
[2024-11-07 18:55] Language es run
[2024-11-07 18:56] Language es done (elapsed 68.00s)
[2024-11-07 18:56] Language fr run
[2024-11-07 18:57] Language fr done (elapsed 58.00s)
[2024-11-07 18:57] Language it run
[2024-11-07 18:57] Language it done (elapsed 5.00s)
[2024-11-07 18:57] Language ja run
[2024-11-07 18:58] Language ja done (elapsed 45.00s)
[2024-11-07 18:58] Language pl run
[2024-11-07 18:58] Language pl done (elapsed 4.00s)
[2024-11-07 18:58] Language pt_br run
[2024-11-07 18:58] Language pt_br done (elapsed 30.00s)
[2024-11-07 18:58] Language ro run
[2024-11-07 18:58] Language ro done (elapsed 4.00s)
[2024-11-07 18:58] Language ru run
[2024-11-07 19:00] Language ru done (elapsed 80.00s)
[2024-11-07 19:00] Language tr run
[2024-11-07 19:00] Language tr done (elapsed 17.00s)
[2024-11-07 19:00] Language uk run
[2024-11-07 19:00] Language uk done (elapsed 3.00s)
[2024-11-07 19:00] Language zh run
[2024-11-07 19:00] Language zh done (elapsed 24.00s)
[2024-11-07 19:00] Revdata database out.sqlite complete.
alfsb commented 2 weeks ago

It would be preferable to specify languages as individual arguments instead of one argument that you explode on ,.

Changed.

One thing I did notice is that the intros in the languages table are wrapped in the <intro> tag.

It does not affect rendering, but is certainly invalid HTML. Working with DOMDocumentFragment or detaching root-less XML is always awkward. Piecewise XML node saving is no prettier, but generates way less code. Fixed.

alfsb commented 2 weeks ago

I am now confident this PR produces identical results in most cases, and near identical in a few cases, comparing counts and listings of old and lib versions of revcheck.

For example, ro revcheck counts 20 notinen files, but lists 21. The code that extracts hashes, determines files final status and count cases are spread in so many places that it is impossible to make further progress. But as doc-base and web-doc revchecks are run with different times (and checkout states), some differences are expected.

And with other PRs in place (listed above), I think it's time to push for this. I plan to:

  1. Merge this PR in the following week, if there are no objections;
  2. In the meantime, to open various separated issues some minor, some major, that changes revcheck results;
  3. The minor issues to be pushed separately;
  4. Two major issues, dealing with [skip-revcheck], to be pushed around early December.

This new "lib revcheck" code is somewhat future proofed, in the sense that the modifications above only generate exchanged data, and no further changes in database or HTML generating would be necessary.

alfsb commented 1 week ago

Merged. Further issues/PRs will consider this as baseline. Is necessary to comment/notify the PRs of system and web-doc, above?

jimwins commented 1 week ago

All handled now, new data is live on https://doc.php.net/.

alfsb commented 1 week ago

Yay! Let's celebrate. No more duplicated revcheck code in various repositories.

I will treat myself with a good wine tonight.

alfsb commented 1 week ago

I spoke to soon. The pt_BR translation is not shown, other are ok. See: https://doc.php.net/revcheck.php?lang=pt_br