Closed alfsb closed 1 week ago
@jimwins as you have been working on doc-web, mind keeping an eye on this?
Yes, I'm ready to take a look when it is past the draft stage, and move over the systems side of doc.php.net to stop running web-doc/scripts/rev.php
and switch it over to doc-base/scripts/revcheck.php
(or configure.php
) with the necessary configuration to generate the SQLite database or whatever that web-docs/www/rev.php
will read.
Yes, I'm ready to take a look when it is past the draft stage, and move over the systems side of doc.php.net to stop running
web-doc/scripts/rev.php
and switch it over todoc-base/scripts/revcheck.php
If switching over to doc-base/scripts/revcheck.php
is possible, it could avoid the last step, of serializing data in a secure format, and directly use more code from this PR, that I rewrote using the popos instead arrays (to facilitate debug).
I finished the rewriting of doc-base/scripts/revcheck.php
, with back ported counting and listing behaviors to match actual code, and so facilitate migration. After all revcheck code (or data) becoming single sourced, I will then try open separated GH issues for some cases I found while rewriting, including PR 132.
There are two cosmetic modifications that i would like to make, namely:
This won't change any data or structure, but will make the generated HTML differ, and so I may wait for web-doc
code sync or the merging of this PR, before making these changes.
Seems fine to me. What I'd like to see is a way to pass a list of languages to check to doc-base/scripts/revcheck.php
, not just one, and an argument to output an SQLite database with the information as currently done by web-doc/scripts/rev.php
(and not output any HTML, of course).
That would be a partial rewrite of web-doc/scripts/rev.php
, to both eliminate the code duplication and to generate the same database output as currently implemented. The base data generating code can be as minimal as:
require_once "path/to/doc-base/scripts/translation/lib/all.php";
foreach( $langs as $lang ) {
$revcheck = new RevcheckRun( $lang );
generateSql( $revcheck->revData );
}
By the way, the database output appears to me to be overly complicated, using 8 tables (and some indexes) where is only really necessary 3 tables: languages
, translations
and files
. I propose to write and submit a PR for an alternative web-doc/scripts/revdb.php
that does the rewrite and simplification.
After this alternative is installed, running and generating a simplified db, it would only be necessary to modify the SELECTs on HTML generating pages to use the new db/tables, with no joining necessary. For this last part I ask someone else to volunteer, before starting this alternative revdb.php
.
The script for generating the database should live in the doc-base
repo because we will not do cross-repo code dependencies. I have no problem with simplifying the schema, and will handle updating the display code in web-doc
.
So, instead of cross-repo code including, I can write an doc-base/scripts/translations/genrevdb.php
to generate an Sqlite file from a language list. That would be acceptable?
The proposed syntax would be: genrevdb.php [relative/path/file.db] [langdir1,...,langdirN]
Invoked from a directory above $LANG checkouts, that is afaik the same convention of doc-base/configure.php
.
That sounds good. But the path to the SQLite database may be an absolute path, too.
Pushed with a new script named doc-base/scripts/translation/genrevdb.php
. Tested in one repository, and in the following days I will test the new revcheck.php
and genrevdb.php
in all translations listed CI for automatic testing.
Reviews and tests are welcome. Runs in about 28s on 14y old CPU and SSD for pt_BR
repo.
Haven't played around with the resulting database and might not have time to do today, but it looks like it works for me, at least.
It would be preferable to specify languages as individual arguments instead of one argument that you explode on ,
.
One thing I did notice is that the intros in the languages table are wrapped in the <intro>
tag.
$ php doc-base/scripts/translation/genrevdb.php out.sqlite de,es,fr,it,ja,pl,pt_br,ro,ru,tr,uk,zh
[2024-11-07 18:54] Creating revdata database out.sqlite for languages de,es,fr,it,ja,pl,pt_br,ro,ru,tr,uk,zh.
[2024-11-07 18:54] Language de run
[2024-11-07 18:55] Language de done (elapsed 19.00s)
[2024-11-07 18:55] Language es run
[2024-11-07 18:56] Language es done (elapsed 68.00s)
[2024-11-07 18:56] Language fr run
[2024-11-07 18:57] Language fr done (elapsed 58.00s)
[2024-11-07 18:57] Language it run
[2024-11-07 18:57] Language it done (elapsed 5.00s)
[2024-11-07 18:57] Language ja run
[2024-11-07 18:58] Language ja done (elapsed 45.00s)
[2024-11-07 18:58] Language pl run
[2024-11-07 18:58] Language pl done (elapsed 4.00s)
[2024-11-07 18:58] Language pt_br run
[2024-11-07 18:58] Language pt_br done (elapsed 30.00s)
[2024-11-07 18:58] Language ro run
[2024-11-07 18:58] Language ro done (elapsed 4.00s)
[2024-11-07 18:58] Language ru run
[2024-11-07 19:00] Language ru done (elapsed 80.00s)
[2024-11-07 19:00] Language tr run
[2024-11-07 19:00] Language tr done (elapsed 17.00s)
[2024-11-07 19:00] Language uk run
[2024-11-07 19:00] Language uk done (elapsed 3.00s)
[2024-11-07 19:00] Language zh run
[2024-11-07 19:00] Language zh done (elapsed 24.00s)
[2024-11-07 19:00] Revdata database out.sqlite complete.
It would be preferable to specify languages as individual arguments instead of one argument that you explode on
,
.
Changed.
One thing I did notice is that the intros in the languages table are wrapped in the
<intro>
tag.
It does not affect rendering, but is certainly invalid HTML. Working with DOMDocumentFragment or detaching root-less XML is always awkward. Piecewise XML node saving is no prettier, but generates way less code. Fixed.
I am now confident this PR produces identical results in most cases, and near identical in a few cases, comparing counts and listings of old and lib versions of revcheck.
For example, ro
revcheck counts 20 notinen
files, but lists 21. The code that extracts hashes, determines files final status and count cases are spread in so many places that it is impossible to make further progress. But as doc-base and web-doc revchecks are run with different times (and checkout states), some differences are expected.
And with other PRs in place (listed above), I think it's time to push for this. I plan to:
[skip-revcheck]
, to be pushed around early December.This new "lib revcheck" code is somewhat future proofed, in the sense that the modifications above only generate exchanged data, and no further changes in database or HTML generating would be necessary.
Merged. Further issues/PRs will consider this as baseline. Is necessary to comment/notify the PRs of system and web-doc, above?
All handled now, new data is live on https://doc.php.net/.
Yay! Let's celebrate. No more duplicated revcheck code in various repositories.
I will treat myself with a good wine tonight.
I spoke to soon. The pt_BR translation is not shown, other are ok. See: https://doc.php.net/revcheck.php?lang=pt_br
This PR is a work in progress to concentrate revcheck code and generation of revcheck data into a single place.
In
doc-base
(this PR):doc-base/scripts/revcheck.php
to use revcheck lib;doc-base/scripts/translations/genrevdb.php
. (proposed)In
web-doc
(other PRs)genrevdb.php
; (systens 54)web-doc/scripts/rev.php
. (ditto)This PR does not try to fix the issues reported on https://github.com/php/doc-base/issues/133 or https://github.com/php/doc-base/pull/132, for now. The priority is removing code duplication, and after that to possibly change revcheck semantics of
[skip-revcheck]
(and other possible issues)Reviews are welcome. Cosmetic changes delayed in favor of resolving issues found.