php / web-doc

http://doc.php.net
33 stars 84 forks source link

Wrong additions and deletions counts #32

Closed lhsazevedo closed 2 years ago

lhsazevedo commented 2 years ago

Description

The outdated files table is reporting wrong change counts. They seem to be broken after 9f67270.

It showing only the changes in the latest commit, while it should be comparing each file against the commit specified by EN-Revision hash from it's translated counterpart.

Example

In the image bellow -2 +5 is reported, while the real counts should be +92 -18 as seen on the diff page.

image

Fix

To get the real changes count, the command below should be used for each outdated file: git diff --numstat <en-revision-hash> -- <file-name>

$ git diff --numstat 41f1e8e0de702594b60d3aef233f3935929c22ef -- reference/datetime/dateinterval/construct.xml
92      18      reference/datetime/dateinterval/construct.xml

I'm not aware of a way of obtaining the counts for all files in just shot, as the hashes differ from file to file.

lhsazevedo commented 2 years ago

I tried to simplify the main language loop in terms of readability and maintainability and came up with the result bellow. I find it easier understand and safer to modify. It is longer indeed, but maybe it's better to extract functions than to compress a lots conditions in a few lines.

https://github.com/lhsazevedo/php-web-doc/blob/e2efd161435f0778d478ef51eaa7a94fe140146a/scripts/rev.php#L513-L571

Please tell me what you think.

nilgun commented 2 years ago

My goal was to reduce the generation time of the database. The change you made was increasing the time to 80 seconds and git was crashing some files (possibly due to migration to github). I tried to revert to the old value (15 seconds) and get rid of the errors. Your new suggestion will increase the time even more (2 git commands instead of 1 git command per file). For now, returning to 80 seconds should suffice.

Lucas Henrique @.***>, 19 Oca 2022 Çar, 08:29 tarihinde şunu yazdı:

I tried to simplify the main language loop in terms of readability and maintainability and came up with the result bellow. I find it easier understand and safer to modify. It is longer indeed, but maybe it's better to extract functions than to compress a lots of conditions in a few lines.

https://github.com/lhsazevedo/php-web-doc/blob/e2efd161435f0778d478ef51eaa7a94fe140146a/scripts/rev.php#L513-L571

Please tell me what you think.

— Reply to this email directly, view it on GitHub https://github.com/php/web-doc/issues/32#issuecomment-1016097524, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSQAZSZLC7235QPVEDCAM3UWZD2TANCNFSM5MIQ3R6A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

lhsazevedo commented 2 years ago

Got it. I'll try to reproduce and fix those git crashes locally. Would you mind sharing scenarios/logs where they happen?

Note that the 80 seconds code already had two 2 git commands per file, a git log on line 540 that handles revcheck skips and a git diff on line 527 for change counts.

It's hard indeed to spot them inside the nested conditions:

foreach( $LANGS as $lang )
{
    if ( )
    {
        ...
    } else {
        if ( ){
            ...
        } elseif ( ) {
            ...
->          $subject = `git diff --numstat $trFile->hash -- {$filename}`;
            ...
        }
        if ( )
            ...
        if ( ) {
            ...
->          $hashes = explode ("\n" , `git log -2 --format=%H -- {$filename}`);
            ...
            if ( )
                ...
        }
        ...
    }
}