php-gettext / Gettext

PHP library to collect and manipulate gettext (.po, .mo, .php, .json, etc)
MIT License
690 stars 135 forks source link

Further improve comment extraction #166

Closed swissspidy closed 6 years ago

swissspidy commented 6 years ago

Follow-up to #164:

oscarotero commented 6 years ago

Hi. I think is better to have a pull request per feature 😃 Anyway, I want to keep as close as possible to the gettext behaviour, in order to have high compatibility. Case insensitive prefixes for comments breaks this compatibility, so I prefer do not include this.

Not sure whether comment in the same line is valid in xgettext. The gettext docs says "preceding keyword lines". 🤔

And about multiline comments, I think this logic should be placed inside the ParsedComment class.

So, instead:

$lineNumber = $value[2] + substr_count($value[1], "\n");

You could do simply:

$lineNumber = $comment->getLastLine();

Or even better:

if ($comment->isRelatedWith($bufferFunctions[0])) {
    $bufferFunctions[0]->addComment($comment);
}

This leaves the parser cleaner and move the responsability of how to handle comments to the ParsedComment class.

swissspidy commented 6 years ago

Anyway, I want to keep as close as possible to the gettext behaviour, in order to have high compatibility. Case insensitive prefixes for comments breaks this compatibility, so I prefer do not include this.

Fair enough. An alternative would be to pass the various case variations to the functions scanner, so that's not a big deal. I'll undo it.

Not sure whether comment in the same line is valid in xgettext. The gettext docs says "preceding keyword lines".

It is.

Example code:

<?php

/* allowed1: boo */ /* allowed2: only this should get extracted. */ /* some other comment */ $bar = strtolower( __( 'Foo' ) );

Result of xgettext --add-comments=allowed2 --keyword=__:1 --add-location example.php:

#. allowed2: only this should get extracted.
#. some other comment
#: example.php:2
msgid "Foo"
msgstr ""

Whereas with this PR the library does this:

#. allowed2: only this should get extracted.
#: ./example.php:2
msgid "Foo"
msgstr ""

So I'd even say this library does it better than xgettext because some other comment is not a valid translator comment in that case :-)

And about multiline comments, I think this logic should be placed inside the ParsedComment class.

I'll try to come up with something :-)

swissspidy commented 6 years ago

It would probably be a bit less complex if parsePhpComment() would return a ParsedComment 🤔🤷‍♀️

oscarotero commented 6 years ago

It would probably be a bit less complex if parsePhpComment() would return a ParsedComment

Yes, good idea.

And about the last line of the comment, I meant to let the ParsedComment instance to calculate the latest line, instead the parsePhpComment(), because the only things needed is the comment string and the first line number, that were passed as constructor arguments.

swissspidy commented 6 years ago

because the only things needed is the comment string and the first line number, that were passed as constructor arguments.

After parsing, the comment is on a single line though, so the ending line number can't be calculated from that.

oscarotero commented 6 years ago

Ok, so what do you think about moving all this logic to a static factory of ParsedComment?. For example:

protected function parsePhpComment($value)
{
    if ($this->extractComments === false) {
        return;
    }

    //this returns a comment or null
    $comment = ParsedComment::create($value);

    if ($comment && $comment->checkPrefix($this->extractComments)) {
        return $comment;
    }
}
swissspidy commented 6 years ago

Agreed, makes sense. It looks much clearer with that.

oscarotero commented 6 years ago

Thanks!