php-gettext / Gettext

PHP library to collect and manipulate gettext (.po, .mo, .php, .json, etc)
MIT License
687 stars 132 forks source link

#282 Add strict po loader #283

Closed jonasraoni closed 1 year ago

jonasraoni commented 2 years ago

Hi @oscarotero!

I did some initial updates to cover my needs.

jonasraoni commented 2 years ago

I've tested some weird combinations, but forgot to move them to the unit test 🤦 Well, I'll wait for some comments (e.g. if it's ok to keep a new class or if you prefer to replace the existing one, etc), then I might add some tests.

oscarotero commented 2 years ago

Hi. Thanks for this work!! I'm not familiarized with the strict mode. What's the difference between strict mode and regular po? This will help me to understand the need to create a new different Po loader.

jonasraoni commented 2 years ago

Hi!

There's no "strict" syntax, I just mimicked the behavior of the GNU tools, which I think is a good guide (given the documentation is missing details) :)

For example, the msgfmt works fine with this syntax (the new lines are required just to break comments):

msgctxt"context"msgid"message"msgstr"translation"

But it fails if you try to parse things in the wrong order, with an invalid msgstr index (out of order/out of bounds), bad escaping, etc.

So in theory this could replace the current PoLoader class, but it can introduce surprises (exceptions) for your users if their files have a bad syntax (I've tested this code against 1274 files of my project and all the exceptions I had were genuine).

So I'll wait for your feedback. In order to decrease the pain, I could just add a flag, to swallow the exceptions by default, and continue to the next translation... But the input vs output of a defective file won't be the same as the current class.

jonasraoni commented 1 year ago

Ok @oscarotero, I'm done with the performance fixes (more than that will require me to uglify the code even more or drop features). In a small dataset with 22 files (564KB) the time went from 4s to .47s, while the PoLoader is taking .25s.

I thought about doing a memory optimization (parse the file by chunks), but that will damage the speed (and my use case doesn't need it: locale files with a max of 170KB) 🤔

If you think it's helpful, I could rebase the commits to mark the tests/BasePoLoaderTestCase.php as a file rename from tests/PoLoaderTest.php, just to improve the code history.

oscarotero commented 1 year ago

Thanks @jonasraoni

If you think it's helpful, I could rebase the commits to mark the tests/BasePoLoaderTestCase.php as a file rename from tests/PoLoaderTest.php, just to improve the code history.

Yeah, it makes sense. But I think it's not so important, so whatever you want :)

One think that I would like is update the README.md file to include this new loader and explain the differences between both PO loaders (including also the difference in performance). As this info is more in your head, could you please add this latest change before merging?

Thanks so much!

jonasraoni commented 1 year ago

@oscarotero I've updated the README and the CHANGELOG 👌

oscarotero commented 1 year ago

Thank you! 👍