Custom validation rules

vkhytskyi-allegro commented 1 year ago

Hello :wave:

To start off I would like to thank you for developing and providing the library to the community!

I have a feature request. As a part of automation process of working with PO files I need to impose validation rules while parsing the files, like ensuring there are no duplicate msgids or no entries with the same value for msgstr. The library already has a parser and a lexer, and I'd like to be able to reuse them for my particular needs. Two approaches come to my mind:

the library exposes abovementioned entities as a part of public API,
the library provides new API that would allow to define custom validation rules.

I'd like to learn you opinion and suggestions. Would you be willing to accept a PR?

smhg commented 1 year ago

Thank you for reaching out! As I'm thinking of possible solutions, I think it would go pretty deep. And is there that much benefit? You can now get any PO/MO object as a JSON structure. You an apply any rules on that and then pass it back to get your output as PO/MO.

Something like:

import { po } from 'gettext-parser';
const data = po.parse(YOUR_INPUT);
// apply validation & changes
const output = po.compile(data);

This works for simple workflows, but maybe not for yours?

vkhytskyi-allegro commented 1 year ago

To provide a little bit more context, I am tasked with validating the contents of PO files against strict custom rules. Those rules might look as follows:

ensuring there are no duplicate msgid in the same context.
ensuring there are no duplicate msgid_plural in the same context.
ensuring the translation contains msgid_plural if there are msgstr[].
ensuring the number of msgstr[] matches the number of nplurals.

Let's focus on duplicates and take a look at a simple example. Say we have:

msgid "foo"
msgstr "foo #1"

msgid "bar"
msgstr "bar"

msgid "foo"
msgstr "foo #2"

After parsing the above contents via po.parse I have no knowledge that there was the entry foo associated with foo #1. By studying the sources, it seems it would be possible to obtain that information if there was a way to inject custom logic after the lexer completes its task, but there is no way to do that because Parser is not publicly accessible.

[...] I think it would go pretty deep. And is there that much benefit?

With regards to benefits, my motivation is that at my working environment there are several hundreds of repositories, each one contains multple PO files, each PO file might have hundreds LoC. Additionally, those repositories are managed by tenth of teams. I need to setup the environment wide strict standard that is a part of a repository build pipeline, be able to identify potential issues or inconsistencies and clearly communicate those to the teams.

I agree that it could go pretty deep, that's why, as a cheap solution, I guess exporting Parser as a part of public API could be a solution that would allow developers to decide on their own whether and how to use or extend it. I'd really love to avoid duplicating the logic (in order get access to its internals) and maintain it elsewhere. I'd also like to avoid forking the library to adjust it to my working environment needs.

smhg commented 1 year ago

Don't get me wrong, your questions are very valid I think. I didn't even realize the first foo in your example gets ignored. There is some room for improvement here.

First of all, specifically for PO-parsing, we'd need a bit of a restructure I think. Away from the current object-oriented style towards a functional approach. With accompanying tests.

We could then expose more parts alongside the current parse/compile methods.

Would you want to take a shot at this? If so, please send changes in small-ish chunks, with tests were feasible. That makes them easier to merge for me as I don't have huge amounts of time.

vkhytskyi-allegro commented 1 year ago

Hi @smhg! That's cool, I am open to take a look into it, although, I think that restructuring and changing the coding style (+ tests coverage) would require some time. Unfortunately, this is out of scope of my current task at hand. Do you see any possible short(er)-term solutions? What do you think about the validation rules I mentioned earlier? In your opinion, could any of them become a part of the library?

vkhytskyi-allegro commented 1 year ago

Hey @smhg, did you have a chance to take a look at my previous comment?

smhg commented 1 year ago

@vkhytskyi-allegro If not already the case, I'd say: fork it and add your requirements. Feel free to do it in a way that is easy to merge in the future. That gives me a better idea of what you need. I definitely think something like this should be part of the library.

vkhytskyi-allegro commented 1 year ago

@smhg Here is a PR. Please let me know what you think.

smhg commented 1 year ago

Thank you! I will try to have a look at it ASAP. It's great others are reviewing it already.

smhg / gettext-parser

Custom validation rules #75