Closed vkhytskyi-allegro closed 1 year ago
Thank you for reaching out! As I'm thinking of possible solutions, I think it would go pretty deep. And is there that much benefit? You can now get any PO/MO object as a JSON structure. You an apply any rules on that and then pass it back to get your output as PO/MO.
Something like:
import { po } from 'gettext-parser';
const data = po.parse(YOUR_INPUT);
// apply validation & changes
const output = po.compile(data);
This works for simple workflows, but maybe not for yours?
To provide a little bit more context, I am tasked with validating the contents of PO files against strict custom rules. Those rules might look as follows:
msgid_plural
if there are msgstr[]
.msgstr[]
matches the number of nplurals
.Let's focus on duplicates and take a look at a simple example. Say we have:
msgid "foo"
msgstr "foo #1"
msgid "bar"
msgstr "bar"
msgid "foo"
msgstr "foo #2"
After parsing the above contents via po.parse
I have no knowledge that there was the entry foo
associated with foo #1
. By studying the sources, it seems it would be possible to obtain that information if there was a way to inject custom logic after the lexer completes its task, but there is no way to do that because Parser
is not publicly accessible.
[...] I think it would go pretty deep. And is there that much benefit?
With regards to benefits, my motivation is that at my working environment there are several hundreds of repositories, each one contains multple PO files, each PO file might have hundreds LoC. Additionally, those repositories are managed by tenth of teams. I need to setup the environment wide strict standard that is a part of a repository build pipeline, be able to identify potential issues or inconsistencies and clearly communicate those to the teams.
I agree that it could go pretty deep, that's why, as a cheap solution, I guess exporting Parser
as a part of public API could be a solution that would allow developers to decide on their own whether and how to use or extend it. I'd really love to avoid duplicating the logic (in order get access to its internals) and maintain it elsewhere. I'd also like to avoid forking the library to adjust it to my working environment needs.
Don't get me wrong, your questions are very valid I think.
I didn't even realize the first foo
in your example gets ignored. There is some room for improvement here.
First of all, specifically for PO-parsing, we'd need a bit of a restructure I think. Away from the current object-oriented style towards a functional approach. With accompanying tests.
We could then expose more parts alongside the current parse/compile methods.
Would you want to take a shot at this? If so, please send changes in small-ish chunks, with tests were feasible. That makes them easier to merge for me as I don't have huge amounts of time.
Hi @smhg! That's cool, I am open to take a look into it, although, I think that restructuring and changing the coding style (+ tests coverage) would require some time. Unfortunately, this is out of scope of my current task at hand. Do you see any possible short(er)-term solutions? What do you think about the validation rules I mentioned earlier? In your opinion, could any of them become a part of the library?
Hey @smhg, did you have a chance to take a look at my previous comment?
@vkhytskyi-allegro If not already the case, I'd say: fork it and add your requirements. Feel free to do it in a way that is easy to merge in the future. That gives me a better idea of what you need. I definitely think something like this should be part of the library.
@smhg Here is a PR. Please let me know what you think.
Thank you! I will try to have a look at it ASAP. It's great others are reviewing it already.
Hello :wave:
To start off I would like to thank you for developing and providing the library to the community!
I have a feature request. As a part of automation process of working with PO files I need to impose validation rules while parsing the files, like ensuring there are no duplicate
msgid
s or no entries with the same value formsgstr
. The library already has a parser and a lexer, and I'd like to be able to reuse them for my particular needs. Two approaches come to my mind:I'd like to learn you opinion and suggestions. Would you be willing to accept a PR?