project overlap / parser separation

smhg commented 9 years ago

I've been working on xgettext-template for a while, which seems to be very similar to this project. I haven't looked into things enough to see what's different (besides language support), so I'm avoiding the overlap-subject for now.

I came here with this question: since I was working on Swig-parsing, I wondered if you would consider splitting up the parser(s) into (a) different sub-project(s)? xgettext-template was recently restructured with that in mind: the Handlebars parser is a separate project that can be used independently from the main project (basically the CLI interface). Definitely feel free to use it (it doesn't rely on regex's, but on the Handlebars lib itself).

BYK commented 9 years ago

This was exactly what I was planning for a big major release, I just didn't have time. The current way we do things for different languages are very hacky and it also doesn't make sense to increase the size of the core project to support every new language/template language.

So in short, the answer is a big YES. How can we collaborate? :)

I'd be hesitant to let go of the current JS part because in the past year we have fixed many things and I think it is quite solid. Not the same for other parts tho.

BYK commented 9 years ago

/cc @zaach

smhg commented 9 years ago

Good to hear! :) I really didn't have a decent look at things yet. Allow me to get back to you in a few days.

BYK commented 9 years ago

Awesome, looking forward to this. :astonished:

smhg commented 9 years ago

Ok, I found a bit of time to think about this. Although the CLI's seem very similar, I would indeed suggest to only look at the parsers now.

This is all a proposition (don't read it in any other way):

One assumption (as mentioned earlier): the preferred way to parse is to use the template language's parser/lexer if possible. Like you did for Jade and gettext-handlebars does. If not, then resort to regex (Swig, EJS?).

Would you be able to give gettext-handlebars a try to replace your Handlebars parser? Not because it is "much wow", but to see if it's input-output would work for you. If we can find a good "standard" for this, you could then split up your other parsers using the agreed structure.

Generally, I think a parser needs to accept a keyword spec (see xgettext's --keyword parameter) and a (multiline) template string. Turning a file path into a template string isn't a job for the parser. The output needs to contain msgid's and line numbers. I haven't looked into the [dcnp]gettext variations enough to handle anything beyond n (plural).

One issue I have with this approach is the dependency management in the CLI package. You have any ideas how to handle this? Updating the CLI package every time a new parser is added or a new version is released, feels so troublesome.

beck commented 9 years ago

I just started diving into this project have a bit to add:

use the template language's parser/lexer if possible

:+1:

I think a parser needs to accept a keyword spec ... template string.

There's already an implicit API for how a parser is interfaced, two arguments are sent:

sources - a dictionary of {"file1.js": "content", "file2.js": "content", ... }
options - all config passed in via the command line (including keyword)

I think this is a good starting point for CLI > Parser communication.

As for the response, this is where this project needs some improvement:

Currently the response is:

sources: a dictionary of filename / javascript source
options

I propose that the sources dictionary does not return javascript but a dictionary of translations as spec'd by the gettext-parser.

I think this would yield significant cleanup, including:

Javascript parsing is factored out of jsxgettext and moved into it's own javascript.js under lib/parsers/
The responsibility of jsxgettext.js become more precise: interfacing with parsers and cli.js
Files would no longer be parsed twice (once as their own language, then again as js)
Parsers become simplified as they no longer have to coerce messages into valid js gettext arguments

Updating the CLI package every time a new parser is added or a new version is released, feels so troublesome

True, but it is not too bad.

If all are in agreement about proposed changes, then focus should be on refactoring and developing the internal API. Once that's done, then focus can be on user extensibility with installable parsers.

BYK commented 9 years ago

Updating the CLI package every time a new parser is added or a new version is released, feels so troublesome True, but it is not too bad.

Yup. Moreover we can add some hooks for new parsers to attach themselves (or make them discoverable) dynamically.

If all are in agreement about proposed changes, then focus should be on refactoring and developing the internal API. Once that's done, then focus can be on user extensibility with installable parsers.

Heh, I wrote the thing above before reading this. Yeah I think this sounds like a good path forward.

smhg commented 9 years ago

Good suggestion about using gettext-parser's structure. However, this, combined with sending a dictionary of file contents, makes each parser responsible for result aggregation. Conceptually that sounds more like a job for the "parent" (cli) package, no?

poEdit calls will probably always be for one (template) language so no aggregation should be performed across languages. But you could have calls where the language is derived from the file extension and thus have cases where you need to aggregate the results of multiple parsers. Not common for poEdit, but it might make more sense for something like grunt.

In that light, you could argue a parser just needs to focus on parsing one input string. The result would of course resemble gettext-parser's structure as close as possible.

BYK commented 9 years ago

However, this, combined with sending a dictionary of file contents, makes each parser responsible for result aggregation. Conceptually that sounds more like a job for the "parent" (cli) package, no?

I agree with this one.

poEdit calls will probably always be for one (template) language so no aggregation should be performed across languages.

Sort of disagree. I think we should always do aggregation but should not expect it from individual parsers. May be unless it is in the same file? Not sure. I want to make parsers as simple as possible and gettext-parser's structure has a bit more than that. On the flip side, if we do not enforce this structure we would end up inventing our own structure which would be quite similar to this one I think.

beck commented 9 years ago

a parser just needs to focus on parsing one input string

:+1:

smhg commented 9 years ago

Great. Let's agree on parsers' use and its output format then. To get a discussion started, gettext-handlebars is used like this:

var parser = new Parser({
  _: [0],
  gettext: [0],
  ngettext: [0, 1]
});

parser.parse(contentOfOneFile);

keywordspec is passed to constructor (can there be other options?)
exposes a parse method that takes a string

Please adapt the above as you see fit and feel free to add a sample return value format.

beck commented 9 years ago

I would prefer that parsers be instantiated with convention > configuration.

For example, if the handlebars are already using gettext and ngettext then the parser should not need any additional config. (side note: I've stopped using _ as good convention because it too often collides with those using lodash, underscore, or as a placeholder for throwaways).

Question: why is the argument array necessary? Are there cases where a keyword func would get anything other than one argument (simple translation) or three (plural)?

If one needs to override/extend the agreed-upon convention, (say the handlebars use trans or _ instead of gettext) then this should be provided with the keyword option.

I think the parser should be provided all options (so when something like debug is added to the cli, that information is automatically provided to all parsers). I would recommend:

var parser = new Parser(options);
var translations = parser.parse(contentOfOneFile);

smhg commented 9 years ago

I'm definitely in favour of providing options. Even if there isn't anything else besides a keywordspec which is ever passed. Let's agree on the full option names of xgettext as property names for the options object? With dashes turned into camelcase? So the keywordspec would indeed be under keyword.

I'm not entirely sure about the defaults though. It is definitely true xgettext has defaults. I think these of course should be honoured. I mean: at least in the cli package. What I'm not sure about is whether every parser needs to have defaults. Parsers can be quite small so this would add a relatively high amount of duplicate code. On the other hand: for stand alone usage of parsers default keywords are nice. Since this is a rather minor thing: maybe we go for defaults in every parser and reevaluate this later?

About the argument position array for every keyword: parsers need to accept these. Yes, people (including myself) use others than the defaults. In general: I would always try to offer full xgettext compatibility. We could however additionally handle a flat array of keywords too if you prefer. It's related to the above though: do we want ever parser to handle this on its own or not?

ArmorDarks commented 9 years ago

Not much I can help here, but just for the sake of at least some support — I think it will be great feature. Hope to see it in nearby future.

zaach / jsxgettext

project overlap / parser separation #82