wikimedia / banana-i18n

banana-i18n - Javascript Internationalization library
https://wikimedia.github.io/banana-i18n/
MIT License

Disable conversion of [[...]] into <a> tags #37

Closed: nyurik closed this issue 3 years ago.

nyurik commented 4 years ago

I use the Banana library to create edit summary messages for a wiki. Those edit summaries can contain [[...]] links in wiki markup, but Banana keeps converting them into HTML <a> tags. Currently I'm converting the <a> tags back into [[...]] with a regex replacement, which is obviously not a very clean solution. How can I disable this in Banana?
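The regex workaround described above might look roughly like the sketch below. It assumes each generated link is a plain `<a ...>text</a>` element whose visible text equals the link target; `anchorsToWikiLinks` is a hypothetical name, and the exact markup Banana emits may differ. That assumption is also what makes the approach fragile: a piped link such as [[Apple|fruit]] cannot be recovered from its visible text alone.

```javascript
// Hypothetical helper: turn HTML anchors back into [[...]] wiki links.
// ASSUMPTION: each anchor's visible text equals the link target, so piped
// links like [[Target|label]] come back as [[label]] (the target is lost).
function anchorsToWikiLinks(html) {
  // Non-greedy match of <a ...>text</a>; attributes are ignored entirely.
  return html.replace(/<a\b[^>]*>(.*?)<\/a>/g, (_match, text) => `[[${text}]]`);
}

// Illustrative input only; Banana's real link markup may use other attributes.
const rendered = 'This <a href="./apple" title="apple">apple</a> costs 10';
console.log(anchorsToWikiLinks(rendered)); // This [[apple]] costs 10
```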

santhoshtr commented 4 years ago

Would it be possible to use HTML entities in the message? For example, [[...]] becomes &#91;&#91;...&#93;&#93;. Is that an acceptable solution? MediaWiki has wfEscapeWikitext, which abstracts this conversion.
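A minimal sketch of the entity approach: replace each square bracket with its numeric character reference so that a wikitext-aware parser no longer sees a link. `escapeWikiLinks` is a hypothetical helper; MediaWiki's `wfEscapeWikitext` is a PHP function that escapes a broader set of characters, so this only covers the brackets discussed here.

```javascript
// Hypothetical helper: hide [[...]] from a wikitext-aware parser by
// replacing every square bracket with its HTML numeric entity.
// Only brackets are handled; wfEscapeWikitext in MediaWiki escapes more.
function escapeWikiLinks(text) {
  return text.replace(/\[/g, "&#91;").replace(/\]/g, "&#93;");
}

console.log(escapeWikiLinks("See [[apple]]"));
// See &#91;&#91;apple&#93;&#93;
```

Each bracket grows from one character to five, which is the length cost objected to in the reply that follows.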

nyurik commented 4 years ago

@santhoshtr I don't think that's a good solution for edit summary messages. Is it even possible to use HTML entities in an edit summary? Plus, my messages are very long, and that's 19 extra characters lost :). I would rather have a "no HTML magic" mode, if possible.

nyurik commented 4 years ago

Another reason not to use HTML entities: they will confuse translators, making those messages less likely to be translated. For my app, the edit summaries are by far the most important messages: even if the app itself stays in English, it is seen by only a few people, whereas the edit summaries will be seen by the entire community.

santhoshtr commented 4 years ago

You are right. In your use case, entities are not nice. There are two scenarios:

  1. A use case of just localizing a message, without handling placeholders, links, and such. Example: 'This apple costs $1', where $1 should not be treated as the first placeholder.
  2. A use case where placeholders and links need to be handled, but certain placeholders should not be. Example: 'This [[apple]] costs $1', where apple should be converted to a link but $1 should be kept as is.

I prepared a codepen at https://codepen.io/santhoshtr/pen/OJNOjaN demonstrating how to avoid parsing links and placeholders. Basically, instead of the .i18n method, use .getMessage: it gives you the localized message (including fallback resolution) and does not handle placeholders, links, etc.
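To illustrate the difference, here is a rough, self-contained sketch of what a getMessage-style lookup does: resolve the message through a fallback chain and return the raw string, with no placeholder or link processing. The chain used here (full locale, then base language, then `finalFallback`) is a simplifying assumption; Banana's real fallback rules are more detailed, and `lookupMessage` is a hypothetical name.

```javascript
// Hypothetical sketch of getMessage-style lookup: fallback resolution only.
// ASSUMED chain: "es-MX" -> "es" -> finalFallback (Banana's real rules differ).
function lookupMessage(messages, locale, key, finalFallback = "en") {
  const chain = [locale, locale.split("-")[0], finalFallback];
  for (const lang of chain) {
    const table = messages[lang];
    if (table && Object.prototype.hasOwnProperty.call(table, key)) {
      return table[key]; // raw string: $1, [[...]], {{PLURAL}} left untouched
    }
  }
  return key; // assumption: return the key itself when nothing is found
}

const sampleMessages = { en: { message_2: "This [[apple]] costs $1" } };
console.log(lookupMessage(sampleMessages, "es-MX", "message_2"));
// This [[apple]] costs $1
```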

nyurik commented 4 years ago

@santhoshtr I think you got it in reverse. I do want the placeholders to be resolved (otherwise, why would I even bother with a localization library?). I do not want it to handle wiki markup.

Thanks for writing the example. I modified it to show what I mean:

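window.addEventListener("load", () => {
  // Assumes the Banana global is loaded on the page; "es" has no messages,
  // so the lookup falls back to "en" via finalFallback.
  const banana = new Banana("es", {
    messages: {
      en: { message_2: "This [[apple]] costs $1" }
    },
    finalFallback: "en"
  });

  // i18n() resolves $1 and also converts [[apple]] into an <a> tag.
  document.getElementById("result_1").innerHTML = banana.i18n("message_2", 10);
  // getMessage() only looks up the localized string, so $1 is not replaced.
  document.getElementById("result_2").innerHTML = banana.getMessage("message_2", 10);
});

santhoshtr commented 4 years ago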

OK, so this is a mixed case: the message should be parsed, but some of its syntax should not be (parse placeholders, but not links). I updated the example to show how to do that.
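One way to picture the "parse placeholders, but not links" mode is plain positional substitution: replace `$1`, `$2`, ... with the matching argument and leave everything else verbatim. This is a hypothetical `replacePlaceholders` helper, not the code from the updated codepen; note that it also leaves `{{PLURAL:...}}` untouched, which is the limitation raised in the next comment.

```javascript
// Hypothetical helper: substitute $1, $2, ... positionally and nothing else.
// Wiki links and {{...}} constructs pass through verbatim.
function replacePlaceholders(message, ...params) {
  return message.replace(/\$(\d+)/g, (match, n) => {
    const value = params[Number(n) - 1];
    // Leave placeholders without a matching argument as written.
    return value === undefined ? match : String(value);
  });
}

console.log(replacePlaceholders("This [[apple]] costs $1", 10));
// This [[apple]] costs 10
```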

nyurik commented 4 years ago

@santhoshtr thanks, I was not aware of the simple parse, but it does not solve the problem: I do want all the i18n-related features, I just do not want any wiki markup features. For example, with the simple parse,

message_2: "This [[apple]] costs {{PLURAL:$1|one dollar|$1 dollars}}"

results in

This [[apple]] costs {{PLURAL:10|one dollar|10 dollars}}

In my opinion, wiki markup parsing is outside the i18n scope and should not even be in this library (unless there is some other use case I do not know about).
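The behavior being asked for can be sketched as: substitute placeholders, then resolve `{{PLURAL:...}}`, and never touch `[[...]]`. This is a simplified illustration under loud assumptions: it implements only an English-style plural rule (one vs. other) and no `{{GRAMMAR}}`, whereas Banana applies per-language plural rules. `formatWithoutWikiMarkup` is a hypothetical name.

```javascript
// Hypothetical sketch: i18n features (placeholders + PLURAL) without wiki markup.
// ASSUMPTION: English-style plural rule only; real libraries use per-language rules.
function formatWithoutWikiMarkup(message, ...params) {
  // Step 1: positional placeholders $1, $2, ...
  const substituted = message.replace(/\$(\d+)/g, (match, n) => {
    const value = params[Number(n) - 1];
    return value === undefined ? match : String(value);
  });
  // Step 2: {{PLURAL:count|singular|plural}}; [[...]] is never matched here.
  return substituted.replace(/\{\{PLURAL:([^|}]*)\|([^}]*)\}\}/g, (_m, count, formsText) => {
    const forms = formsText.split("|");
    const n = Number(count.trim());
    // English rule: first form for exactly 1, last form otherwise.
    return n === 1 ? forms[0] : forms[forms.length - 1];
  });
}

const pluralMsg = "This [[apple]] costs {{PLURAL:$1|one dollar|$1 dollars}}";
console.log(formatWithoutWikiMarkup(pluralMsg, 10)); // This [[apple]] costs 10 dollars
console.log(formatWithoutWikiMarkup(pluralMsg, 1)); // This [[apple]] costs one dollar
```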

nyurik commented 4 years ago

P.S. By the above, I mean that [[...]] is an example of wiki markup, whereas {{PLURAL}}, {{GRAMMAR}}, and $1 are part of i18n. The square brackets should be handled by the regular wiki markup parser, which would also handle bold, italics, template functions, and so on. PLURAL and GRAMMAR are not part of regular wiki markup parsing; they are handled by the localization library instead. Banana is clearly aiming for the latter, so I was very surprised that it even looked at the square brackets (but not at '''bold''').

Also note that Banana does not even handle the square brackets correctly: in Wikipedia, text like [[banana]]s includes the trailing s as part of the link.
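For reference, the MediaWiki behavior mentioned here is the "link trail": letters immediately following `]]` are absorbed into the link text while the link target stays unchanged. A minimal sketch, assuming the English link trail (lowercase ASCII letters) and a hypothetical `./Target` href format; other languages configure different trail characters, and `wikiLinksToHtml` is an illustrative name, not Banana's parser.

```javascript
// Hypothetical sketch of [[target|label]] rendering with an English link trail.
// ASSUMPTIONS: trail = [a-z]+ (English default), href format "./<target>",
// and no HTML escaping of the label (add it before using this on real input).
function wikiLinksToHtml(text) {
  const linkPattern = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]([a-z]*)/g;
  return text.replace(linkPattern, (_m, target, label, trail) => {
    const href = "./" + encodeURIComponent(target.trim());
    // The trail extends the visible text only; the target is unchanged.
    return `<a href="${href}">${(label ?? target) + trail}</a>`;
  });
}

console.log(wikiLinksToHtml("[[banana]]s are yellow"));
// <a href="./banana">bananas</a> are yellow
```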

Was there a reason to include square bracket parsing in this lib?

santhoshtr commented 4 years ago

This library is a jQuery-independent implementation of jquery.i18n, which in turn was a MediaWiki-independent implementation of the jqueryMsg module in MediaWiki. So the message syntax it supports originates from MediaWiki, and the use cases and applications that use this library more or less assume those features. But I can totally understand the need to move MediaWiki-specific parsing out of this library.

I am thinking of adding a constructor option, something like wikilinks: true|false, to turn off wiki link parsing, with the default being false and a major version bump to avoid breakages. What do you think?
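The proposed option could be wired in roughly as below: placeholder substitution always runs, and link conversion runs only when the caller opts in, with `wikilinks` defaulting to `false`. `MessageFormatter` and its internals are hypothetical, a design sketch of the proposal rather than Banana's actual code; only the option name `wikilinks` comes from this thread.

```javascript
// Hypothetical design sketch of a constructor-level `wikilinks` switch.
class MessageFormatter {
  constructor(messages, { wikilinks = false } = {}) {
    this.messages = messages;
    this.wikilinks = wikilinks; // default false: [[...]] is left verbatim
  }

  format(key, ...params) {
    const raw = this.messages[key] ?? key;
    let out = raw.replace(/\$(\d+)/g, (match, n) => {
      const value = params[Number(n) - 1];
      return value === undefined ? match : String(value);
    });
    if (this.wikilinks) {
      // Simplified link handling (no pipes, no link trail), only when opted in.
      out = out.replace(/\[\[([^\]]+)\]\]/g, (_m, target) =>
        `<a href="./${encodeURIComponent(target)}">${target}</a>`);
    }
    return out;
  }
}

const catalog = { message_2: "This [[apple]] costs $1" };
console.log(new MessageFormatter(catalog).format("message_2", 10));
// This [[apple]] costs 10
console.log(new MessageFormatter(catalog, { wikilinks: true }).format("message_2", 10));
// This <a href="./apple">apple</a> costs 10
```

With a `false` default, existing callers that rely on link rendering would have to start passing `wikilinks: true`, which is why the version bump is debated in the comments that follow.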

cc @stephanebisson @Nikerabbit @amire80

nyurik commented 4 years ago

Sounds great, thanks!

stephanebisson commented 4 years ago

LGTM

Note that a new feature that is disabled by default can be a minor version bump since it shouldn't break anything.

santhoshtr commented 4 years ago

It will at least break a couple of my projects that use this, as well as the section translation project of the WMF Language team :-). All of them assume wikilinks support.