phetsims / rosetta

PhET's Simulation Translation Utility
MIT License
3 stars 1 forks source link

multi-line strings containing '\n' are getting an additional '\' inserted #207

Closed jbphet closed 2 years ago

jbphet commented 5 years ago

This was initially reported in https://github.com/phetsims/isotopes-and-atomic-mass/issues/97, please read through that for necessary background info.

The essence of the problem is that multi-line strings that contain \n are ending up with \\n in the actual string file in babel, so the strings end up looking bad in the sims.

jbphet commented 5 years ago

I just looked through RPAL's translated string files, and I'm not seeing any occurrences of \\n. This might mean that the problem was introduced into after these translations were submitted. I tested the Polish translation in rosetta using the 'Test' button, and saw this for the multi-line string (which translates to "No\nReaction"):

image

The current live version looks like this:

image

So the problem has not always been present. The Polish translation was submitted on Saturday, February 13, 2016, so the issue was introduced after that time. I looked through the other non-mangled RPAL translations, and the _ga version was submitted on Monday, November 27, 2017, so the problem was introduced after that.

jbphet commented 5 years ago

Issue #144 seems related to this one too.

jbphet commented 5 years ago

It looks like an object with a string that contains the carriage return character (ASCII integer 10) that is turned into JSON using JSON.parse will have that value turned into an explicit "\n" string (ASCII integer characters 92 and 110). However, an object that contains a string with the actual "\n" character will end up having it converted into "\n".

I ran a test where I created a simple form, entered data that included a literal "\n" string, and pulled out the value using JavaScript (see the code below). When the data was extracted, put into an object, then run through JSON.stringify, it ended up with the "\n" string. This is almost certainly what is going on in rosetta.

I'm not sure what changed in the code that caused this to start happening, but I'm not sure that it matters - I just need to fix it. I think I probably need to add code to explicitly replace "\n" with a line feed character.

Here is the HTML file that, I think, demonstrates the essence of the problem. To show the problem, enter a value that includes one or more "\n" sequences for the first name.

<!DOCTYPE html>
<!-- Copyright 2019, University of Colorado Boulder -->

<html>
<body>

<h2>HTML Forms</h2>

<form action="/http://localhost:8080/stub.php" method="post">
  First name:<br>
  <input type="text" id="first-name" value="Mickey">
  <br>
  Last name:<br>
  <input type="text" id="last-name" value="Mouse">
  <br><br>
  <input type="submit" value="Submit">
</form>

<p>If you click the "Submit" button, the form-data will be sent to a page called "/action_page.php".</p>

<button type="button" onclick="testFormContents()">Run Test Code</button>

<script>
  function testFormContents(){
    console.log( 'hello' );
    const nameValue = document.getElementById( "first-name" ).value;
    console.log( 'nameValue = ' + nameValue );
    const stringsObject = {
      value: nameValue
    };
    console.log( 'JSON.stringify( stringsObject ) = ' + JSON.stringify( stringsObject ) );
  }
</script>

</body>
</html>
jbphet commented 5 years ago

I just did some experimenting with the code just above and found that the following code works to do the necessary replacement operation:

nameValue.replace( /\\n/g, '\n' )
jbphet commented 4 years ago

I just searched through babel using the regular expression "value":.*\\\\n to find occurrences of the bad version the line feed sequence. I found 31 of them. Yikes. I decided to look at a couple of them individually to see how they manifest in a sim. One example is the Slovenian version of States of Matter. The string for "triplePoint" is "value": "trojna\\ntočka", and contains the erroneous \\n. Here's what the string looks like in the context of the translated sim (I circled the bad string in red):

image

Here's the Czech version of Molarity:

image

I think what I will do is fix the ones that I can manually and change the code in Rosetta to stop this from happening in the future.

jbphet commented 4 years ago

I have manually committed fixes to a number of files that contained the bad line feed sequence. I did this manually rather than through the translation interface because if I do the latter, I'll end up credited as one of the translators, which we don't want. I skipped the RTL languages because they don't handle search-and-replace well in my IDE.

I have not forced any of these to be published. I'll talk with @ariel-phet to figure out if we want to do that.

jbphet commented 4 years ago

@ariel-phet - the problem in Rosetta that was causing the broken line feeds has been fixed, and I have done a maintenance release for Isotopes and Atomic Mass to force republication of the fixed translations since there was a specific request for that one (see https://github.com/phetsims/isotopes-and-atomic-mass/issues/97). Shall I take the time to force a maintenance release on the other sims that exhibit this problem in their translations, or just leave them be until the next maintenance release causes them to be republished? I would estimate about 2 hours to do them all. I just asked @jonathanolson about upcoming maintenance releases, and he says there are no wide-reaching ones on the immediate horizon that would cause these sims to get republished.

ariel-phet commented 4 years ago

@jbphet is it possible to only publish a fix in the particular translations? How many sims does this affect? I see above 31 instances (is that 31 sims, probably not, but I don't know how many). If we are able to fix just a particular translation, that would be ideal since then an update would only be triggered for an app using that particular locale.

Could we alternatively notify the translators for relevant sims and just have them fix it in rosetta?

Additionally, maybe we need some process built into Rosetta (like an admin mode) that allows us to fix things in a particular translation and not be credited as a translator? That seems like it should be fairly simple - some state that if signed in as "phetadmin" or something, translation credit is not given? It seems that things have come up enough where simply fixing things in the Rosetta interface would be the simplest option if possible.

jbphet commented 4 years ago

In response to @ariel-phet's questions,

[I]s it possible to only publish a fix in the particular translations?

Yes, at least I think so. Rosetta tells the build server to rebuild a given translation each time new strings are submitted for that locale, so I could dig into it, figure out the URL to poke the build server, and do these sims. I'd estimate 2-3 hrs for me to do this, start to finish.

How many sims does this affect? I see above 31 instances (is that 31 sims, probably not, but I don't know how many).

31 is the total of sims plus locales. Here is a list of the sims only:

Could we alternatively notify the translators for relevant sims and just have them fix it in rosetta?

I'm actually not sure that would work, since I've made the manual changes to the translations, so Rosetta would probably not detect the change and wouldn't allow a submit. They'd have to change something else too. So no, this is probably not the best option.

Additionally, maybe we need some process built into Rosetta (like an admin mode) that allows us to fix things in a particular translation and not be credited as a translator?

Agreed. There is an issue logged for this, see https://github.com/phetsims/rosetta/issues/161.

ariel-phet commented 4 years ago

@jbphet thanks...lets go ahead with that option you mentioned above of 2-3 hours of manual "poking" work. That is likely to save lots of unnecessary updating for the app and locales not suffering from this issue.

jbphet commented 4 years ago

I modified SOM and IAAM to use RichText instead of MultiLineText recently, and at that time modified the strings with line breaks to use <br> instead of '\n'. I've also fixed a number of the problematic strings.

For reference, here are a few examples of sims where the problem was still occurring as of this morning. The first is the Basque version of Least Squares Regression. Note the \n in the middle of the text (I didn't circle it):

image

Here is the scoreboard from the Arabic translation of Build an Atom. The bad part is circled:

image

Here is what it looks like after manually fixing the errant string:

image

jbphet commented 4 years ago

I searched through babel using the regular expression described at the top and found only two remaining occurrences of current translated strings that used the errant \\\n pattern. I fixed both, and also fixed a spot where it looked like a translator wanted a line break but got frustrated trying to get it to work and gave up. I then republished all of these using the recently added "trigger-build" feature of Rosetta.

I think this is pretty much done. There is one sim that I know of - States of Matter Basics - that still exhibits the problem in some of its live translations, but that will be corrected once we release a version that is build off of the states-of-matter 1.2 branch (see https://github.com/phetsims/states-of-matter-basics/issues/31).

@ariel-phet - I'm assigning this to you so that you can be aware that it was done, and it was done without having to republish the whole sim in each case (just the fixed translations were republished), so we avoided triggering updates to the iOS and Android apps (I think). Once noted, this can be closed.

oliver-phet commented 4 years ago

@jbphet reopening this issue since a translator just reported a similar issue, but appears to be inserting "/n" correctly?

I'm translating the Balloon and static electricity simulation into Icelandic but having problems with adding a newline to strings. The instructions say that one should be able to use the "\n" if there is one in the original string: image but this isn't working in my case: image which displays as: image image Can you tell me how to add a newline character in this case?

liammulh commented 3 years ago

We are 80% sure @jbphet didn't make the changes for the test route to correctly handle \n. Check this and fix.

liammulh commented 3 years ago

I'm going to pull this fix onto phet-server-dev and test it out. If all goes well I will pull it onto phet-server.

liammulh commented 3 years ago

Looks good on phet-server-dev. I'll wait for JB's go-ahead to pull onto phet-server.

liammulh commented 2 years ago

Since those commits were pushed to the master branch, I think they've been running on phet-server for some time. I don't think we've heard any complaints related to additional \s, so I'm going to close. We can re-open if we hear otherwise.