phetsims / build-a-molecule

"Build a Molecule" is an educational simulation in HTML5, by PhET Interactive Simulations.
GNU General Public License v3.0
8 stars 7 forks source link

RTL languages can have backwards formulas #220

Open liammulh opened 3 years ago

liammulh commented 3 years ago

Got an email into phethelp:

I translated the "Build a Molecule" simulation into Persian but I just saved and didn't submit it because I saw it will have some problems with chemical formulas. Enclosed you can find the problem. I think it is due to that Persian is a right to left language like Arabic. Recently, I saw that someone has submitted the translation with that error. Please check this out.

Attached screenshot: rtl-issue

Assigning @ariel-phet for prioritization.

jbphet commented 3 years ago

@arouinfar should also consult on this, since she speaks Persian and has done some translations of PhET sims into this language.

arouinfar commented 3 years ago

Those chemical formulas are absolutely wrong. CO2 should always be CO2, etc. Chemical formulas and chemical symbols are a universal standard, and should never be localized.

ariel-phet commented 3 years ago

@muedli given the popularity of this sim, it would be great to fix this issue. Can you determine if this is a general rosetta problem or if this is sim specific?

liammulh commented 3 years ago

This is a sim-specific problem. Here's a screenshot from Balancing Chemical Equations in Farsi: sim-specific

I wasn't sure about this when I made this issue, but I'm now fairly certain that Rosetta doesn't have control over the content of formulas. Rosetta can be used to change the order in which the formula is seen, but it can't change the formula itself. formulas

The dev responsible for BAM needs to make sure chemical formulas can't be internationalized.

liammulh commented 3 years ago

Moving this issue to the BAM repo.

liammulh commented 3 years ago

@ariel-phet had this on high priority in the Rosetta repo, so I'm going to put it on high here.

jonathanolson commented 3 years ago

This... looks fixed on master? Running with ?locale=fa locally:

image

@arouinfar thoughts on what to do?

arouinfar commented 3 years ago

@jonathanolson the subscripts and ordering of elements in a chemical formula look ok in master. However, there are still some serious issues. Coefficients are not being properly handled, and a string like Goal: 4H2 is being rearranged oddly into 4: goal H2. The primary learning goal of the Multiple screen if for students to figure out the meaning of coefficients and how they differ from subscripts. The current localization makes that impossible. The location of the parentheses is also really messed up, but I think that's a recurrent (and annoying) issue with RTL localization.

en fa
image image
jonathanolson commented 3 years ago

Both of those issues (the separation and the parenthesis) should both be very solvable issues, I'll look into it!

jonathanolson commented 3 years ago

It looks like a combination of two ways embedding marks were missing:

  1. The normal RTL embedding marks were outright missing around the RTL strings. @jbphet I was under the impression that rosetta added RTL marks by default around RTL language strings. Is this incorrect now? I'd like to see how this was possible to end up in our system. @arouinfar if there was anything atypical with the creation of the strings for https://github.com/phetsims/babel/commit/8805b1cc7d0ef64915b088b5471faabe79017851, it would help to know
  2. Interior embedding marks to switch the molecular content to RTL were missing. This could be handled in the simulation (by adding RTL marks around the embedded content), and that SHOULD in general be how we do things. It would warrant more discussion about how we include string information.

So basically:

After fixes:

image

jonathanolson commented 3 years ago

For reference, a single string change was:

Before: image After: image

202a: Start LTR mark 202b: Start RTL mark 202c: End (one of the above) mark

I'm lazily using a tool I made at https://jonathanolson.net/shaping/examples/bidi-test.html (where I can copy-paste in any string to see the embedding marks and logical structure easily)

jbphet commented 3 years ago

@jbphet I was under the impression that rosetta added RTL marks by default around RTL language strings. Is this incorrect now?

It is not incorrect that I know of. Rosetta is definitely designed to add the RTL embedding marks. There has been some work on the related file recently, done by @muedli, so I'll ask him to investigate if any of the changes he made may have broken this functionality.

@jbphet can these be moved to master in babel? Should they have "edit" JSON structures associated with them?

You should be able to safely move them to babel, and you do NOT want to have any sort of an edit trail with your name on it, otherwise you'll end up being credited as a translator on the website. I would suggest just making the change to the existing file with no edit log entry. Note that this will not trigger a rebuild of the translated version. If you'd like to try manually triggering a build, there are instructions in https://github.com/phetsims/rosetta/blob/master/doc/admin-guide.md, or you can just ask me or @muedli to do it once you've committed your changes to master.

Maybe sometime we (@arouinfar and @jbphet) could discuss the current strategy for handling RTL marks.

I'd be up for it. I haven't done anything on RTL translation for quite some time, and we haven't had any other reports of problems, so I'd like to better understand why it turned out to be an issue for this particular sim.

I'm assigning this back to @jonathanolson to see my responses continue the dialog if needed, and to @muedli to investigate whether some of the recent changes to translate-sim.js may have inadvertently affected the addition of RTL embedding marks.

arouinfar commented 3 years ago

Looks awesome @jonathanolson!

If there's something that needs more discussion, I'd be happy to join in.

jonathanolson commented 3 years ago

Pushed to master in babel, can we trigger a translation build @muedli?

Additionally, I think it's most likely necessary to wrap embedded LTR values (like the molecular formulas) with embedding marks as a general development practice. It might be good to discuss this briefly sometime @jbphet to get your opinion.

liammulh commented 3 years ago

I need to push a small fix to master before I can use localhost to trigger a build.

Note to Self: https://github.com/phetsims/rosetta/commit/eadd764b7196f7c5925deed68393b06e52101e9a is the commit you're looking for.

liammulh commented 2 years ago

I've been going through old GitHub issues and I realized this one fell off my radar. I triggered two different build requests for BAM in the fa locale around 10 AM. (Shouldn't the new builds be up by now?) The build server logs seem to indicate the builds succeeded. I'm still seeing:

Screenshot from 2022-06-10 10-58-32

when I go to https://phet.colorado.edu/sims/html/build-a-molecule/latest/build-a-molecule_fa.html.

I switched browsers and used a private window, so I'm fairly certain it's not a caching issue.

liammulh commented 1 year ago

@jbphet and I were working on Rosetta 2.0 today and we discovered that Rosetta 1.0 only adds embedding marks when a translation is tested.

Rosetta 1.0 does not add embedding marks when a translation is submitted, to the best of my and @jbphet's knowledge. Perhaps that explains what I was seeing in https://github.com/phetsims/build-a-molecule/issues/220#issuecomment-1152653660?

liammulh commented 1 year ago

@jbphet is going to look into this when he gets a chance.

jbphet commented 1 year ago

I've done some investigation on this and, while I don't have a solution yet, I have some information that should help us work towards one. Here are some notes:

For reference, here is the debug code that I added to the BAM class SingleCollectionNodeBox:

``` Index: js/common/view/SingleCollectionBoxNode.js IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/common/view/SingleCollectionBoxNode.js b/js/common/view/SingleCollectionBoxNode.js --- a/js/common/view/SingleCollectionBoxNode.js (revision ebad6c0d6a2cb7ff329f5114c76e57cd1fcf5f61) +++ b/js/common/view/SingleCollectionBoxNode.js (date 1667341979541) @@ -25,6 +25,27 @@ constructor( box, toModelBounds, showDialogCallback ) { super( box, toModelBounds, showDialogCallback ); assert && assert( box.capacity === 1 ); + if ( box.moleculeType.molecularFormula === 'H2O' ) { + console.log( '------------------' ); + console.log( `box.moleculeType.commonName = ${box.moleculeType.commonName}` ); + console.log( `box.moleculeType.molecularFormula = ${box.moleculeType.molecularFormula}` ); + const titleString = StringUtils.fillIn( BuildAMoleculeStrings.collectionSinglePattern, { + general: box.moleculeType.getGeneralFormulaFragment(), + display: box.moleculeType.getDisplayName() + } ); + console.log( `titleString = ${titleString}` ); + const convertToHex = str => { + let hex = ''; + for ( let i = 0; i < str.length; i++ ) { + hex += '' + str.charCodeAt( i ).toString( 16 ) + ' '; + } + return hex; + } + console.log( `convertToHex( box.moleculeType.getGeneralFormulaFragment() ) = ${convertToHex( box.moleculeType.getGeneralFormulaFragment() )}` ); + console.log( `convertToHex( BuildAMoleculeStrings.collectionSinglePattern ) = ${convertToHex( BuildAMoleculeStrings.collectionSinglePattern )}` ); + console.log( `convertToHex( box.moleculeType.getDisplayName() ) = ${convertToHex( box.moleculeType.getDisplayName() )}` ); + console.log( `convertToHex( titleString ) = ${convertToHex( titleString )}` ); + } this.insertChild( 0, new RichText( StringUtils.fillIn( BuildAMoleculeStrings.collectionSinglePattern, { general: box.moleculeType.getGeneralFormulaFragment(), display: box.moleculeType.getDisplayName() ```

And here is the output of the debug code when run on the fa translation, which exhibits the problem:

------------------
SingleCollectionBoxNode.js? [sm]:30 box.moleculeType.commonName = water
SingleCollectionBoxNode.js? [sm]:31 box.moleculeType.molecularFormula = H2O
SingleCollectionBoxNode.js? [sm]:36 titleString = ‫H<sub>2</sub>O (‫آب‬)‬
SingleCollectionBoxNode.js? [sm]:44 convertToHex( box.moleculeType.getGeneralFormulaFragment() ) = 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f 
SingleCollectionBoxNode.js? [sm]:45 convertToHex( BuildAMoleculeStrings.collectionSinglePattern ) = 202b 7b 7b 67 65 6e 65 72 61 6c 7d 7d a0 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c 
SingleCollectionBoxNode.js? [sm]:46 convertToHex( box.moleculeType.getDisplayName() ) = 202b 622 628 202c 
SingleCollectionBoxNode.js? [sm]:47 convertToHex( titleString ) = 202b 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f a0 28 202b 622 628 202c 29 202c 

Here is the output from the iw version, which does not have the problem.jj

------------------
SingleCollectionBoxNode.js? [sm]:30 box.moleculeType.commonName = water
SingleCollectionBoxNode.js? [sm]:31 box.moleculeType.molecularFormula = H2O
SingleCollectionBoxNode.js? [sm]:36 titleString = ‪H<sub>2</sub>O (‫מים‬)‬
SingleCollectionBoxNode.js? [sm]:44 convertToHex( box.moleculeType.getGeneralFormulaFragment() ) = 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f 
SingleCollectionBoxNode.js? [sm]:45 convertToHex( BuildAMoleculeStrings.collectionSinglePattern ) = 202a 7b 7b 67 65 6e 65 72 61 6c 7d 7d 20 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c 
SingleCollectionBoxNode.js? [sm]:46 convertToHex( box.moleculeType.getDisplayName() ) = 202b 5de 5d9 5dd 202c 
SingleCollectionBoxNode.js? [sm]:47 convertToHex( titleString ) = 202a 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f 20 28 202b 5de 5d9 5dd 202c 29 202c 

As noted above, it looks like the difference is in the RTL embedding marks for the pattern.

jbphet commented 1 year ago

@jonathanolson - Are RTL embedding marks being added in the string generation stage? If so, do you think that not adding them to patterns might fix this issue? It seems to me that it is creating a bit of a "double negative" sort of situation.

Once you've reviewed the notes above and commented, let me know if you'd like to meet to discuss. I'm happy to do so. This is marked as high priority because we'd really like to get it worked out before Rosetta 2.0 is deployed, which is imminent.

arouinfar commented 1 week ago

@JacquiHayes reported that this problem exists in the Dari fa_DA locale, but not the similar Pashto ps locale over in https://github.com/phetsims/joist/issues/973.

jonathanolson commented 1 week ago

Looks like we have a lot of issues related to this. Patched in https://github.com/phetsims/chipper/issues/1355.

I believe that is resolving it the "correct" way (putting explicit LTR marks around embedded LTR content).