Open liammulh opened 3 years ago
@arouinfar should also consult on this, since she speaks Persian and has done some translations of PhET sims into this language.
Those chemical formulas are absolutely wrong. `CO2` should always be `CO2`, etc. Chemical formulas and chemical symbols are a universal standard, and should never be localized.
@muedli given the popularity of this sim, it would be great to fix this issue. Can you determine if this is a general rosetta problem or if this is sim specific?
This is a sim-specific problem. Here's a screenshot from Balancing Chemical Equations in Farsi:
I wasn't sure about this when I made this issue, but I'm now fairly certain that Rosetta doesn't have control over the content of formulas. Rosetta can be used to change the order in which the formula is seen, but it can't change the formula itself.
The dev responsible for BAM needs to make sure chemical formulas can't be internationalized.
Moving this issue to the BAM repo.
@ariel-phet had this on high priority in the Rosetta repo, so I'm going to put it on high here.
This... looks fixed on master? Running with `?locale=fa` locally:
@arouinfar thoughts on what to do?
@jonathanolson the subscripts and ordering of elements in a chemical formula look OK in master. However, there are still some serious issues. Coefficients are not being handled properly, and a string like `Goal: 4H2` is being rearranged oddly into `4: goal H2`. The primary learning goal of the Multiple screen is for students to figure out the meaning of coefficients and how they differ from subscripts. The current localization makes that impossible. The location of the parentheses is also really messed up, but I think that's a recurrent (and annoying) issue with RTL localization.
en | fa |
---|---|
![]() | ![]() |
Both of those issues (the separation and the parentheses) should be very solvable. I'll look into it!
It looks like a combination of two ways embedding marks were missing:
So basically:
After fixes:
For reference, a single string change was:
Before:
After:
- `202a`: Start LTR mark
- `202b`: Start RTL mark
- `202c`: End (one of the above) mark
I'm lazily using a tool I made at https://jonathanolson.net/shaping/examples/bidi-test.html (where I can copy-paste in any string to see the embedding marks and logical structure easily)
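For anyone who wants to inspect a string without the tool, here is a minimal sketch that makes the three marks listed above visible. The helper names `showEmbeddingMarks` and `hasBalancedEmbeddingMarks` are hypothetical and are not part of any PhET library; the sample string is made up for illustration.

```javascript
// Unicode bidi embedding controls, matching the three code points listed above.
const LRE = '\u202a'; // LEFT-TO-RIGHT EMBEDDING ("start LTR")
const RLE = '\u202b'; // RIGHT-TO-LEFT EMBEDDING ("start RTL")
const PDF = '\u202c'; // POP DIRECTIONAL FORMATTING ("end")

// Hypothetical helper: replace the invisible marks with visible tags.
function showEmbeddingMarks( string ) {
  return string
    .replace( /\u202a/g, '[LTR]' )
    .replace( /\u202b/g, '[RTL]' )
    .replace( /\u202c/g, '[END]' );
}

// Hypothetical helper: true if every start mark is closed by an end mark
// and no end mark appears without a preceding start mark.
function hasBalancedEmbeddingMarks( string ) {
  let depth = 0;
  for ( const character of string ) {
    if ( character === LRE || character === RLE ) {
      depth++;
    }
    else if ( character === PDF ) {
      depth--;
      if ( depth < 0 ) {
        return false;
      }
    }
  }
  return depth === 0;
}

// Made-up sample: an RTL string that embeds an LTR formula.
const sample = `${RLE}${LRE}H2O${PDF} (\u0622\u0628)${PDF}`;
console.log( showEmbeddingMarks( sample ) );
console.log( hasBalancedEmbeddingMarks( sample ) );
```

This only exposes the marks; it does not run the bidi algorithm, so it cannot predict the visual order the way the linked tool does.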
@jbphet I was under the impression that rosetta added RTL marks by default around RTL language strings. Is this incorrect now?
It is not incorrect that I know of. Rosetta is definitely designed to add the RTL embedding marks. There has been some work on the related file recently, done by @muedli, so I'll ask him to investigate if any of the changes he made may have broken this functionality.
@jbphet can these be moved to master in babel? Should they have "edit" JSON structures associated with them?
You should be able to safely move them to babel, and you do NOT want to have any sort of an edit trail with your name on it, otherwise you'll end up being credited as a translator on the website. I would suggest just making the change to the existing file with no edit log entry. Note that this will not trigger a rebuild of the translated version. If you'd like to try manually triggering a build, there are instructions in https://github.com/phetsims/rosetta/blob/master/doc/admin-guide.md, or you can just ask me or @muedli to do it once you've committed your changes to master.
Maybe sometime we (@arouinfar and @jbphet) could discuss the current strategy for handling RTL marks.
I'd be up for it. I haven't done anything on RTL translation for quite some time, and we haven't had any other reports of problems, so I'd like to better understand why it turned out to be an issue for this particular sim.
I'm assigning this back to @jonathanolson to see my responses and continue the dialog if needed, and to @muedli to investigate whether some of the recent changes to `translate-sim.js` may have inadvertently affected the addition of RTL embedding marks.
Looks awesome @jonathanolson!
If there's something that needs more discussion, I'd be happy to join in.
Pushed to master in babel, can we trigger a translation build @muedli?
Additionally, I think it's most likely necessary to wrap embedded LTR values (like the molecular formulas) with embedding marks as a general development practice. It might be good to discuss this briefly sometime @jbphet to get your opinion.
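As a rough illustration of that practice, the sketch below wraps an LTR formula in embedding marks before it is substituted into an RTL pattern. `wrapLTR` and `fillIn` here are hypothetical stand-ins written for this example, not the sim's actual code, and the pattern and values are assumptions modeled on the strings discussed in this issue.

```javascript
const LRE = '\u202a'; // start LTR embedding
const RLE = '\u202b'; // start RTL embedding
const PDF = '\u202c'; // end of embedding

// Hypothetical helper: force a value to be laid out left-to-right.
function wrapLTR( string ) {
  return `${LRE}${string}${PDF}`;
}

// Minimal stand-in for a {{placeholder}} fill-in; unknown keys are left as-is.
function fillIn( pattern, values ) {
  return pattern.replace( /\{\{(\w+)\}\}/g, ( match, key ) =>
    Object.prototype.hasOwnProperty.call( values, key ) ? values[ key ] : match
  );
}

// Assumed RTL pattern: "{{general}} ({{display}})" wrapped in RTL embedding marks.
const pattern = `${RLE}{{general}}\u00a0({{display}})${PDF}`;

const title = fillIn( pattern, {
  general: wrapLTR( 'H<sub>2</sub>O' ),      // the formula stays LTR
  display: `${RLE}\u0622\u0628${PDF}`        // translated name, already RTL-wrapped
} );

// Print the code points so the invisible marks can be checked by eye.
console.log( Array.from( title, c => c.codePointAt( 0 ).toString( 16 ) ).join( ' ' ) );
```

Wrapping at the point of substitution keeps the pattern string itself free of assumptions about what gets embedded in it.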
I need to push a small fix to master before I can use localhost to trigger a build.
Note to Self: https://github.com/phetsims/rosetta/commit/eadd764b7196f7c5925deed68393b06e52101e9a is the commit you're looking for.
I've been going through old GitHub issues and I realized this one fell off my radar. I triggered two different build requests for BAM in the fa locale around 10 AM. (Shouldn't the new builds be up by now?) The build server logs seem to indicate the builds succeeded. I'm still seeing:
when I go to https://phet.colorado.edu/sims/html/build-a-molecule/latest/build-a-molecule_fa.html.
I switched browsers and used a private window, so I'm fairly certain it's not a caching issue.
@jbphet and I were working on Rosetta 2.0 today and we discovered that Rosetta 1.0 only adds embedding marks when a translation is tested.
Rosetta 1.0 does not add embedding marks when a translation is submitted, to the best of my and @jbphet's knowledge. Perhaps that explains what I was seeing in https://github.com/phetsims/build-a-molecule/issues/220#issuecomment-1152653660?
@jbphet is going to look into this when he gets a chance.
I've done some investigation on this and, while I don't have a solution yet, I have some information that should help us work towards one. Here are some notes:
collectionSinglePattern
`string.collectionSinglePattern` does not seem to include the RTL embedding marks, but when I log the string in hex from the running code, it does, but only for RTL languages that include a translation of this string pattern. This makes me think that the embedding marks are being added when the strings are generated, which doesn't seem quite right. For reference, here is the debug code that I added to the BAM class `SingleCollectionBoxNode`:
And here is the output of the debug code when run on the `fa` translation, which exhibits the problem:
------------------
SingleCollectionBoxNode.js? [sm]:30 box.moleculeType.commonName = water
SingleCollectionBoxNode.js? [sm]:31 box.moleculeType.molecularFormula = H2O
SingleCollectionBoxNode.js? [sm]:36 titleString = H<sub>2</sub>O (آب)
SingleCollectionBoxNode.js? [sm]:44 convertToHex( box.moleculeType.getGeneralFormulaFragment() ) = 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f
SingleCollectionBoxNode.js? [sm]:45 convertToHex( BuildAMoleculeStrings.collectionSinglePattern ) = 202b 7b 7b 67 65 6e 65 72 61 6c 7d 7d a0 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c
SingleCollectionBoxNode.js? [sm]:46 convertToHex( box.moleculeType.getDisplayName() ) = 202b 622 628 202c
SingleCollectionBoxNode.js? [sm]:47 convertToHex( titleString ) = 202b 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f a0 28 202b 622 628 202c 29 202c
Here is the output from the `iw` version, which does not have the problem.
------------------
SingleCollectionBoxNode.js? [sm]:30 box.moleculeType.commonName = water
SingleCollectionBoxNode.js? [sm]:31 box.moleculeType.molecularFormula = H2O
SingleCollectionBoxNode.js? [sm]:36 titleString = H<sub>2</sub>O (מים)
SingleCollectionBoxNode.js? [sm]:44 convertToHex( box.moleculeType.getGeneralFormulaFragment() ) = 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f
SingleCollectionBoxNode.js? [sm]:45 convertToHex( BuildAMoleculeStrings.collectionSinglePattern ) = 202a 7b 7b 67 65 6e 65 72 61 6c 7d 7d 20 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c
SingleCollectionBoxNode.js? [sm]:46 convertToHex( box.moleculeType.getDisplayName() ) = 202b 5de 5d9 5dd 202c
SingleCollectionBoxNode.js? [sm]:47 convertToHex( titleString ) = 202a 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f 20 28 202b 5de 5d9 5dd 202c 29 202c
As noted above, it looks like the difference is in the RTL embedding marks for the pattern.
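To make that comparison concrete, here is a small sketch that diffs the two pattern dumps from the logs above. The functions are hypothetical reconstructions written for this comment (the name `convertToHex` is borrowed from the log output, but this is not the original debug code); the two hex strings are copied from the `collectionSinglePattern` lines.

```javascript
// Convert a string to space-separated hex code points.
function convertToHex( string ) {
  return Array.from( string, character => character.codePointAt( 0 ).toString( 16 ) ).join( ' ' );
}

// Inverse of convertToHex: rebuild the string from a hex dump.
function convertFromHex( hex ) {
  return hex.trim().split( /\s+/ )
    .map( code => String.fromCodePoint( parseInt( code, 16 ) ) )
    .join( '' );
}

// List the positions at which two hex dumps differ.
function diffHexDumps( first, second ) {
  const a = first.trim().split( /\s+/ );
  const b = second.trim().split( /\s+/ );
  const differences = [];
  for ( let i = 0; i < Math.max( a.length, b.length ); i++ ) {
    if ( a[ i ] !== b[ i ] ) {
      differences.push( { index: i, first: a[ i ], second: b[ i ] } );
    }
  }
  return differences;
}

// Pattern dumps copied from the fa and iw logs above.
const faPatternHex = '202b 7b 7b 67 65 6e 65 72 61 6c 7d 7d a0 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c';
const iwPatternHex = '202a 7b 7b 67 65 6e 65 72 61 6c 7d 7d 20 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c';

console.log( diffHexDumps( faPatternHex, iwPatternHex ) );
```

The two dumps differ in exactly two places: the opening mark (`202b` for `fa`, `202a` for `iw`) and the space before the parenthesis (`a0`, a non-breaking space, for `fa` versus `20` for `iw`). Nothing in this thread establishes whether the non-breaking space matters, so treat that second difference as an observation only.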
@jonathanolson - Are RTL embedding marks being added in the string generation stage? If so, do you think that not adding them to patterns might fix this issue? It seems to me that it is creating a bit of a "double negative" sort of situation.
Once you've reviewed the notes above and commented, let me know if you'd like to meet to discuss. I'm happy to do so. This is marked as high priority because we'd really like to get it worked out before Rosetta 2.0 is deployed, which is imminent.
@JacquiHayes reported that this problem exists in the Dari `fa_DA` locale, but not the similar Pashto `ps` locale, over in https://github.com/phetsims/joist/issues/973.
Looks like we have a lot of issues related to this. Patched in https://github.com/phetsims/chipper/issues/1355.
I believe that is resolving it the "correct" way (putting explicit LTR marks around embedded LTR content).
Got an email into phethelp:
Attached screenshot:![rtl-issue](https://user-images.githubusercontent.com/60749003/103816681-9e4eb300-5022-11eb-9885-f34e565d194e.png)
Assigning @ariel-phet for prioritization.