Closed liammulh closed 3 months ago
@arouinfar should also consult on this, since she speaks Persian and has done some translations of PhET sims into this language.
Those chemical formulas are absolutely wrong. CO2
should always be CO2
, etc. Chemical formulas and chemical symbols are a universal standard, and should never be localized.
@muedli given the popularity of this sim, it would be great to fix this issue. Can you determine if this is a general rosetta problem or if this is sim specific?
This is a sim-specific problem. Here's a screenshot from Balancing Chemical Equations in Farsi:
I wasn't sure about this when I made this issue, but I'm now fairly certain that Rosetta doesn't have control over the content of formulas. Rosetta can be used to change the order in which the formula is seen, but it can't change the formula itself.
The dev responsible for BAM needs to make sure chemical formulas can't be internationalized.
Moving this issue to the BAM repo.
@ariel-phet had this on high priority in the Rosetta repo, so I'm going to put it on high here.
This... looks fixed on master? Running with ?locale=fa
locally:
@arouinfar thoughts on what to do?
@jonathanolson the subscripts and ordering of elements in a chemical formula look ok in master. However, there are still some serious issues. Coefficients are not being properly handled, and a string like Goal: 4H2
is being rearranged oddly into 4: goal H2
. The primary learning goal of the Multiple screen if for students to figure out the meaning of coefficients and how they differ from subscripts. The current localization makes that impossible. The location of the parentheses is also really messed up, but I think that's a recurrent (and annoying) issue with RTL localization.
en | fa |
---|---|
Both of those issues (the separation and the parenthesis) should both be very solvable issues, I'll look into it!
It looks like a combination of two ways embedding marks were missing:
So basically:
After fixes:
For reference, a single string change was:
Before: After:
202a: Start LTR mark 202b: Start RTL mark 202c: End (one of the above) mark
I'm lazily using a tool I made at https://jonathanolson.net/shaping/examples/bidi-test.html (where I can copy-paste in any string to see the embedding marks and logical structure easily)
@jbphet I was under the impression that rosetta added RTL marks by default around RTL language strings. Is this incorrect now?
It is not incorrect that I know of. Rosetta is definitely designed to add the RTL embedding marks. There has been some work on the related file recently, done by @muedli, so I'll ask him to investigate if any of the changes he made may have broken this functionality.
@jbphet can these be moved to master in babel? Should they have "edit" JSON structures associated with them?
You should be able to safely move them to babel, and you do NOT want to have any sort of an edit trail with your name on it, otherwise you'll end up being credited as a translator on the website. I would suggest just making the change to the existing file with no edit log entry. Note that this will not trigger a rebuild of the translated version. If you'd like to try manually triggering a build, there are instructions in https://github.com/phetsims/rosetta/blob/master/doc/admin-guide.md, or you can just ask me or @muedli to do it once you've committed your changes to master.
Maybe sometime we (@arouinfar and @jbphet) could discuss the current strategy for handling RTL marks.
I'd be up for it. I haven't done anything on RTL translation for quite some time, and we haven't had any other reports of problems, so I'd like to better understand why it turned out to be an issue for this particular sim.
I'm assigning this back to @jonathanolson to see my responses continue the dialog if needed, and to @muedli to investigate whether some of the recent changes to translate-sim.js
may have inadvertently affected the addition of RTL embedding marks.
Looks awesome @jonathanolson!
If there's something that needs more discussion, I'd be happy to join in.
Pushed to master in babel, can we trigger a translation build @muedli?
Additionally, I think it's most likely necessary to wrap embedded LTR values (like the molecular formulas) with embedding marks as a general development practice. It might be good to discuss this briefly sometime @jbphet to get your opinion.
I need to push a small fix to master before I can use localhost to trigger a build.
Note to Self: https://github.com/phetsims/rosetta/commit/eadd764b7196f7c5925deed68393b06e52101e9a is the commit you're looking for.
I've been going through old GitHub issues and I realized this one fell off my radar. I triggered two different build requests for BAM in the fa locale around 10 AM. (Shouldn't the new builds be up by now?) The build server logs seem to indicate the builds succeeded. I'm still seeing:
when I go to https://phet.colorado.edu/sims/html/build-a-molecule/latest/build-a-molecule_fa.html.
I switched browsers and used a private window, so I'm fairly certain it's not a caching issue.
@jbphet and I were working on Rosetta 2.0 today and we discovered that Rosetta 1.0 only adds embedding marks when a translation is tested.
Rosetta 1.0 does not add embedding marks when a translation is submitted, to the best of my and @jbphet's knowledge. Perhaps that explains what I was seeing in https://github.com/phetsims/build-a-molecule/issues/220#issuecomment-1152653660?
@jbphet is going to look into this when he gets a chance.
I've done some investigation on this and, while I don't have a solution yet, I have some information that should help us work towards one. Here are some notes:
collectionSinglePattern
string.collectionSinglePattern
does not seem to include the RTL embedding marks, but when I log the string in hex from the running code, it does, but only for RTL languages that include a translation of this string pattern. This makes me think that the embedding marks are being added when the strings are generated, which doesn't seem quite right.For reference, here is the debug code that I added to the BAM class SingleCollectionNodeBox
:
And here is the output of the debug code when run on the fa
translation, which exhibits the problem:
------------------
SingleCollectionBoxNode.js? [sm]:30 box.moleculeType.commonName = water
SingleCollectionBoxNode.js? [sm]:31 box.moleculeType.molecularFormula = H2O
SingleCollectionBoxNode.js? [sm]:36 titleString = H<sub>2</sub>O (آب)
SingleCollectionBoxNode.js? [sm]:44 convertToHex( box.moleculeType.getGeneralFormulaFragment() ) = 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f
SingleCollectionBoxNode.js? [sm]:45 convertToHex( BuildAMoleculeStrings.collectionSinglePattern ) = 202b 7b 7b 67 65 6e 65 72 61 6c 7d 7d a0 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c
SingleCollectionBoxNode.js? [sm]:46 convertToHex( box.moleculeType.getDisplayName() ) = 202b 622 628 202c
SingleCollectionBoxNode.js? [sm]:47 convertToHex( titleString ) = 202b 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f a0 28 202b 622 628 202c 29 202c
Here is the output from the iw
version, which does not have the problem.jj
------------------
SingleCollectionBoxNode.js? [sm]:30 box.moleculeType.commonName = water
SingleCollectionBoxNode.js? [sm]:31 box.moleculeType.molecularFormula = H2O
SingleCollectionBoxNode.js? [sm]:36 titleString = H<sub>2</sub>O (מים)
SingleCollectionBoxNode.js? [sm]:44 convertToHex( box.moleculeType.getGeneralFormulaFragment() ) = 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f
SingleCollectionBoxNode.js? [sm]:45 convertToHex( BuildAMoleculeStrings.collectionSinglePattern ) = 202a 7b 7b 67 65 6e 65 72 61 6c 7d 7d 20 28 7b 7b 64 69 73 70 6c 61 79 7d 7d 29 202c
SingleCollectionBoxNode.js? [sm]:46 convertToHex( box.moleculeType.getDisplayName() ) = 202b 5de 5d9 5dd 202c
SingleCollectionBoxNode.js? [sm]:47 convertToHex( titleString ) = 202a 48 3c 73 75 62 3e 32 3c 2f 73 75 62 3e 4f 20 28 202b 5de 5d9 5dd 202c 29 202c
As noted above, it looks like the difference is in the RTL embedding marks for the pattern.
@jonathanolson - Are RTL embedding marks being added in the string generation stage? If so, do you think that not adding them to patterns might fix this issue? It seems to me that it is creating a bit of a "double negative" sort of situation.
Once you've reviewed the notes above and commented, let me know if you'd like to meet to discuss. I'm happy to do so. This is marked as high priority because we'd really like to get it worked out before Rosetta 2.0 is deployed, which is imminent.
@JacquiHayes reported that this problem exists in the Dari fa_DA
locale, but not the similar Pashto ps
locale over in https://github.com/phetsims/joist/issues/973.
Looks like we have a lot of issues related to this. Patched in https://github.com/phetsims/chipper/issues/1355.
I believe that is resolving it the "correct" way (putting explicit LTR marks around embedded LTR content).
I just looked at the currently published Dari (Persian) version, and it appears that it is still broken. I looked at https://github.com/phetsims/build-a-molecule/issues/220 and found that KP had approved the maintenance release, but hadn't assigned the issue back to @jonathanolson, so I think the decision was missed. I just checked in with @jonathanolson over Slack, and he said he'll move forward with the MR now.
QA test in https://github.com/phetsims/qa/issues/1113.
@arouinfar it looks like other RTL translations aren't ordering things correctly... should I try to fix those up to be ordered like the fa
locale?
https://phet-dev.colorado.edu/html/build-a-molecule/1.0.22-rc.2/phet/build-a-molecule_all_phet.html is the link to test (or phettest, I just fixed the second screen also).
@arouinfar it looks like other RTL translations aren't ordering things correctly... should I try to fix those up to be ordered like the fa locale?
Yes, I think we should fix this in all RTL languages. The problem is with the coefficients used on the Multiple Screen, not the chemical formulas themselves.
Things are rendered correctly in fa
I tested a few other RTL locales, and things are messed up in different ways.
Pashto, locale=ps
Hebrew, locale=iw
Arabic, locale=ar Looks like the translator left the string in English to avoid layout issues.
I pushed up the MR to production, applied some patches to the strings, and kicked off a translation rebuild.
Locales that should be fixed up (extra query parameter to bypass cache):
However, ar_MA
and ps
locales have translation strings that seem to "reverse" the first-screen formulas (with the items in parentheses):
@arouinfar is that also something I should patch up? Also, can you check out the above links to see how they look?
@jonathanolson I spot-checked all of the links in https://github.com/phetsims/build-a-molecule/issues/220#issuecomment-2231480001 and the chemical formulas and coefficients in front of the formulas all look good.
However,
ar_MA
andps
locales have translation strings that seem to "reverse" the first-screen formulas (with the items in parentheses):
This looks acceptable to me. If I'm not mistaken the string in parentheses is the name of the chemical, in this case, water. It seems more like the translator's stylistic choice than a bug. I don't think we need to make any further changes.
Sounds good, I think that is everything for this issue, thanks!
Got an email into phethelp:
Attached screenshot:
Assigning @ariel-phet for prioritization.