phetsims / rosetta

PhET's Simulation Translation Utility
MIT License
3 stars 1 forks source link

Shared string count in Concentration #390

Closed oliver-phet closed 1 year ago

oliver-phet commented 1 year ago

I'm using new translation utility on PhET, however the statistics number of translated strings on some simulations are strange and wrong. Like this one: https://phet.colorado.edu/translate/zh_TW/concentration

@liammulh I'm still not fully grasping how shared strings are handled. There are 31 shared strings available to translate when viewing Concentration, but the statistics (zh_TW) says "Shared Strings: 88% (38 of 43)"

This is confusing to translators as well. Is there an issue counting the shared strings correctly? Or are some hidden?

liammulh commented 1 year ago

There are 31 shared string in Concentration coming from Beers Law Lab. In BLL, there are 43 strings.

When writing the code, I thought that if sim A and sim B share strings, and if we already got the shared string stats for sim A, then those stats could be used for sim B. What I didn't realize at the time was that sim B might only use a strict subset of sim A's strings.

Here is the relevant code:

https://github.com/phetsims/rosetta/blob/a3629a72bc161c0ef6ed136bf338f8e9df05c725/src/server/translationApi/translationReport/getTranslationReportObject.js#L122-L127

liammulh commented 1 year ago

There is another problem with the way shared strings are computed. The number of shared strings for a sim is simply categorizedStringKeys.shared.length (just the string keys in the sim). It is being incorrectly set:

https://github.com/phetsims/rosetta/blob/a3629a72bc161c0ef6ed136bf338f8e9df05c725/src/server/translationApi/translationReport/getTranslationReportObject.js#L146-L148

The above code sets numSharedStrings for the last sim in the sharedSims array that we loop over here:

https://github.com/phetsims/rosetta/blob/a3629a72bc161c0ef6ed136bf338f8e9df05c725/src/server/translationApi/translationReport/getTranslationReportObject.js#L120

Thus, for Concentration, the number of shared strings is 43 regardless of whether we remove the cachedSharedObject conditional I linked to in https://github.com/phetsims/rosetta/issues/390#issuecomment-1496298745.

liammulh commented 1 year ago

I have a patch that removes the conditional linked in https://github.com/phetsims/rosetta/issues/390#issuecomment-1496298745 and fixes the problem with the computation of numSharedStrings as described in https://github.com/phetsims/rosetta/issues/390#issuecomment-1496444463. I believe the shared strings are being correctly counted in this patch:

concentration bll
liammulh commented 1 year ago

Patch added in https://github.com/phetsims/rosetta/commit/21064475bb45c03613453e88c09a99178c6e262f. Marking as fixed-awaiting-deploy.

liammulh commented 1 year ago

Another few test cases:

liammulh commented 1 year ago

I'm going to do a bit more testing before deploy.

liammulh commented 1 year ago

When I downloaded the HTML files, here are the stats:

When I looked at the patched translation utility:

(MAL and Greenhouse seemingly don't share strings.)

liammulh commented 1 year ago

@oliver-phet, this fix is live. Please thank the translator and close this issue when you are done.

oliver-phet commented 1 year ago

@liammulh the zh_TW translator is still seeing the same issue on 3 sims: Masses and Springs: Basics Molecule Shapes: Basics Molecule Shapes

All 3 seem to have 2 shared strings that aren't available to translate, but are included in the info statistics (all strings are translated). Screenshot_20230408_081345

liammulh commented 1 year ago

This is done. See https://github.com/phetsims/molecule-shapes/issues/252.