phetsims / joist

Joist is the main framework for PhET Interactive Simulations. Joist creates and displays the simulation content, home screen, navigation bar, About dialog, enables switching between tabs, and other framework-related features.
http://scenerystack.org/
MIT License
9 stars 6 forks source link

"Unused" translated strings are leaking into simulations #912

Closed zepumph closed 1 year ago

zepumph commented 1 year ago

@arouinfar @kathy-phet and I discovered over in https://github.com/phetsims/phet-io/issues/1913 that there are "used strings" in simulations that aren't "actually" used by simulations. And so these strings are provided to translators.

Here are the examples we found right away:

  1. The Helper uses a measuring tape, so now the string "Measuring Tape" is provided to all sims for translating
  2. The KeyboardHelpDialog is imported in joist, so many keyboard-related strings are in all sims, even if there isn't a keyboard help dialog in that sim.
  3. Same for strings from the PreferencesDialog (like voicing.titleEnglishOnlyStringProperty)

This means that rosetta is providing these strings to translators. This also means they are showing up in the PhET-iO Studio tree for cusomizability. Not great!

I'd love to hear from @jonathanolson and @samreid about this if you don't mind.

@kathy-phet would like this to be blocking until we understand more. It is not good to have sims going out with these extra strings.

kathy-phet commented 1 year ago

Adding @liammulh and @jbphet so they know that this is the current situation, and strings might need to be removed from the database for newly published sims once this is fixed.

zepumph commented 1 year ago

At the heart of this seems to be that we have an algorithm for detecting "used" strings that may no longer be viable. It uses a regex that parses through the entirety of our module system to determine what strings are in the code.

https://github.com/phetsims/chipper/blob/05638b7dfd0ea5fa29488cb2d708828712b015e9/js/grunt/getStringMap.js#L177

Perhaps we should recognize that this regex has taken us as far as it can.

samreid commented 1 year ago

Brainstorming an alternative: Each LocalizedString keeps track of whether it has been accessed at runtime, like so:

```diff Subject: [PATCH] Only run precommit hooks in master, see https://github.com/phetsims/perennial/issues/276 --- Index: js/LocalizedString.ts IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/LocalizedString.ts b/js/LocalizedString.ts --- a/js/LocalizedString.ts (revision bad1ffa9226fc5f37bff7691764f785ca3063f2f) +++ b/js/LocalizedString.ts (date 1677713111310) @@ -6,7 +6,7 @@ * @author Jonathan Olson */ -import DynamicProperty from '../../axon/js/DynamicProperty.js'; +import DynamicProperty, { DynamicPropertyOptions, TNullableProperty } from '../../axon/js/DynamicProperty.js'; import TinyProperty from '../../axon/js/TinyProperty.js'; import TinyOverrideProperty from '../../axon/js/TinyOverrideProperty.js'; import localeProperty, { Locale } from '../../joist/js/i18n/localeProperty.js'; @@ -18,6 +18,7 @@ import { localizedStrings } from './getStringModule.js'; import arrayRemove from '../../phet-core/js/arrayRemove.js'; import TandemConstants, { PhetioID } from '../../tandem/js/TandemConstants.js'; +import TReadOnlyProperty from '../../axon/js/TReadOnlyProperty.js'; // constants const FALLBACK_LOCALE = 'en'; @@ -64,7 +65,7 @@ this.localeOrderListener = this.onLocaleOrderChange.bind( this ); localeOrderProperty.lazyLink( this.localeOrderListener ); - this.property = new DynamicProperty( localeProperty, { + this.property = new TrackingDynamicProperty( localeProperty, { derive: ( locale: Locale ) => this.getLocaleSpecificProperty( locale ), bidirectional: true, phetioValueType: StringIO, @@ -237,6 +238,24 @@ } } +class TrackingDynamicProperty extends DynamicProperty { + private accessed = false; + + public constructor( valuePropertyProperty: TNullableProperty | TReadOnlyProperty, providedOptions?: DynamicPropertyOptions ) { + super( valuePropertyProperty, providedOptions ); + } + + public override get value(): ThisValueType { + this.accessed = true; + return super.value; + } + + public override get(): ThisValueType { + this.accessed = true; + return super.get(); + } +} + chipper.register( 'LocalizedString', LocalizedString ); export default LocalizedString; ```

Then we run a puppeteer step like we do for phet-io to report which strings are actually used by sims? Would we need to fuzz?

jonathanolson commented 1 year ago

I worry about the "see what strings are accessed". That would break for things like dialogs (which might only access strings when they are created later).

jonathanolson commented 1 year ago

The concept of "used" strings is tricky. For instance in the Helper, if it didn't mark the string as "used", then using the helper in a built sim would break. So that... seems correct.

Perhaps we want to instead think about "which strings do we want translators to see for specific sims"?

jonathanolson commented 1 year ago

Also, puppeteer steps have caused untold troubles and have slowed me down many weeks of time. I'd like to caution against relying more on puppeteer.

samreid commented 1 year ago

Maybe each sim should explicitly list which dependency strings it allows itself to use (in package.json or nearby)? Then we have that static list and can use it for builds + rosetta + phet-io. If a sim tries to use a string outside of its list, it is a runtime error.

jonathanolson commented 1 year ago

I'm also a bit worried about having to add strings to lists whenever we changing something in a sim. "Every time I add a stopwatch to a sim, I have to add a few very specific lines to a very specific file" sounds troublesome. "I added a common string, and I now need to add a line to a bunch of files, but not all of the files" sounds more troublesome.

samreid commented 1 year ago

Let us consider the possibility that there is no low-complexity low-overhead way to track the exact strings used in each sim. Maybe a good next step would be a short meeting with stakeholders to qualify the importance of things, and discuss how to proceed.

zepumph commented 1 year ago

Yes I agree. I have heard strongly from @kathy-phet that this is not acceptable, but perhaps the first step is to discuss the problem more thoroughly to decide on if this needs to block. We can discuss this at PhET-iO meeting this afternoon.

jbphet commented 1 year ago

@kathy-phet said above:

Adding @liammulh and @jbphet so they know that this is the current situation, and strings might need to be removed from the database for newly published sims once this is fixed.

We could certainly do this, but strictly speaking it isn't necessary. Most of the strings discussed above are in common code, so while it may be unnecessary for them to be translated for a given sim, there doesn't seem to be a compelling reason to remove the translation if and when we have a more accurate way of evaluating which strings are used in a sim.

@zepumph said:

Perhaps we should recognize that this regex [that detects which strings are used] has taken us as far as it can.

Yes, unfortunately, this is probably true. It was a good, simple solution that lasted us for quite a while, but the sims have increased in complexity, and we will likely need to find something better.

I'm going to unassign myself for now since I don't think there is anything at this time that is actionable on my part, but I'd be happy to take the lead on this if needed or participate in brainstorming sessions on how to improve our ability to detect which strings are used in a sim.

zepumph commented 1 year ago

We discussed this with @kathy-phet @samreid @jonathanolson @pixelzoom @arouinfar @zepumph today and decided that a development solution for this is not tenable. Instead we will work with the translator onboarding team to make sure they are aware of this "inconsistency".

Assigning @solaolateju @RVieyra @DianaTavares. Please reach out to @kathy-phet to discuss the implications of some of these unused strings being served to translators in rosetta.

RVieyra commented 1 year ago

Ok, I've opened a Slack space for Diana, Sola, and me to discuss this with Kathy.

zepumph commented 1 year ago

Sounds good! I'm going to close, please let us know if there is anything else we can do to help.