Open JeromeChignoli opened 3 days ago
i found {système} keys in TRADING_FROM and TRADING_TO sections.
Looks like the translator has translated the variable name, which is a big no-no. Anything within {...} should not be translated.
Would be most excellent if you could fix it: https://wiki.pioneerspacesim.net/wiki/Translations
The real work will be to find the files... maybe by grep'ing the bad entries
Files are in data/lang/**/fr.json
EDIT: Note: the changes should be made in Transifex.
OK. Easy for the French translations, but I will have to compare the {} keys in the *.json files for the other languages. I assume the en.json files are the correct ones, which I can start from? A lot of sed, grep, awk and sort commands... I like sort ;)
I will have to compare the {} keys in the *.json files for the other languages.
Well, variable names are always in English, so if you see any in French, that needs fixing.
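Since the variable names are expected to match the English file exactly, one way to spot a translated name is to extract the {…} tokens from both strings and look at the set difference. The sketch below uses the English and Danish strings quoted further down in this thread; the pattern and the helper name `lua_vars` are assumptions made for illustration, not Pioneer's own parser.

```python
import re

# Assumed shape of a well-formed Lua-style placeholder: {anything-without-braces}
LUA_VAR = re.compile(r"\{[^{}]*\}")


def lua_vars(message):
    """Return the sorted list of {placeholder} tokens found in a message."""
    return sorted(LUA_VAR.findall(message))


english = "Stable system with %bodycount major %{body(s)} and %portcount %{starport(s)}"
danish = "Stabilt system med %bodycount større %{body(r)} og %portcount %{starport(e)}"

# Tokens present on one side only point at a translated (or mistyped) variable.
only_in_english = sorted(set(lua_vars(english)) - set(lua_vars(danish)))
only_in_danish = sorted(set(lua_vars(danish)) - set(lua_vars(english)))

print(only_in_english)  # ['{body(s)}', '{starport(s)}']
print(only_in_danish)   # ['{body(r)}', '{starport(e)}']
```

This only flags that something differs; deciding which English name a wrong token should become still needs a human or a further heuristic.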
Simplest is to just identify when/who made the error and check those. I don't know if that feature exists in Transifex, but either way, you can run git log on the French files and identify when the change was made (recently, I'd venture to guess). You can see all commits that transfer changes from Transifex to master: https://github.com/pioneerspacesim/pioneer/commits?author=pioneer-transifex
Also, we intend to do a release within a few days, so if you have time to play around with this in the not too distant future that would be great.
For anyone tagging along in the conversation but not interested in running the command in the OP, the result is below. Since we have 31 languages, all occurrence counts should be multiples of 31, unless there's a parsing error (which some lines appear to be, where the whole sentence has been included).
E.g. I was curious about {body(r)} (the 4th hit in the output below), and it is from Danish:
git grep -in "body(r)"
core/da.json:1460: "message": "Stabilt system med %bodycount større %{body(r)} og %portcount %{starport(e)}"
going to line 1460 in en.json:
"message": "Stable system with %bodycount major %{body(s)} and %portcount %{starport(s)}"
Output from command in OP:
1 {alcance}
1 {Alcancemáximo}
1 {area). Potrzebujesz hipernapędu zdolnego pokonać odległość {dist}
1 {body(r)}
1 {cach}
1 {cache}
1 {cash na misję w {area}
1 {combustível}
1 {companhia}
1 {corporação}
1 {dano}
1 {datas}
1 {data_vencimento}
1 {date|}
1 {datum}
1 {distans}
1 {equipamento}
1 {equipement}
1 {f-2}
1 {komodity}
1 {local}
1 {metallegering}
1 {missione}
1 {Name}
1 {newpaper}
1 {orbit_peiod}
1 {Percentualreparo}
1 {playedrShipLabel}
1 {playershipbalel}
1 {population}
1 {preço}
1 {primary}
1 {propietor}
1 {randezvous}
1 {{secondary_info}, tertiary_info}
1 {Setorz}
1 {shiplabel的星舰因燃料耗尽停在附近。 我们将向任何为该船加油的人支付{cash}
1 {sim}
1 {starport(e)}
1 {systém}
1 {systeme}
1 {system] ({sectorx}
1 {systemu}
1 {systému}
1 {tak}
1 {taxa_anuidade}
1 {valor}
1 {základen}
1 {星球}
1 {现金}
1 {系统}
1 {行星}
2 {areax}
2 {carportame}
2 {combustível_militar}
2 {naam}
2 {nein}
2 {nome da nave}
2 {nome do clube}
2 {quantidade}
2 {registro da nave}
2 {secretary}
2 {sectionx}
2 {todo{ ton {cargotype}
3 {grupo}
3 {proprietário}
3 {système}
4 {base de lançamento}
4 {Org}
4 {radioativos}
5 {carga}
5 {jornal}
5 {quantia}
6 {geld}
8 {cliente}
10 {porto estelar}
13 {doelwit}
20 {distância}
28 {data}
29 {bases}
29 {body(s)}
29 {shipname}
29 {shipregid}
29 {starport(s)}
29 {SYSTEM}
30 {alloy}
30 {company}
30 {corp}
30 {damage}
30 {deliver_crew}
30 {expiry_date}
30 {fuel}
30 {maxRange}
30 {membership_fee}
30 {money}
30 {orbit_period}
30 {range}
30 {rendezvous}
30 {repairPercent}
30 {tertiary_info}
30 {value}
31 {age}
31 {bodies}
31 {body}
31 {bodycount}
31 {countdown}
31 {coverage}
31 {crime}
31 {d08}
31 {destination}
31 {distance}
31 {duration}
31 {engineering}
31 {experience}
31 {faction}
31 {faction_police}
31 {general}
31 {high_gravity}
31 {level}
31 {minutes}
31 {N}
31 {navigation}
31 {newAmount}
31 {overall}
31 {parent_body}
31 {percent}
31 {piloting}
31 {portcount}
31 {potentialCrewMember}
31 {press}
31 {pressure}
31 {resolution}
31 {response}
31 {save}
31 {seconds}
31 {sensors}
31 {ship_label}
31 {tech_level}
31 {total_docking_pads}
31 {type}
31 {units}
31 {x}
31 {y}
31 {z}
32 {path}
34 {sectorX}
34 {sectorY}
34 {sectorZ}
36 {nome}
38 {setorz}
39 {sector}
39 {setorx}
39 {setory}
44 {alvo}
45 {dinheiro}
58 {spaceport}
59 {military_fuel}
60 {}
60 {FACTION}
60 {POLICE}
61 {entity}
61 {problem}
61 {reward}
61 {star}
61 {time}
62 {bay}
62 {dom_sectorx}
62 {dom_sectory}
62 {dom_sectorz}
62 {dom_system}
62 {grav}
62 {pass}
62 {shipid}
62 {wage}
66 {sistema}
90 {group}
90 {MILITARY}
91 {cargotype}
91 {commodity}
91 {location}
91 {quantity}
91 {ship}
92 {secondary_info}
93 {crew}
121 {clubname}
121 {dom_starport}
122 {mass}
123 {f.0}
123 {fee}
123 {mission}
124 {f.3}
124 {lat}
124 {long}
148 {no}
148 {radioactives}
149 {newspaper}
150 {proprietor}
155 {f.2}
155 {gravity}
174 {yes}
184 {drive}
184 {todo}
185 {f.1}
186 {done}
186 {lasttime}
186 {targetname}
217 {due}
246 {primary_info}
246 {shiplabel}
269 {client}
272 {cargo}
274 {org}
341 {unit}
395 {systembody}
401 {playerShipLabel}
461 {planet}
462 {equipment}
523 {price}
587 {station}
615 {amount}
620 {cargoname}
954 {area}
990 {locality}
1237 {target}
1242 {date}
1550 {dist}
1563 {starport}
2054 {sectorx}
2056 {sectory}
2061 {sectorz}
2715 {name}
2943 {cash}
4515 {system}
173045
OK, there are 149 files that have Lua-variable translation errors (I'll update this post when I have the C++ ones as well):
Nice work. Did you work from the shell command output? There are typos in the « } » tags (as in the 3rd and 7th outputs of the shell command). By luck, a second {key} was present, so the first tag is printed. There may also be typos in « { » tags, or in « } » tags without a second {key}: these will not be displayed by the shell command.
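To catch the brace typos that the shell command cannot show, a check can walk each message character by character and report every brace that does not pair up. This is a minimal sketch, assuming placeholders never nest; the function name `brace_problems` is made up for illustration.

```python
def brace_problems(message):
    """Return a list of (index, char) for braces that do not pair up.

    Assumption: placeholders never nest, so a "{" seen while another
    "{" is still open means the earlier one was never closed.
    """
    problems = []
    open_at = None  # index of the currently open "{", if any
    for i, ch in enumerate(message):
        if ch == "{":
            if open_at is not None:
                problems.append((open_at, "{"))  # earlier "{" never closed
            open_at = i
        elif ch == "}":
            if open_at is None:
                problems.append((i, "}"))  # "}" without an opening "{"
            open_at = None
    if open_at is not None:
        problems.append((open_at, "{"))  # message ended with "{" still open
    return problems


print(brace_problems("{done} {todo{ ton {cargotype}"))  # [(7, '{'), (12, '{')]
print(brace_problems("lost} here"))                     # [(4, '}')]
```

Unlike the sed pipeline, this reports a lone « { » or « } » even when no second {key} follows on the same line.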
@JeromeChignoli I wrote a Python script this morning. I should be able to extend it to just fix all the errors, and we'll push it to Transifex directly, so you don't need to do anything.
Nice. But be warned: tools are great but stupid… ;)
so you don't need to do anything.
Right, sorry! I should have read the whole thread, obviously, before manually fixing strings. I solved some 4-5 cases before noticing this part. I hope this doesn't create any problems for the script.
before manually fixing strings
@zonkmachine no worries. You might have been on to something...
But be warned: tools are great but stupid… ;)
@JeromeChignoli Indeed. See below:
I've done some more tinkering, looked more at the output, and reached the conclusion: there be complications. The TL;DR: I could either compile a list of errors for each language and send it out to the translators with a "pretty please fix", or we just brute-force the fixes and nuke 95% of the errors, in my guesstimation.
Below are some edge cases that make it challenging to (correctly) automate fixing in script.
...to quote Wolfgang Pauli. This is going to be tough to fix in script
Checking file: data/lang/module-searchrescue/pl.json FAIL:
ERROR: {'RESULT_DELIVERY_COMM': ['{done}', '{todo{ ton {cargotype}']}
correct: {'RESULT_DELIVERY_COMM': ['{done}', '{todo}', '{cargotype}']}
ERROR: {'RESULT_PICKUP_COMM': ['{done}', '{todo{ ton {cargotype}']}
correct: {'RESULT_PICKUP_COMM': ['{done}', '{todo}', '{cargotype}']}
erros: 2, warnings: 0
or here, } vs ):
Checking file: data/lang/module-combat/pl.json FAIL:
ERROR: {'POLICE_2': ['{org}', '{cash}', '{system}', '{sectorx}', '{sectory}', '{sectorz}', '{area). Potrzebujesz hipernapędu zdolnego pokonać odległość {dist}']}
correct: {'POLICE_2': ['{org}', '{cash}', '{system}', '{sectorx}', '{sectory}', '{sectorz}', '{area}', '{dist}']}
or a missing closing bracket altogether:
Checking file: data/lang/module-combat/pl.json FAIL:
ERROR: {'POLICE_1': ['{area}', '{system}', '{sectorx}', '{sectory}', '{sectorz}', '{dist}', '{cash na misję w {area}']}
correct: {'POLICE_1': ['{area}', '{system}', '{sectorx}', '{sectory}', '{sectorz}', '{dist}', '{cash}', '{area}']}
Maybe the above warrants some regex magic to detect? (All three examples are from Polish.)
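One possible bit of regex magic for the three Polish cases above: ignore the closing bracket entirely and look only for a « { » followed by an identifier that is known from the English string. The sketch below assumes variable names are plain ASCII identifiers; `recover_names` and the name sets passed to it are illustrative, not part of the actual script.

```python
import re

# "{" followed by an ASCII identifier; the closing "}" is deliberately not required.
OPEN_NAME = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)")


def recover_names(message, english_names):
    """List the known English variable names that follow a "{" in message, in order."""
    return [m.group(1) for m in OPEN_NAME.finditer(message)
            if m.group(1) in english_names]


# Malformed fragments taken from the Polish errors above.
print(recover_names("{done} {todo{ ton {cargotype}", {"done", "todo", "cargotype"}))
# ['done', 'todo', 'cargotype']
print(recover_names("{cash na misję w {area}", {"cash", "area"}))
# ['cash', 'area']
print(recover_names("{area). Potrzebujesz hipernapędu zdolnego pokonać odległość {dist}",
                    {"area", "dist"}))
# ['area', 'dist']
```

This detects which variable was intended; rewriting the string so that the bracket ends up in the right place is a separate and harder step.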
Below, the percentage sign is lacking a space, and is thus interpreted as a C-variable string by my script (variables from C++ code: foo %my_variable bar, variables from Lua code: foo {my_variable} bar). We have several of these from Chinese (Mandarin?). I assume this will not be a problem in the game (unless, I suspect, this happens to be a string where we're trying to substitute another variable from C++).
Checking file: data/lang/ui-core/ja.json FAIL:
ERROR: {'YOUR_HULL_IS_AT_X_INTEGRITY': ['{value}', '%の強度です']}
correct: {'YOUR_HULL_IS_AT_X_INTEGRITY': ['{value}']}
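One possible cause of this false positive, assuming the script matches C++ variables with something like `%\w+`: in Python 3, `\w` in a str pattern also matches non-ASCII letters, so the text after the percent sign is swallowed as a variable name. Restricting the pattern to ASCII identifiers avoids that. The message below is a hypothetical reconstruction from the two tokens in the log, not the actual string in ja.json.

```python
import re

# \w matches Unicode word characters by default for str patterns.
C_VAR_UNICODE = re.compile(r"%\w+")
# ASCII-only identifier, on the assumption that C++ variable names are plain ASCII.
C_VAR_ASCII = re.compile(r"%[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical message built from the tokens reported in the log above.
message = "{value}%の強度です"

print(C_VAR_UNICODE.findall(message))  # ['%の強度です']
print(C_VAR_ASCII.findall(message))    # []
print(C_VAR_ASCII.findall("foo %my_variable bar"))  # ['%my_variable']
```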
Below is probably fine (thus the script only considers it a "warning"), but if the variables had also been wrong (e.g. translated), then I would replace them assuming the order is the same as in the English string. This is not an issue when the string only has a single {variable}, but most strings (with errors, at least) contain multiple variables. I don't see a way around this without manual intervention (or some fancy-schmancy stuff as I suggest in the "conclusions" section below).
Checking file: data/lang/module-system/ru.json FAIL:
WARNING: {'DISCOVERED_HIDDEN_BASES': ['{bases}', '{portcount}']}
correct: {'DISCOVERED_HIDDEN_BASES': ['{portcount}', '{bases}']}
erros: 0, warnings: 1
(Seems like German (to no surprise) and Russian often swap the order of variables.)
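The error/warning split described above can be sketched by comparing the variables twice: once as ordered lists and once as multisets. The function name `classify` and the three labels are assumptions for illustration; the actual script may decide this differently.

```python
from collections import Counter


def classify(english_vars, translated_vars):
    """Compare two lists of variable tokens.

    "ok":      same tokens in the same order
    "warning": same tokens, different order (probably a legitimate reordering)
    "error":   the tokens themselves differ
    """
    if english_vars == translated_vars:
        return "ok"
    if Counter(english_vars) == Counter(translated_vars):
        return "warning"
    return "error"


# The Russian DISCOVERED_HIDDEN_BASES case from the log above.
print(classify(["{portcount}", "{bases}"], ["{bases}", "{portcount}"]))  # warning
print(classify(["{system}"], ["{système}"]))                              # error
```

Using `Counter` rather than `set` keeps duplicates significant, so a string that uses the same variable twice is not confused with one that uses it once.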
Below, the number of strings is mismatched between the English and the translated file. This might be because the string is foo instead of {foo} or %foo, so the script isn't (currently) detecting the string (it only keeps strings with variable substitutions in them).
Checking file: data/lang/ui-core/ar.json FAIL:
ERROR: mismatching keys, probably missing required {variable} all together
Checking file: data/lang/ui-core/hu.json FAIL:
ERROR: mismatching keys, probably missing required {variable} all together
Checking file: data/lang/module-taxi/id.json FAIL:
ERROR: mismatching keys, probably missing required {variable} all together
I think 2.3 is a very (imagined?) edge case, and we could just ignore it. I think the script would do the right thing 95% of the time, and the translations would be less wrongier than before (but now the error might be so subtle as to not be obviously noticeable in gameplay, making the translation "bug" more deceptive?).
One could also measure some kind of "distance" (character Hamming distance, word2vec embedding space, or an LLM) between the wrong and the correct variable. In most cases the wrong and the correct one are quite close, either because they're:
{sextorx}, {sextory}, {sextory})
As teased in the Introduction, we could ask translators to do the work for us and see where that gets us. (Also: I have yet to figure out how to implement the regex string substitution from the list of correct variables.)
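As a rough illustration of the "distance" idea, the sketch below uses the standard library's `difflib.get_close_matches`, which ranks candidates by a similarity ratio. Note that this is a swapped-in technique, not Hamming distance (which is only defined for strings of equal length). The helper name `closest_english` and the 0.6 cutoff are assumptions.

```python
import difflib


def closest_english(wrong, english_names, cutoff=0.6):
    """Return the English name most similar to `wrong`, or None if none is close enough."""
    matches = difflib.get_close_matches(wrong, english_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


names = ["sectorx", "sectory", "sectorz", "system", "orbit_period", "planet"]

print(closest_english("sextorx", names))      # sectorx
print(closest_english("systeme", names))      # system
print(closest_english("orbit_peiod", names))  # orbit_period
print(closest_english("星球", names))          # None
```

This works for typos and near-cognates; a fully translated name such as {dinheiro} for {cash} shares too few characters with its English counterpart and would still need the word-embedding or manual route.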
Said differently: only about 5% of the messages need human correction. A tool displaying the correct and the incorrect strings side by side, highlighting detected {keys} and « {, } or % » tags, the English key_name…
the TL;DR, I could just compile a list of errors from each language and send that out to the translators, and "pretty please fix"-them, or we just brute force the fixes, and nuke 95% of the errors, in my guesstimation.
I would rather inform the translators.
Maybe it is too late, but I wrote a bash script (sorry, I'm not Python-aware). I don't know if it will be of any use: it is slow, badly written, and can't log in color, but it seems to highlight some more things to fix in the console. [findTheBadKeys.sh.txt]
EDIT: Oops, wrong, buggy script. Updated: findTheBadKeys2.sh.txt
Hello.
I was getting nil values in the trade computer. Searching the source code, I found bad keys in lang/ui.core/fr.json: I found {système} keys in the TRADING_FROM and TRADING_TO sections. Assuming it was an auto-translation issue (as système is system), I was curious to see if it was the only one...
In the lang directory:
find -name '*.json' -exec sed -n '/[}{]/{s/^[^{]*//;s/[^}]*$//;s/}[^{]*{/}\n{/g;p}' {} \;|sort|uniq -c|sort -n
Interesting results with my 20240314 (8b6ae92) version on Linux: a lot of auto-translated keys and some typos. The real work will be to find the files... maybe by grep'ing the bad entries
EDIT: by "auto-translation" I mean "automatic translation". Tools are great but stupid.
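For anyone who prefers Python to sed, a rough counterpart of the pipeline above might look like the following. It assumes each language file is a JSON object mapping a key to an object with a "message" field (as the quoted da.json line suggests); the helper names are made up for illustration.

```python
import json
import re
from collections import Counter
from pathlib import Path

# Well-formed {placeholder} tokens only; malformed ones are not counted here.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def count_placeholders(messages):
    """Count every {placeholder} token across an iterable of message strings."""
    counts = Counter()
    for message in messages:
        counts.update(PLACEHOLDER.findall(message))
    return counts


def messages_under(lang_dir):
    """Yield each "message" string from the *.json files below lang_dir.

    Assumed layout: {"SOME_KEY": {"message": "...", ...}, ...}
    """
    for path in Path(lang_dir).rglob("*.json"):
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, dict):
            continue
        for entry in data.values():
            if isinstance(entry, dict) and isinstance(entry.get("message"), str):
                yield entry["message"]


if __name__ == "__main__":
    # "data/lang" is the directory named earlier in the thread.
    counts = count_placeholders(messages_under("data/lang"))
    for token, n in sorted(counts.items(), key=lambda item: item[1]):
        print(n, token)
```

Because it parses the JSON instead of scanning raw lines, this does not produce the empty entries that the structural braces of the files cause in the sed output, but it also will not surface run-on tokens such as {todo{ ton {cargotype}.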