Rangi42 opened this issue 5 years ago (status: Open).
This is probably a hopeless endeavor.
When Ruby and friends were translated, they just had to replace strings. They relied on the compiler to place them in ROM, so only the strings themselves changed.
This is not the case for generations 1 and 2. Data was put into banks manually. Translations often caused banks to get full, leading to text (and maps, and everything) being moved around; that's the reason why text_far exists at all, for instance.
If nothing was moved across banks for the translations, this is possible, but I very much doubt it. If stuff was moved around, this will quickly turn into either an if jungle or a pile of ad hoc pseudo-metadata files just to handle the bank allocation.
Maybe so. I'm leaving it open just like #285 for now, as a maybe-impossible goal. Even just separate repos for each language would be better than nothing.
I'll just leave my notes on this endeavour behind, here:
First of all, pokeruby's way of doing this is terribly wasteful. It uses rsync to copy the data_de directory on top of the regular directory. If we're going to do this, we should do the following:
- make compare
- use the -i option to "overlay" included files. This should also be usable to include generated files for the proper language.

I'm not entirely sure what the directory structure should look like, but I think top-level i18n/<lang> directories would be the simplest and most effective.
This, however, poses a problem with how map scripts are laid out, since they are a huge part of the translation. It'd be rather wasteful to have full copies of the map scripts where only the text changes, since that'd require propagating map script changes to 4 different languages, which is undesirable. However, having separate "_text.asm" files for each map sounds about as undesirable, so I'm not sure how to solve this. (I've been musing about a gettext-like system, but it seems terribly impractical.)
As for what @aaaaaa123456789 mentions, this method would make it a non-issue. Text banks would be entirely overlaid, causing the exact position of each text to stop mattering, and so would files that include others (data/maps/scripts.asm and main.asm, for example) as well as linker scripts.
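A minimal sketch of the overlay lookup order, assuming the top-level i18n/<lang>/ layout suggested above mirrors the repository root: an included file is taken from the language directory when a copy exists there, and from the regular tree otherwise. resolve_include is a made-up helper for illustration; it does not describe how rgbasm itself orders its -i search paths.

```python
from pathlib import Path
from typing import Optional


def resolve_include(name: str, lang: Optional[str], root: Path) -> Path:
    """Return the file that an INCLUDE of `name` should use.

    Hypothetical layout: i18n/<lang>/ mirrors the repository root, so
    i18n/de/maps/AzaleaMart.asm shadows maps/AzaleaMart.asm.
    """
    candidates = []
    if lang is not None:
        candidates.append(root / "i18n" / lang / name)  # language overlay first
    candidates.append(root / name)  # then the regular (English) file
    for path in candidates:
        if path.is_file():
            return path
    raise FileNotFoundError(name)
```

Only files that actually differ per language would need a copy under i18n/<lang>/; everything else falls through to the English tree.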
It would be easier to reason about what duplication will be necessary or avoidable if we had at least one other language disassembled. Hint, hint…
Can the current build infrastructure handle multiple include directories? Specifically, will makefile dependencies be generated correctly?
No, it can't. scan_includes will need adaptation.
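A rough sketch of what an adapted scan_includes would have to do, written in Python for illustration (it is not a patch to the existing tool): follow INCLUDE and INCBIN directives recursively and resolve each name against an ordered list of include directories, so a language directory listed first wins. Treating files that don't exist yet as generated files belonging to the base directory is an assumption, made so make still sees them as dependencies.

```python
import re
from pathlib import Path

# A directive must start the line (after optional whitespace), so lines that
# are commented out with ';' never match. INCLUDEs inside macros or after a
# label on the same line are not handled by this sketch.
DIRECTIVE = re.compile(r'^\s*(?:INCLUDE|INCBIN)\s+"([^"]+)"', re.IGNORECASE)


def find_file(name, include_dirs):
    """Return the first include_dirs[i] / name that exists, or None."""
    for directory in include_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def scan_includes(source, include_dirs):
    """List the files `source` depends on, resolved through include_dirs in order."""
    deps = []
    seen = {source}
    stack = [source]
    while stack:
        current = stack.pop()
        for line in current.read_text(encoding="utf-8").splitlines():
            match = DIRECTIVE.match(line)
            if not match:
                continue
            name = match.group(1)
            # Not found anywhere: assume it is generated (e.g. graphics that
            # make builds) and belongs to the last (base) directory.
            dep = find_file(name, include_dirs) or include_dirs[-1] / name
            if dep in seen:
                continue
            seen.add(dep)
            deps.append(dep)
            # Only existing source files are scanned further.
            if dep.is_file() and dep.suffix in (".asm", ".inc"):
                stack.append(dep)
    return deps
```

For a German build the include list would be something like [de_dir, base_dir]; for English, just [base_dir].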
There are steps you could take to make i18n more plausible without actually going ahead and implementing it. For example, adding run-time word-wrapping to the game, so that text lines do not have to be split into separate directives. Being able to equate one string in the source with one string in the ROM would help with a gettext-like system.
A low-tech possible solution is to replace strings in the source with constants, and use different include files to define the constants according to language. This way, maybe all strings for the game can be kept in a single file for each language.
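A sketch of the constants approach, assuming RGBDS's DEF ... EQUS string equates: a small generator writes one include file per language, and the code uses the constant name (e.g. db BATTLEMENU_FIGHT) instead of the literal. The constant names and the generator are hypothetical; strings containing quotes, backslashes or braces are rejected rather than escaped, to keep the sketch short.

```python
# Hypothetical per-language string tables; the constant names are made up.
STRINGS = {
    "en": {"BATTLEMENU_FIGHT": "FIGHT@", "BATTLEMENU_PACK": "PACK@"},
    "de": {"BATTLEMENU_FIGHT": "KMPF@", "BATTLEMENU_PACK": "BEUTEL@"},
}


def render_constants(lang):
    """Render one language's strings as RGBDS EQUS string equates."""
    lines = [f"; Generated string constants for language: {lang}"]
    for name, text in STRINGS[lang].items():
        if any(c in text for c in '"\\{}'):
            raise ValueError(f"{name}: unsupported character in {text!r}")
        # The EQUS value is the quoted string itself, so `db BATTLEMENU_FIGHT`
        # expands to `db "KMPF@"` when this file is included.
        lines.append(f'DEF {name} EQUS "\\"{text}\\""')
    return "\n".join(lines) + "\n"
```

As pointed out in the replies, this only covers text; code and graphics differences between localizations would still need another mechanism.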
@Kroc Additional features don't go well with the idea of making a matching ROM.
While, yes, having strings be macros or constants that get replaced for each different language would probably work, you have to keep in mind that text isn't the only thing that changes in the translated ROMs. Some code and graphics change as well, so we wouldn't be able to do much with just that.
It'd be the way I'd solve it, if I were the only person using this codebase, but the thing is that very few people are actually interested in working with translations, and any i18n changes should be burden-free for people who don't want them. Hence, adding yet another layer of indirection when defining strings sounds like a bad idea.
I'd rather have some kind of gettext-like system, or something that can overlay not only entire files but also just specific labels in a file. However, both of those solutions sound a bit too finicky and hard to get right.
Unfortunately, these will probably be better off remaining as a Feature Branch. Although it is very much possible to build a single repo that supports all the regional releases, it would cause too much clutter in the repo. That said, I don't want mid-kid's effort to be completely wasted... so I'd be willing to help turn them into fully fleshed-out Feature Branches based on modern pokecrystal. We can then link them from the Wiki.
I disagree! The different localisations are easy enough to keep separate and non-intrusive. The only intrusive change would be the build system changes required to make it work, and I don't think that would be a blocker.
4,228 changed files with 290,729 additions and 71,353 deletions... is a lot.
It would be massively disruptive, and very difficult for downstream users to stay up to date with pokecrystal. In the proposed system it looks like you renamed most of the .asm files to .inc and are now including the .inc files in the new version-specific files. The English files are still in the main data area, whereas the other versions are in a version area. Furthermore, the layout.link file is... something... of a mess.
Is it feasible? Sure... unfortunately I think we would have to hand-hold every downstream repo through the update process if they want to maintain updateability. I don't see pushing this out and not getting many, many confused downstream users. It would be a ton of work for them to get current, for a feature they probably never cared about.
Edit: A base branch or even possibly a patch branch would be much cleaner and better suited for this imo.
I'd point out that a net line change of +219,376 is a massive increase in repo size.
And I'd like to point out that the i18n repo is currently built upon the -splitting branch, and most of the deletions stem from there. That is the disruptive change, not i18n. i18n doesn't depend on -splitting; it was just built on it because back then I expected -splitting to get merged eventually. A net increase of +666,666 doesn't matter when it's all in separate directories and doesn't touch the regular English code all that much. ...And that's without mentioning that the i18n branch is out of date by a couple of years, which massively skews the stats too if you're comparing current master to the latest i18n commit. You're better off comparing i18n to -splitting.
Furthermore, the layout.link file is... something.... of a mess.
That's part of the -splitting changes, but, how so? It simply lists each file name like main.asm currently does.
In the proposed system it looks like you renamed most of the .asm files to .inc and are now including the .inc in the new version changed files.
That's not really the case. This is, again, a -splitting change; you can read all of its gory details here, but to sum it up, the -splitting build system scans for files ending with .asm and calls rgbasm on those. This was done to give a better overview of which files can be built independently and to increase (mostly incremental) build speeds, as well as being tidier coming from a C background, since you know a .asm file will provide its whole context, while a .inc file is included from a different file and may inherit definitions and macros from whoever included it.
This doesn't have much to do with i18n: when a localization is built, .asm files in the version/ directory are used instead of the regular files either way, and the build system can be adapted to a non-splitting environment.
Oh, and please note that the current if DEF() blocks for various strings in between the code were still under consideration for improvement, for example with a macro; I just didn't consider it a pressing decision at the time.
It was a proof of concept made to bounce ideas and spark discussion, not a final thing.
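A sketch of the file selection described above, assuming version/<version>/ mirrors the regular tree: every .asm file is a translation unit for rgbasm (.inc files are only ever included), and a .asm file under the version directory replaces the regular file with the same relative path. translation_units is an illustrative name, not part of the i18n branch.

```python
from pathlib import Path


def translation_units(root, version=None):
    """Map each .asm path (relative to root) to the file rgbasm should assemble.

    Assumed layout: version/<version>/ mirrors the regular tree, and a .asm
    file there replaces the regular file with the same relative path.
    .inc files are never assembled on their own, so they are not listed.
    """
    units = {}
    for path in sorted(root.rglob("*.asm")):
        rel = path.relative_to(root)
        if rel.parts[0] == "version":
            continue  # overlays are only used through the lookup below
        units[rel] = path
    overlay = root / "version" / version if version else None
    if overlay is not None and overlay.is_dir():
        for path in sorted(overlay.rglob("*.asm")):
            units[path.relative_to(overlay)] = path  # replaces (or adds) a unit
    return units
```

Deleting the version/ directory leaves translation_units(root) unchanged, which matches the "removable with a delete key" goal.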
Comparing i18n to -splitting, is better... 1,612 changed files with 211,336 additions and 231 deletions.
if DEF()s would begin to get aggravating in repos that already build multiple versions (pokered, pokegold). Yes, macros may make this better. We could even make the branches on the pret repo. I'm not 100% positive, but perhaps we can set up some clever CI to keep them up to date.
I'm just giving my thoughts on the topic. If we decide to press forward with this, then of course I'll help out in getting it done with whatever we decide to do. I just want to make sure we are taking the right approach to this.
I do see the benefit of having this in the main repo. There are many simple hacks, like difficulty, speedchoice, or gimmick hacks like enabling mobile adapter features, that don't change much text and would instantly benefit from being able to be built in all languages; I've already seen people ask for a Spanish version of the latter, and there are huge Spanish communities that would rather hack in their own language without having to resort to outdated forks.
Making it easy for this to happen, while keeping it trivial to remove support for localizations (i.e. delete the version/ directory, or copy the files over for the localization you want to use), is IMO the best solution.
There is a lot of redundant code being added. (Example: each localization has its own map .asm file).
That's the only example, and a necessary evil unless you want to decouple map scripts from their text. This solution was chosen to minimize impact on people who only hack the english game.
I don't intend to port this to pokered or pokegold or whatever, but in case it ever happens, the IF DEFs won't overlap anyway so I consider that a moot point.
It shouldn't be a feature branch because it's not a feature branch, it's a base. Despite the relatively unobtrusive changes for english-only hackers, adding all the localized in-code strings is a fair bit of effort, the build system changes are non-trivial, and renaming and moving of text labels for e.g. battle features is easier to do up front than having to go back and port the texts later. Additionally, this is significantly easier to maintain here, add language-specific tutorials and bugs_and_glitches entries, and it completes the romset we're trying to reproduce.
The main point of contention is the duplication of map scripts imo, but I genuinely don't know of a better solution without pissing off a dozen people, and I think it's worth having despite that.
Also note that I'm not expecting you to do this, I'm fully expecting to do it myself (since I have the most experience with the changeset...), and am trying to find time to do it myself, but it'll take a while, it's not a small thing to port.
I don't intend to port this to pokered or pokegold or whatever, but in case it ever happens, the IF DEFs won't overlap anyway so I consider that a moot point.
All the Gen I/II repos should share the same goals. It doesn't make sense to do something for one repo and not the rest. If we can't support it on the other repos, we shouldn't do it here. Now, if it's just a matter of you not having the time to do pokegold/pokered/pokeyellow, then that's fine... one of us (me?) would finish the work to port it to them.
I do see the benefit of having this in the main repo. There's many simple hacks like difficulty, speedchoice or gimmick hacks like enabling mobile adapter features that don't change much text and would instantly benefit from being able to be built in all languages;
How many are actually doing this compared to the majority of ROM hackers? I'd imagine it is significantly fewer than those building generic ROM hacks based on the English localization. We would be forcing a bunch of localization stuff onto the majority to accommodate the minority here.
I've already seen people ask for a spanish version of the latter, and there's huge spanish communities that'd rather hack in their own language, without having to resort to outdated forks.
This, I believe, is the greatest benefit of what you're trying to do. Although, looking over the changes, the majority do seem to be language changes. I still think it could be turned into a patch branch that replaces the English code/text with Spanish code/text. Forget building multiple versions in a single repo, etc.; it would be much more solid to work from. Since it is mostly text changes, I still think rebasing would be fairly easy.
Now, with all that being said, I do have some suggestions if we decide to proceed with building multiple localizations in a single repo.
Use two macro prefixes, localization_de_* and language_de_*:
(A localization change might be something like: Jynx is colored purple in the German localization.) {This is a made-up example}
(Language changes might be things like language text, textbox alignment changes, charmap changes, etc.)

AzaleaMartBugCatcherScript:
jumptextfaceplayer AzaleaMartBugCatcherText
+if !DEF(_CRYSTAL_EU) ; or whatever we decide to filter out.
AzaleaMartCooltrainerMText:
text "There's no GREAT"
line "BALL here. #"
para "BALLS will have"
line "to do."
para "I wish KURT would"
line "make me some of"
cont "his custom BALLS."
done
AzaleaMartBugCatcherText:
text "A GREAT BALL is"
line "better for catch-"
cont "ing #MON than a"
cont "# BALL."
para "But KURT's might"
line "be better some-"
cont "times."
done
+endc
+ language_de_include maps/AzaleaMart.asm ; Macro conditionally includes version/de/maps/AzaleaMart.asm
+
AzaleaMart_MapEvents:
db 0, 0 ; filler
def_warp_events
warp_event 2, 7, AZALEA_TOWN, 3
warp_event 3, 7, AZALEA_TOWN, 3
def_coord_events
def_bg_events
def_object_events
object_event 1, 3, SPRITE_CLERK, SPRITEMOVEDATA_STANDING_RIGHT, 0, 0, -1, -1, 0, OBJECTTYPE_SCRIPT, 0, AzaleaMartClerkScript, -1
object_event 2, 5, SPRITE_COOLTRAINER_M, SPRITEMOVEDATA_STANDING_UP, 0, 0, -1, -1, 0, OBJECTTYPE_SCRIPT, 0, AzaleaMartCooltrainerMScript, -1
object_event 7, 2, SPRITE_BUG_CATCHER, SPRITEMOVEDATA_WALK_LEFT_RIGHT, 2, 0, -1, -1, PAL_NPC_RED, OBJECTTYPE_SCRIPT, 0, AzaleaMartBugCatcherScript, -1

+if !DEF(_CRYSTAL_EU)
db "FIGHT@"
db "<PKMN>@"
db "PACK@"
db "RUN@"
+endc
+ language_de_db "KMPF@" ;Macro will have a built in conditional
+ language_de_db "<PKMN>@"
+ language_de_db "BEUTEL@"
+ language_de_db "FLUCHT@"
+ language_es_db "LUCHA@"
+ language_es_db "<PKMN>@"
+ language_es_db "MOCHILA@"
+ language_es_db "ESC@"

ld hl, .BuenaComeAgainText
call PrintText
call JoyWaitAorB
+ localization_es_call PlayClickSFX ;Macro will have a built in conditional
ret
This makes it a bit easier for downstream users to remove localization specific code using a search and remove; or even entire languages.
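With that naming scheme, removing one language really can be a line-based search and remove. A minimal Python sketch, assuming one macro call per line and the language_<lang>_* / localization_<lang>_* prefixes from the example above (strip_language is a hypothetical name):

```python
import re


def strip_language(source, lang):
    """Drop lines that invoke language_<lang>_* or localization_<lang>_* macros.

    Assumes one macro call per line, optionally preceded by whitespace.
    """
    pattern = re.compile(rf"^\s*(?:language|localization)_{re.escape(lang)}_\w+")
    kept = [line for line in source.splitlines(keepends=True) if not pattern.match(line)]
    return "".join(kept)
```

The if !DEF(...) guards around the English text would still have to be removed by hand (or by a smarter script).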
You really underestimate how long it took to dump just those two languages... lol. I just don't expect it to be done for the other repos anytime soon. It should be possible to introduce a feature here that won't be backported. But again, moot point since the if defs wouldn't overlap in significant areas.
Yes, most people will be English-only hackers, which is why the i18n repo is built the way it is. Most English hackers wouldn't notice the changes all that much, and the languages would be removable with the press of a delete key.
Building multiple languages in one repo is beneficial to enough people, it's just rarely done because of the upfront cost there's been all these years. Having the infrastructure in place would incentivize doing it, especially for smaller scale rom hacks. I don't think it's a feature worth overlooking. I've mentioned this before, but I would be using this, and I don't believe I'd be the only one.
Your desire to make it unintrusive to English-only hackers is unfortunately incompatible with your desire not to change the build system, as not changing it implies having conditional includes everywhere. I think treating the version/ subdirectories as overlays on top of the regular files is both fairly intuitive and keeps the main files cleaner. That said, I don't completely dislike your solution to map script duplication, and it would be fairly compatible with the overlay system, though I believe this can be discussed after everything is in place. Straightforward approach first, cleanups later.
Oh and I should mention there really aren't that many non-text localization changes. It's all related to metric vs imperial, with a couple of text engine changes. I was planning on having a _CRYSTAL_METRIC define.
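One way the per-version defines could be wired up, sketched in Python for illustration: each version maps to a list of symbols passed to rgbasm with -D. Only _CRYSTAL_METRIC and _CRYSTAL_EU appear in this discussion; the version names and which versions set which symbols are assumptions.

```python
# Hypothetical define sets per version; assignments are illustrative only.
VERSION_DEFINES = {
    "crystal": [],
    "crystal-de": ["_CRYSTAL_EU", "_CRYSTAL_METRIC"],
    "crystal-fr": ["_CRYSTAL_EU", "_CRYSTAL_METRIC"],
}


def rgbasm_args(version, source, obj):
    """Build an rgbasm command line for one translation unit of a version."""
    args = ["rgbasm"]
    for symbol in VERSION_DEFINES[version]:
        args += ["-D", symbol]  # each -D defines a symbol that IF DEF() can test
    args += ["-o", obj, source]
    return args
```

Keeping metric vs imperial as its own symbol, separate from the region, means a hack could flip units without touching language text.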
Alright. @Rangi42 whenever you have time (I know you are busy), can you please weigh in on this discussion? This is a big enough change that it needs a general consensus.
I agree that pret's pokecrystal should be the main source for reproducing all Crystal ROMs. However, I think all the solutions for making master reproduce every ROM have difficulties:
- if DEF checks and foreign text/data/code
- scan_includes to handle language_*_include macros
- make compare
Personally I like the sound of separate branches for each language: german, crystal-de, pokecrystal-de, whatever you want to call them. Each could initially be done with two commits, one to remove the debug ROMs and one to replace the English ROM and VC builds with the translated text, graphics, hlcoords, etc., like https://github.com/Rangi42/pokecrystal/tree/no-maps. More commits could be added if necessary, changes to master could be rebased or merged (I prefer rebasing since it avoids the clutter of merge commits), and GitHub makes it easy to compare the two branches. (Or locally you can check out two copies and diff them.)
The main advantage to having all languages in one repo is being able to check matching locally with one make compare command. However, I don't think we really need that: GitHub CI can compare for every language by just checking out and building each branch.
If we did one branch per language, I'd also be in favor of moving crystal-au to such a branch for consistency.
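A sketch of the CI side of the branch-per-language idea: each branch is checked out into its own git worktree and verified with make compare. The branch names, the build/ directory, and the assumption that every branch is already fetched locally are hypothetical; in GitHub Actions this would more naturally be a matrix job with one branch per entry.

```python
def ci_plan(branches, workdir="build"):
    """Return the commands that build and verify each language branch.

    --detach lets a worktree use a branch that is already checked out
    elsewhere (e.g. master in the main working copy).
    """
    commands = []
    for branch in branches:
        path = f"{workdir}/{branch}"
        commands.append(["git", "worktree", "add", "--detach", path, branch])
        commands.append(["make", "-C", path, "compare"])
    return commands
```

To actually run the plan, each entry can be passed to subprocess.run(cmd, check=True), which stops at the first branch that fails to match.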
Applying every commit fivefold is going to be a massive pain...
Would an acceptable middle ground be to have a branch that builds all the versions, while keeping master "clean"?
I really don't like the idea of multiple official branches, since I think it's a slippery slope and I don't want every mildly controversial feature to end up as one, but one is better than five, imo, and it'd still give people the tools to support multiple languages in one repo.
How similar would the other branches be? It could be as simple as an action for each of them that rebases them every time master changes, or on PRs, and on the (presumably rare) occasions when the changes to master are big enough to cause a conflict when rebasing, the PR author can go in and fix them.
E: As a separate point, the easier it is for other language communities to work on or with the codebase the better, I think. If that makes it a bit harder for English language communities to work with it because there's dirty foreign clutter, then... English speakers already have everything else pretty easy. I'm sure they can handle this one hardship to be a little more welcoming to the overseas communities.
A separate i18n branch sounds fine to me. It would give us more freedom to experiment with just how the languages are implemented, and we might even decide to merge it into master.
Although I'm still not convinced that a single branch should make all the ROMs, I'm open to exploring what we can do in i18n in its own branch. Perhaps once we all start working on it, we can find creative ways to make it work. It is a lot easier to come up with solutions when you have something viable to work with.
@mid-kid how would you like to proceed? I'm thinking we need to remove the -splitting changes from your i18n branch and bring it up to date with modern pokecrystal first. We should probably do this on one of our forks and get it somewhat viable (buildable) before attempting to bring anything to pret. We can either set something up on my fork, or yours.
Edit: Oh, and I plan on helping... so don't assume you are doing it yourself. I know you have limited time, so I think your time would be better spent on providing knowledge, review, and direction.
So if the i18n branch is gonna exist, what do we do in terms of disassembling the rest of the languages and whatnot? Using directives like in pokeruby isn't quite gonna cut it since this is ASM; I'm confused myself.
It's one of the many reasons why a single internationalization build would be extremely cumbersome... I'd much rather see a branch per language.
No need to rehash the above discussion until/unless anyone has new i18n work already.
With a system like the one pokeruby uses to keep different languages' data separate, so code files don't get cluttered with if/elses.

So far the only separate builds are USA/Europe Revision 1.1 and Australian (which just censors the Game Corner text from 1.1), but others are possible:
889a06fc0bb863666865aa69def0adf97945ac2a *pokecrystal-es.gbc
accb584293ba056152f1fd908439b019017ff2fe *pokecrystal-de.gbc
c055992b16b7399c687647725cdd1f4f13a2f75c *pokecrystal-fr.gbc
6cee05e5b95beeae74b8365ad18ec4a07a8c4af8 *pokecrystal-it.gbc
There's also:
95127b901bbce2407daf43cce9f45d4c27ef635d *pokecrystal-jp.gbc
but that probably has enough different code to justify a separate project, like pokegold.
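Digests in this "<sha1> *<file>" format are what sha1sum -c checks. A minimal Python equivalent, assuming the built ROMs sit in one directory (ROMs that haven't been built are simply left out of the result); the function names are illustrative:

```python
import hashlib
from pathlib import Path


def parse_sha1_list(text):
    """Parse sha1sum lines of the form '<hex digest> *<file>'."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = digest.lower()  # '*' marks binary mode
    return expected


def check_roms(sha1_text, rom_dir):
    """Return {filename: matches} for each listed ROM that exists in rom_dir."""
    results = {}
    for name, digest in parse_sha1_list(sha1_text).items():
        path = Path(rom_dir) / name
        if path.is_file():
            results[name] = hashlib.sha1(path.read_bytes()).hexdigest() == digest
    return results
```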
And let's not forget about pokegold-ko, as G/S has Korean releases; it might be worth opening a companion issue for that one there as well.
Edit: pokegold companion issue at https://github.com/pret/pokegold/issues/94
Wouldn't it be better to use a separate layout.link for each language edition, so things that are in different banks across versions will be laid out correctly and match in the different languages? E.g. layout-es.link, etc.
pokecrystal-i18n does exactly that.
Can't seem to find the branch/repo; it could be a private one for now, though, so that might be why.
https://github.com/mid-kid/pokecrystal/tree/i18n. The linker script files are under version/crystal-xx/layout.link.
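Roughly how a build could pick the per-language linker script: use version/<version>/layout.link when it exists and pass it to rgblink with -l. The fallback to a shared top-level layout.link for versions without their own layout, and the helper names, are assumptions for illustration.

```python
from pathlib import Path


def linker_script(root, version):
    """Pick version/<version>/layout.link if it exists, else the shared layout.link."""
    candidate = Path(root) / "version" / version / "layout.link"
    return candidate if candidate.is_file() else Path(root) / "layout.link"


def rgblink_args(root, version, objects, rom):
    """Build an rgblink command line that uses the version's linker script."""
    return ["rgblink", "-l", str(linker_script(root, version)), "-o", rom, *objects]
```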
re: pokecrystal-jp
Would it be best if a pokecrystal-jp repo was created for disassembling the Japanese version because of how many differences there are between it and the rest of the versions? It makes sense because the differences are mainly in things like the mobile code, and it may in turn assist with documenting the rest of the mobile functions here in the main pokecrystal disassembly.
Yes, -jp isn't included in this issue as per the opening message. Feel free to create a separate -jp disassembly any time.