More user friendly way of translating text

pmrowla commented 4 years ago

Working off of extracted .lns scripts is better than nothing, but given that

- most people using pylivemaker probably don't actually care about any of the non-text (event/cg/sound/etc) stuff that gets extracted
- scripts and lines are extracted in the order they are stored in the lsb, which is not necessarily the order they actually show up in game, which makes figuring out context/scenario flow confusing in some cases
- extracted scripts don't contain any useful information about the current message box size (such as height in lines, which is important given that speaker name usually needs to be manually added for the first line in each message box for a lot of LM games)
- you can end up with a large number of .lns files for a single .lsb, which makes editing and re-inserting them a giant hassle

it would be ideal to have a better way of extracting text lines for one lsb (into a single file), translating them, and automatically patching those translations back into an lsb.

I'm currently thinking it would be nice to have something that looks similar to ren'py's translation units, where each "translation unit" would just be text up to a page break (<PG>). This would give the translator a good idea of what was originally supposed to fit into one message box screen (or was originally safe to rely on the game to scroll automatically).

Certain system event info (like NAMELABEL) probably still needs to be included as well, not sure about any other system events though.
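
As a rough illustration of the "text up to a page break" idea above, here is a minimal Python sketch. The `split_units` helper is hypothetical and not part of pylivemaker; it assumes the script text is already available as a plain string in which page breaks appear literally as `<PG>`.

```python
import re

# Hypothetical helper, not part of pylivemaker: split script text into
# translation units, where each unit ends at a <PG> page break.
PAGE_BREAK = re.compile(r"(<PG>)")


def split_units(script_text):
    """Return a list of units; each unit keeps its trailing <PG> if present."""
    units = []
    current = ""
    # The capture group makes re.split keep the <PG> markers as list items.
    for piece in PAGE_BREAK.split(script_text):
        current += piece
        if piece == "<PG>":
            units.append(current)
            current = ""
    if current.strip():
        # Trailing text with no closing page break becomes its own unit.
        units.append(current)
    return units
```

Each returned unit then corresponds to what originally fit into one message box screen, line breaks (`<BR>`) included.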

pmrowla commented 4 years ago

So basically for games like AGLS where text lines are not manually broken up (and speaker info is handled via NAMELABEL as an image in the message box instead of a text line), you would get something like:

(syntax borrowed from renpy here, but probably not exactly what pylm syntax would look like)

# 00000623.lsb.pylm.translation
...
# NAMELABEL: 名前_リンネ
translate 070f_603912e6:
    # "「ああ、これがフォン・ブラウン………憧れの、フォン・ブラウンに、ついに………」<PG>"
    e '"Ah, this is Von Braun... the Von Braun I've longed for, finally..."'

where 070f is the TpWord block from 0623.lsb, and 603912e6 is the first 4 bytes (8 hex digits) of the sha256 hash of the original JP text. Everything after the e is what the translator would add themselves.
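
An identifier of this shape could be built along the following lines. This is only a sketch: `make_unit_id` is a hypothetical name, the block index is assumed to be an integer rendered as 4 hex digits, and the text is assumed to be hashed as UTF-8 (the thread does not say which encoding would be used, so the result will not necessarily match the IDs shown above).

```python
import hashlib


def make_unit_id(block_index, original_text, encoding="utf-8"):
    """Build '<block>_<hash>' from a block index and the original text.

    Assumptions (not confirmed by the thread): the block index is printed
    as 4 lowercase hex digits, and the text is hashed in the given encoding.
    """
    digest = hashlib.sha256(original_text.encode(encoding)).digest()
    # The first 4 bytes of the digest give 8 hex characters.
    return f"{block_index:04x}_{digest[:4].hex()}"
```

Because the hash depends only on the original text, the ID stays stable when the translation changes, and a changed original line shows up as a new ID.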

For games that manually do line breaks, you'd get something like:

# 00001.lsb
translate 0009_a18577af:
    # 【　教師　】<BR>
    # え～、最近の研究では、次元断裂の際に現れる敵性生物たちは<BR>
    # 一緒に転移してくる遺跡などから異世界の文明によって作られた<BR>
    # 兵器である、と考えられていて……<PG>
    e "[Sensei]"
    e "<translation line 1>"
    e "<translation line 2>"
    e "<translation line 3>"
    e "[Sensei]"
    e "<translation line 4>……"

This hypothetical also shows how we would handle the case where the msg box height is 4 lines including the speaker name, and the translated version would be more than 4 lines long.
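
The overflow case could in principle be automated. The sketch below is a hypothetical helper (not pylivemaker code) that assumes the message box height and the speaker name are known, and repeats the speaker name at the top of every page:

```python
def paginate(speaker, lines, box_height):
    """Split translated lines into pages of at most box_height rows,
    repeating the speaker name as the first row of every page."""
    if box_height < 2:
        raise ValueError("box_height must fit the speaker name plus one line")
    per_page = box_height - 1  # one row is taken by the speaker name
    pages = []
    for start in range(0, len(lines), per_page):
        pages.append([speaker] + lines[start:start + per_page])
    return pages
```

With a box height of 4 and four translated lines, this yields two pages: the speaker name plus lines 1 to 3, then the speaker name plus line 4, matching the layout in the example above.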

We also assume that our tool for patching the translations back into an lsb will be smart enough to use <PG> at the end of the final e "..." line for a translation, and <BR> at the end of all the other ones, so that translators no longer have to add in the line breaks themselves. (Although for the 2nd case, the translator may still need to manually add the hard page break between lines 3 and 4 to get the desired scroll/pagination behavior.)
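
The break-insertion rule described above is simple enough to sketch. `render_unit` is a hypothetical name, and the function only builds the tagged string; actually patching it into an lsb is out of scope here.

```python
def render_unit(lines):
    """Join translated lines with <BR> and end the unit with <PG>."""
    if not lines:
        return ""
    # Every line but the last ends in <BR>; the last one ends in <PG>.
    return "<BR>".join(lines) + "<PG>"
```

For a translation that spans more than one message box, this would be called once per page so that each page gets its own hard `<PG>`.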

pmrowla commented 4 years ago

This is something that will probably take a long time to actually implement (and may never actually happen), but given that people actually seem to be using pylm it would be a real nice-to-have. If anyone else has thoughts on this feel free to discuss in this thread. Since I'm not a translator, I have no idea what people actually think of renpy's translation method, but as a dev their method seems to make sense to me.

Stefan311 commented 4 years ago

I agree that there should be a more user-friendly way of translating. The file format you are showing has a few advantages over the .lns format: the confusing non-text stuff is eliminated, and you can see the original and the translation at the same time. You can roughly see how the text block will be formatted, and you can still change line and page breaks manually.

However, this format is still not suitable for machine translation. And let's be honest: how many games are really worth the effort of translating entirely by hand? Most of the translated games I know are machine translated. They are not really a pleasure to read, but at least you can see what's going on and what you have to do next.

I would therefore suggest creating extract and insert functions similar to the menu functions. I would leave the text formatting as it is and just write the text line by line into a CSV file for translation. If you don't mind, I would try to implement something like this next week. Are you okay with me working on your project? I do not want to impose.
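
A line-by-line CSV round trip of the kind suggested here might look roughly like the sketch below. The column names (`id`, `original`, `translation`) and both function names are assumptions made for illustration; this is not the CSV layout that pylivemaker actually uses.

```python
import csv
import io


def lines_to_csv(lines):
    """Write one row per text line, leaving the translation column empty."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["id", "original", "translation"])  # hypothetical columns
    for i, line in enumerate(lines):
        writer.writerow([i, line, ""])
    return buf.getvalue()


def csv_to_translations(csv_text):
    """Read the CSV back and return {line id: translation} for filled rows."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {int(row["id"]): row["translation"]
            for row in reader if row["translation"]}
```

Rows whose translation column is left empty are skipped on import, so a partially translated file would only replace the lines that were actually filled in.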

pmrowla commented 4 years ago

I'm not a translator, so honestly I have no idea what kind of format people would want for this. I was mostly just brainstorming based on how renpy handles this issue.

akanetr commented 4 years ago

Maybe you guys can work together with Translator++. They have a GUI, while you handle the command line.

pmrowla commented 4 years ago

it looks like they already support reading/writing to spreadsheet formats so it seems like we don't really need to do anything other than support csv as an extract/insert format (which is being added in #33 and #37).

it'd mostly be on whoever works on translator++ to add pylm to their project and run the appropriate commands for unpacking games, extracting scripts, and repacking everything

akanetr commented 4 years ago

Another example is tyrano translator. It's a simple GUI tool to extract, export, import and build tyrano builder games. For translators who are unfamiliar with command line, just a simple GUI tool will make things much easier.

pmrowla commented 4 years ago

Yeah, I'm not opposed to having a GUI for pylm, and I can understand the appeal from a translator's perspective, but I personally just don't have the time or interest to write one. But most of the API stuff in pylm was designed with the assumption that other people would write more UI stuff on top of it (whether it's command line or GUI), and other projects are free to integrate pylm as long as their licensing is GPL compatible.

and with regard to licensing, it's not even really up to me since the original irl project is GPL and owned by tinfoil.

akanetr commented 4 years ago

Thank you. Your work and tinfoil's work are the most valuable for lm games' translators.

Stefan311 commented 4 years ago

I also know Translator++ (I am one of its patrons). Even if I dislike the GUI style of this project (does not work in Linux, ugly render glitches in Windows VM), I respect the effort of making a universal simple-to-use translator. Translator++ bundles and uses many other open source game extractors and patchers. They even ship with ruby, php and python34. Maybe you should consider starting communication with them.

pmrowla commented 4 years ago

As far as I can tell, they have no listed contact information on their site other than their patreon (which I do not have access to), so it would probably be easiest if communication started the other way around?

Stefan311 commented 4 years ago

If you like, I can ask on Patreon whether they are interested in working together. Would you like me to?

pmrowla commented 4 years ago

Sure. I have no problem w/them integrating this as long as their stuff is open source w/GPL compatible license (which seems to be the case, at least for their last available source distribution). Since (according to their docs) they support spreadsheets, all they should really need to do is call the batch export/insert/patch scripts and use the new csv formats for translation. But they will have to distribute a newer version of Python (3.7+ preferred). Python 3.4 was end-of-lifed over a year ago.

pmrowla commented 4 years ago

Exporting and importing directly to their .trans json format should also be possible now that we have the text block API, but they would probably have to extend their json format with a new metadata field for storing pylm identifiers, since patching things in an lsb requires context-specific data about what you are actually trying to patch (menu location, textins location, text block index, etc).

since it says they want to add support for things like renpy, i'm assuming they already have plans for something along those lines, since for renpy I think they would need to be storing renpy translation block hashes.

pmrowla commented 4 years ago

future Translator++ discussion can go in #58 (someone else posted their scripts for using it in there)

pmrowla / pylivemaker

More user friendly way of translating text #20