translation issues - Githubissues

Hogarth-MMD commented 7 years ago

Translation hits a snag with 2 issues:

1. Half-width katakana and full-width katakana. The models I am trying to translate contain both. I am not really worried about this issue; I feel sure it can be solved without too much difficulty.
2. Overlapped tokens. This is a much more serious problem. The same character or sequence of characters can exist in more than one token. After I added colors and clothes to my WIP translation dictionary, this became a major problem. If token replacement does not happen in EXACTLY the correct order, the result is an incomplete translation. At this time, I am completely puzzled and groping in the dark about what algorithm is needed to solve this problem.

nathanvasil commented 7 years ago

Can't you just sort the dictionary from greatest to fewest characters? It seems to me that would take care of any issues. That way, it would always prefer longer, more carefully translated tokens to shorter sub-tokens.

Hogarth-MMD commented 7 years ago

@nathanvasil I am not sure whether or not that would work. To test that idea, we would need to have a python script which tests a dictionary for possible problems. I am still groping in the dark about what the algorithm would need to be for a dictionary-testing script(?).

Hogarth-MMD commented 7 years ago

If mmd_tools sorts a dictionary from longest tokens to shortest tokens, that might be a simple solution which prevents many translation errors. This is a very useful idea, thank you. What about a case where the end of one token overlaps with the beginning of another token? I'm not sure if that problem could be solved.
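For the overlap case asked about here, a dictionary check could look roughly like the sketch below. It is only an illustration, not part of mmd_tools: the function name find_overlaps is made up, and it only reports pairs where the end of one token equals the beginning of another.

```python
from itertools import permutations


def find_overlaps(tokens):
    """Return (left, right, shared) triples where a proper suffix of
    `left` equals a proper prefix of `right`."""
    hits = []
    for left, right in permutations(tokens, 2):
        # size stays below both lengths, so full containment is not reported
        for size in range(1, min(len(left), len(right))):
            if left[-size:] == right[:size]:
                hits.append((left, right, left[-size:]))
    return hits


# Toy tokens from the ambiguous case discussed in this thread
print(find_overlaps(["ab", "bc", "cd"]))
```

Each reported pair is a place where the replacement order decides the result, so it is a candidate for adding a longer, explicit token.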

nathanvasil commented 7 years ago

I think that would be solved by sorting. Think it through:

abcd defg cdef cde bc a d

If it hits abcd or defg first, it won't follow through to examine further tokens, right? Appropriate.

If it encounters bcde, it will translate cde, then lacking a translation for b, leave that untranslated (rather than translating bc and d and leaving e untranslated). Appropriate.

If it encounters abc, it will translate it as a-bc. Appropriate.

The only case I can imagine is when you have an ambiguous definition, as in definitions of only ab, bc, and cd and you're translating abcd. That's not a problem with the algorithm, it's just an ambiguous definition, no algorithm is going to give a correct answer because there is no correct answer.

Now, if you wanted to be rigorous about this, you probably wouldn't bother making a testing script; after all, your script can have problems too, and it will only test the foreseen. You would prove this, like in math. If that's not something you're comfortable with, as I am not, then just look at a bunch of cases and if you fail to see a problem, roll the dice.

powroupi commented 7 years ago

Yea, mmd_tools sorts a dictionary from longest (JP) tokens to shortest (JP) tokens by default. And when you encounter an issue of ambiguous definition, you might need to add some longer tokens to solve it. The algorithm is simple, it just loops through the dictionary and replaces the tokens if found.

If your dictionary is ab,bc,a,b,c,d, abcd will be translated as ab-c-d. If you want to translate abcd as a-bc-d, you need to add abc to the dictionary so that abc will be translated before ab. And you can also add abcd to the dictionary to translate full names. :smile:
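The loop described above can be sketched in a few lines. This is a toy reconstruction from the description in this thread, not the actual mmd_tools code; the function name translate and the uppercase-with-hyphen output format are assumptions chosen so the segmentation is visible.

```python
def translate(name, dictionary):
    """Replace tokens longest-first, as described in this thread."""
    # sorted() is stable, so tokens of equal length keep their original order
    ordered = sorted(dictionary.items(), key=lambda kv: len(kv[0]), reverse=True)
    for source, target in ordered:
        name = name.replace(source, target)
    return name


# Toy dictionary: targets are uppercase so they never match a source token again
toy = {"ab": "AB-", "bc": "BC-", "a": "A-", "b": "B-", "c": "C-", "d": "D-"}
print(translate("abcd", toy).rstrip("-"))  # AB-C-D

# Adding the longer token abc changes which split wins
toy["abc"] = "A-BC-"
print(translate("abcd", toy).rstrip("-"))  # A-BC-D
```

With real dictionaries the sources are Japanese and the targets are English, so a translated piece is normally not matched again by a later token.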

powroupi commented 7 years ago

You can use blender python console to see/get the (sorted) dictionary as well. :smile:

Drag & drop translations.csv to blender text editor

Go to the Blender Python console window and enter the following 3 commands:

>>> from mmd_tools import translations
>>> t = translations.getTranslator(bpy.data.texts['translations.csv'])
>>> t.save_to_stream(bpy.data.texts.new('sorted.csv'))

Go to blender text editor to see sorted.csv

Hogarth-MMD commented 7 years ago

https://pypi.python.org/pypi/jaconv/

jaconv python package has the python code which is needed to convert between half-width and full-width katakana. (jaconv = japanese convert).
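If installing jaconv is not an option, a similar conversion can be sketched with only the Python standard library. This is an alternative technique, not what jaconv does internally: it applies Unicode NFKC normalization to runs of half-width katakana only, so other characters are left as they are. The function name to_fullwidth_katakana is made up for this sketch.

```python
import re
import unicodedata

# Half-width katakana block, including the voiced/semi-voiced sound marks
_HALFWIDTH_KATAKANA = re.compile("[\uFF66-\uFF9F]+")


def to_fullwidth_katakana(text):
    """Convert runs of half-width katakana to full-width, leaving other text alone."""
    return _HALFWIDTH_KATAKANA.sub(
        lambda match: unicodedata.normalize("NFKC", match.group(0)), text
    )


# "\uFF8A\uFF9E\uFF6F\uFF78\uFF9E" is half-width ba-ggu; the sound marks get merged
print(to_fullwidth_katakana("\uFF8A\uFF9E\uFF6F\uFF78\uFF9E"))
```

Normalizing whole runs matters because a half-width character and its following sound mark have to be combined into one full-width character.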

Hogarth-MMD commented 7 years ago

https://sta.sh/0dcgoq1of6u

Here is the code to convert half-width katakana to full-width katakana, using the list of tuples format.

Hogarth-MMD commented 7 years ago

Hello @powroupi ! In translations.py there is a list of tuples called jp_to_en_tuples. This list of tuples is never sorted according to the length of a token, is it?

The csv file format is causing a big problem for me. My csv dictionary causes freezes and crashes and a memory error with mmd_tools. My csv file is originally a .py file with a list of tuples, but it has many comments in it which are needed for organization and for clarifying the definitions of words. Possibly these comments are causing a problem and trying to convert my .py dictionary to .csv is a big waste of time for me.

powroupi commented 7 years ago

there is a list of tuples called jp_to_en_tuples. This list of tuples is never sorted according to the length of a token, is it?

Yes, it is fixed order, never sorted.

The csv file format is causing a big problem for me. My csv dictionary causes freezes and crashes and a memory error with mmd_tools...

Please give me the csv file, so I can check if there is any bug in my code (maybe you didn't follow the csv format). Also, if you give me your .py dictionary, I may be able to convert it. Thank you. :smile:

Hogarth-MMD commented 7 years ago

MMD translations dictionary 1.1 is available for download (863 tokens) 😃 https://sta.sh/01zwuicd6osn

Hogarth-MMD commented 7 years ago

My translation dictionary is also intended to translate model names and model comments. I recommend that translation of model names and model comments be added into MMD tools.

Hogarth-MMD commented 7 years ago

I recommend that this feature be added to mmd_tools: when adding English names, a prefix should be added to the beginning of the English name of each advanced morph type. This makes it easier to see which type of morph it is in MMD or MMM:

B = bone morph
M = material morph
G = group morph
UV_ = UV morph

Hogarth-MMD commented 7 years ago

Okay, I have written some instructions for using the mmd_tools translate tool and for using my translation dictionary. See my next comment.

Hogarth-MMD commented 7 years ago

1. Download the mmd_tools add-on (powroupi fork) from here:

mmd_tools direct download https://github.com/powroupi/blender_mmd_tools/archive/dev_test.zip

mmd_tools main page https://github.com/powroupi/blender_mmd_tools

mmd_tools issues/bug reports https://github.com/powroupi/blender_mmd_tools/issues

2. In the unzipped folder is a folder called mmd_tools. Copy the mmd_tools folder into Blender's scripts/addons folder. After opening Blender, click File, User Preferences. In User Preferences, Add-ons, Object, find the Object: mmd_tools add-on and click the little box beside it to enable it. Beside where it says "Shared Toon Texture Folder", navigate to, and select, the Data directory of MikuMikuDance, so that the standard MMD toon textures will be used with your imported MMD models. Click "Save User Preferences" before exiting User Preferences.

3. Download the MMD translations dictionary file from here: https://sta.sh/01zwuicd6osn

After unzipping this file (with 7-zip or Bandizip, for example), copy the translations.csv file into the mmd_tools folder, into the same directory where a file named translations.py is already located. Do not rename the translations.csv file. It must have exactly the same name as the translations.py file, except for the .csv extension. (The MMD translations dictionary also contains a translations.py file. Do nothing with this file unless you really know what you are doing with Python and mmd_tools. This file is only provided for the benefit of the mmd_tools developers.)

4. Import a .pmd or .pmx model into Blender. Click File, Import, MikuMikuDance model (.pmd .pmx).

5. In the mmd_tools panel (whose tab is on the left side of the 3D view), click on the Translate button. This causes a popup dialog box to appear. Ignore the Dictionary dropdown list, unless you have loaded your own translation dictionary into Blender's text editor. Click OK. English names will then be added to your imported MMD model(s) in bulk, within a fraction of a second to several seconds.

6. You can then export this model from Blender with mmd_tools with all of its new English names of morphs, bones, display panel groups, physics, and materials. Click File, Export, MikuMikuDance model (.pmx) to export the model.

Hogarth-MMD commented 7 years ago

Here is the Python code for reading the model name and the model comment, but I don't know the code for editing a model comment. For the code below to work, an mmd_root empty object must be the active object.

bpy.context.active_object.mmd_root.name

bpy.context.active_object.mmd_root.name_e

bpy.data.texts[bpy.context.active_object.mmd_root.comment_text].as_string()

bpy.data.texts[bpy.context.active_object.mmd_root.comment_e_text].as_string()
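For editing a comment, one possible approach is sketched below. It assumes the comment is stored in a Blender text datablock that offers as_string(), clear() and write(), as the lines above suggest for as_string(). The FakeText class and the rewrite_text helper are hypothetical, added only so the sketch runs outside Blender.

```python
class FakeText:
    """Hypothetical stand-in for a Blender text datablock (not a bpy class)."""

    def __init__(self, body=""):
        self._body = body

    def as_string(self):
        return self._body

    def clear(self):
        self._body = ""

    def write(self, text):
        self._body += text


def rewrite_text(text_block, transform):
    """Replace the whole content of text_block with transform(old content)."""
    updated = transform(text_block.as_string())
    text_block.clear()
    text_block.write(updated)
    return updated


comment = FakeText("abc model")
rewrite_text(comment, str.upper)
print(comment.as_string())  # ABC MODEL
```

Inside Blender, the object passed in would presumably be bpy.data.texts[bpy.context.active_object.mmd_root.comment_e_text] instead of FakeText; that usage is untested here.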

Hogarth-MMD commented 7 years ago

I recommend that Rename Bones L/R Suffix be disabled by default. This option interferes with any script that mass renames bones to Japanese MMD bone names, and it caused me to waste about 2 hours a while ago. It interferes with my script that renames the DAZ CMU BVH bones, for example: http://sta.sh/01acutxl5g0i

Hogarth-MMD commented 7 years ago

Hello @powroupi! I cannot make a translations.csv file which converts half-width Katakana to full-width Katakana, because this needs to happen at the beginning of token replacement. mmd_tools will sort this list and relocate the half-width to full-width tuples to the bottom of the list.

powroupi commented 7 years ago

@Hogarth-MMD I think one way is to use a fixed-order dictionary, but then we would need to adjust the order manually, which is another pain. :cry: Another way is to add a function that converts half-width Katakana to full-width Katakana. We can add the table to translations.py, since the table should not need to change. I think this way might be better. :smile:

powroupi commented 7 years ago

Okay, I've updated the translation tool and added options for full-width Katakana and morph prefix, per @Hogarth-MMD's requests. :smile:

Additional note: the strings in the csv should be double quoted or non-quoted. I'm not sure if there is any difference between operating systems. @Hogarth-MMD's csv works fine if I remove the single quote characters. :)

Hogarth-MMD commented 7 years ago

Hello @powroupi! Thank you for your generous giving of your time. There should be NO prefix added onto vertex morphs. English names of vertex morphs should not be changed. The prefix for UV morphs should be UV_ .

Hogarth-MMD commented 7 years ago

There should not be any "Full-width Katakana" option. This should be completely invisible to the user. The user should not need to know anything about half-width or full-width Katakana. Both the translation dictionary and the Japanese names of models should be automatically converted to full-width Katakana before translation to English.

powroupi commented 7 years ago

There should not be any "Full-width Katakana" option. This should be completely invisible to the user....

Yes, I agree with that. I added that option just in case, for debugging, and users will find out about half-width and full-width Katakana anyway because there are already many models using half-width Katakana.

Both the translation dictionary and the Japanese names of models should be automatically converted to full-width Katakana before translation to English.

No, we just need to convert the Japanese names of models before translation to English, so you will only need to define full-width Katakana tokens in the translation dictionary, and you can grab those tokens from 'mmd_tools.translations.fails' in blender text editor. :)

powroupi commented 7 years ago

Okay, the prefix is updated and "Full-width Katakana" option is hidden now (only available in python console). :)

Hogarth-MMD commented 7 years ago

Hello @powroupi, I see the new Information button. Have you added translations of model names or model comments yet? Or only the button?

Hogarth-MMD commented 7 years ago

"Additional note, the string in csv should be double quoted or non-quoted, I'm not sure if there is any difference between operating systems. @Hogarth-MMD's csv works fine if I remove single quote characters."

Hello @powroupi , I don't understand this issue. Why are single quotes a problem? Is there an operating system which has a problem with single quotes? Can you please explain this?

powroupi commented 7 years ago

I see the new Information button.

It is an option; shift+LMB click to add options. Users can choose which parts will be translated. The info part is included (but I think it is better to use a true translator to translate comments), and the translation method is the same.

Why are single quotes a problem?

The Python csv module uses double quotes by default, and I didn't change that behavior, so I can't use your translations.csv unless I remove the single quotes or replace them with double quotes. Does your translations.csv work on your system?
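The quoting behavior can be seen with a small standalone sketch. The two-column Japanese,English row layout used here is an assumption for illustration, not the documented translations.csv format.

```python
import csv
import io

double_quoted = '"頭","head"\n'
single_quoted = "'頭','head'\n"

# Default dialect: double quotes are treated as quoting and stripped
print(next(csv.reader(io.StringIO(double_quoted))))  # ['頭', 'head']

# Single quotes are ordinary characters by default, so they stay in the fields
print(next(csv.reader(io.StringIO(single_quoted))))  # ["'頭'", "'head'"]

# A reader can be told to treat single quotes as the quote character instead
print(next(csv.reader(io.StringIO(single_quoted), quotechar="'")))  # ['頭', 'head']
```

With the default settings, a single-quoted token would include the quote characters and so would never match a name in a model.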

Hogarth-MMD commented 7 years ago

MMD translations dictionary 1.1 download

https://sta.sh/01zwuicd6osn

Now it has double quotes, all full-width, and no duplicates (863 tokens)

Hogarth-MMD commented 7 years ago

Actually there is still one duplicate 紫 (can be translated as Yukari or Purple.) I don't know any solution to this translation issue.
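Duplicates like this one can at least be listed automatically. The sketch below assumes the dictionary is available as a list of (Japanese, English) tuples; the function name find_duplicate_tokens and the sample entries other than 紫 are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_tokens(pairs):
    """Map each source token that appears more than once to all of its translations."""
    seen = defaultdict(list)
    for source, target in pairs:
        seen[source].append(target)
    return {source: targets for source, targets in seen.items() if len(targets) > 1}


pairs = [("紫", "Yukari"), ("赤", "red"), ("紫", "purple")]
print(find_duplicate_tokens(pairs))  # {'紫': ['Yukari', 'purple']}
```

This only finds the conflicts; choosing which translation to keep is still a manual decision.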

Hogarth-MMD commented 7 years ago

Hello @powroupi I am too exhausted from all of this hard work with translation to give you a good thank you. I need 20 hours of sleep followed by intensive psychotherapy from a psychiatrist who specializes in Japanese language stress. 😩 I just downloaded Git.

powroupi commented 7 years ago

Thanks for your hard work. :smile: Take it easy, there is no time limit. We can do this slowly, since there might not be an end point to this task. :worried:

紫 (can be translated as Yukari or Purple.)

It is a limitation of our translation method. We can choose the one which is usually used. :)

Hogarth-MMD commented 7 years ago

Hi @powroupi , I have been dealing with computer problems for the last 10 days. This is my first time back on Github since 10 days ago. Using computers and internet demands so much time and perseverance! So much time and perseverance to troubleshoot computer and internet issues! I am thinking that probably most people just get frustrated and give up. :-( This makes me sad, because people are deprived of the wonderful advantages that the internet could bring to them. :-(

powroupi commented 7 years ago

Welcome back, @Hogarth-MMD. :smile: Internet is wonderful, but don't forget to take a rest and do some sports. :yum:

Hogarth-MMD commented 7 years ago

Hello @powroupi ! How many downloads have there been of your version of mmd_tools? I tried to find this information, but I don't see it anywhere.

powroupi commented 7 years ago

Sorry, I don't update the version frequently; just check the date of the latest commit to see if there is an update. :smile:

Hogarth-MMD commented 7 years ago

Hello @powroupi ! You didn't understand my question. Other people have downloaded the mmd_tools master.zip file. The users of mmd_tools have downloaded this file. How many times have other people downloaded this file? This information tells you approximately how popular mmd_tools is and approximately how many people are using mmd_tools. (the powroupi version of mmd_tools, I mean)

powroupi commented 7 years ago

Well, I don't know how many people download my dev_test branch (the master branch is the same as @sugiany's master branch). GitHub only provides basic information, so I only know there were about 200 unique visitors in the last 2 weeks, and 10 ~ 30 unique visitors per day. (At the beginning, there were only 0 ~ 3 unique visitors per day.) Actually, I don't really care about that; at least I enjoy using/improving this tool. :yum:

Hogarth-MMD commented 7 years ago

Okay, here it is. My very first repository created completely from scratch. I would be thankful if @powroupi or whoever else has time would take a look at it and tell me if I have made any idiot mistakes with it. https://github.com/Hogarth-MMD/mmd_tools_translation

Hogarth-MMD commented 7 years ago

Hello @powroupi ! What do you think? Has the time arrived for you to remove the "!" from the mmd_tools translate button? :smile:

powroupi commented 7 years ago

Yeah, I think the major functionality is there, and currently I don't have any good ideas for improving it. :smile:

powroupi / blender_mmd_tools

translation issues #73