Open Hogarth-MMD opened 7 years ago
Can't you just sort the dictionary from greatest to fewest characters? It seems to me that would take care of any issues. That way, it would always prefer longer, more carefully translated tokens to shorter sub-tokens.
@nathanvasil I am not sure whether or not that would work. To test that idea, we would need to have a python script which tests a dictionary for possible problems. I am still groping in the dark about what the algorithm would need to be for a dictionary-testing script(?).
If mmd_tools sorts a dictionary from longest tokens to shortest tokens, that might be a simple solution which prevents many translation errors. This is a very useful idea, thank you. What about a case where the end of one token overlaps with the beginning of another token? I'm not sure if that problem could be solved.
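As a starting point for a dictionary-testing script, here is a minimal sketch that reports pairs of tokens where the end of one token overlaps the beginning of another. The function names `overlap` and `find_overlaps` are hypothetical and not part of mmd_tools; the sketch only flags candidates for manual review and does not cover the case where one token is fully contained in another.

```python
from itertools import permutations


def overlap(left, right):
    """Return the longest proper suffix of `left` that is also a proper prefix of `right`."""
    # Try the largest possible overlap first, then shrink.
    for size in range(min(len(left), len(right)) - 1, 0, -1):
        if left[-size:] == right[:size]:
            return left[-size:]
    return ""


def find_overlaps(tokens):
    """List (left, right, shared) for every ordered pair of tokens that overlaps."""
    results = []
    for left, right in permutations(tokens, 2):
        shared = overlap(left, right)
        if shared:
            results.append((left, right, shared))
    return results


if __name__ == "__main__":
    for left, right, shared in find_overlaps(["ab", "bc", "cd"]):
        print(f"{left} + {right} share {shared!r}")
```

Each reported pair is a place where a name containing both tokens back to back (for example abc for ab and bc) could be split in more than one way.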
I think that would be solved by sorting. Think it through:
abcd defg cdef cde bc a d
If it hits abcd or defg first, it won't follow through to examine further tokens, right? Appropriate.
If it encounters bcde, it will translate cde, then, lacking a translation for b, leave that untranslated (rather than translating bc and d and leaving e untranslated). Appropriate.
If it encounters abc, it will translate it as a-bc. Appropriate.
The only problem case I can imagine is an ambiguous definition, such as when only ab, bc, and cd are defined and you're translating abcd. That's not a problem with the algorithm; it's just an ambiguous definition. No algorithm is going to give a correct answer because there is no correct answer.
Now, if you wanted to be rigorous about this, you probably wouldn't bother making a testing script; after all, your script can have problems too, and it will only test the foreseen. You would prove this, like in math. If that's not something you're comfortable with, as I am not, then just look at a bunch of cases and if you fail to see a problem, roll the dice.
Yea, mmd_tools sorts a dictionary from longest (JP) tokens to shortest (JP) tokens by default. And when you encounter an issue of ambiguous definition, you might need to add some longer tokens to solve it. The algorithm is simple, it just loops through the dictionary and replaces the tokens if found.
If your dictionary is ab,bc,a,b,c,d, then abcd will be translated as ab-c-d. If you want to translate abcd as a-bc-d, you need to add abc to the dictionary so that abc will be translated before ab. And you can also add abcd to the dictionary to translate full names. :smile:
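The behaviour described above can be reproduced with a small sketch. This is not the mmd_tools code, only an illustration of "sort longest first, then loop and replace"; the `translate` function and the bracketed placeholder translations are made up for the example.

```python
def translate(name, dictionary):
    """Replace tokens in `name`, trying longer source tokens before shorter ones."""
    # sorted() is stable, so tokens of equal length keep their dictionary order.
    ordered = sorted(dictionary, key=lambda pair: len(pair[0]), reverse=True)
    for source, target in ordered:
        name = name.replace(source, target)
    return name


# Placeholder translations in upper case so they never match a source token again.
base = [("ab", "<AB>"), ("bc", "<BC>"), ("a", "<A>"),
        ("b", "<B>"), ("c", "<C>"), ("d", "<D>")]

if __name__ == "__main__":
    print(translate("abcd", base))                          # ab-c-d
    print(translate("abcd", base + [("abc", "<A><BC>")]))   # a-bc-d
```

Adding the longer token abc is enough to change the split, because it is tried before ab.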
You can also use the Blender Python console to see/get the (sorted) dictionary. :smile: Load translations.csv into the Blender text editor, then run:
>>> from mmd_tools import translations
>>> t = translations.getTranslator(bpy.data.texts['translations.csv'])
>>> t.save_to_stream(bpy.data.texts.new('sorted.csv'))
The sorted dictionary is then written to a new text named sorted.csv.
https://pypi.python.org/pypi/jaconv/
The jaconv Python package (jaconv = Japanese convert) has the code needed to convert between half-width and full-width katakana.
Here is the code to convert half-width katakana to full-width katakana, using the list of tuples format.
Hello @powroupi! In translations.py there is a list of tuples called jp_to_en_tuples. This list of tuples is never sorted according to the length of a token, is it?
The csv file format is causing a big problem for me. My csv dictionary causes freezes and crashes and a memory error with mmd_tools. My csv file is originally a .py file with a list of tuples, but it has many comments in it which are needed for organization and for clarifying the definitions of words. Possibly these comments are causing a problem and trying to convert my .py dictionary to .csv is a big waste of time for me.
> there is a list of tuples called jp_to_en_tuples. This list of tuples is never sorted according to the length of a token, is it?
Yes, it has a fixed order and is never sorted.
> The csv file format is causing a big problem for me. My csv dictionary causes freezes and crashes and a memory error with mmd_tools...
Please give me the csv file, so I can check if there is any bug in my code (maybe you didn't follow the csv format). Also, if you give me your .py dictionary, I may be able to convert it. Thank you. :smile:
MMD translations dictionary 1.1 is available for download (863 tokens) 😃 https://sta.sh/01zwuicd6osn
My translation dictionary is also intended to translate model names and model comments. I recommend that translation of model names and model comments be added into MMD tools.
I recommend that this feature be added to mmd_tools: when adding English names, a prefix should be added to the beginning of the English name of each advanced morph type. This makes it easier to see which type of morph it is in MMD or MMM: B for bone morphs, M for material morphs, G for group morphs, UV_ for UV morphs.
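A rough sketch of what such prefixing could look like is shown below. The prefix strings are copied as they appear in the request above; the morph type keys and the `prefixed_name` function are hypothetical and do not reflect how mmd_tools names its morph types.

```python
# Hypothetical morph type keys; prefixes copied from the request above.
MORPH_PREFIXES = {
    "bone": "B",
    "material": "M",
    "group": "G",
    "uv": "UV_",
}


def prefixed_name(english_name, morph_type):
    """Prepend the prefix for `morph_type`; types without a prefix are unchanged."""
    return MORPH_PREFIXES.get(morph_type, "") + english_name


if __name__ == "__main__":
    print(prefixed_name("smile", "uv"))
    print(prefixed_name("smile", "vertex"))
```

Any morph type missing from the table, such as a plain vertex morph, keeps its English name as it is.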
Okay, I have written some instructions for using the mmd_tools translate tool and for using my translation dictionary. See my next comment.
1. Download the mmd_tools add-on (powroupi fork) from here:
mmd_tools direct download https://github.com/powroupi/blender_mmd_tools/archive/dev_test.zip
mmd_tools main page https://github.com/powroupi/blender_mmd_tools
mmd_tools issues/bug reports https://github.com/powroupi/blender_mmd_tools/issues
3. Download the MMD translations dictionary file from here: https://sta.sh/01zwuicd6osn
After unzipping this file (with 7-zip or Bandizip, for example), copy the translations.csv file into the mmd_tools folder, into the same directory where a file named translations.py is already located. Do not rename the translations.csv file. It must have exactly the same name as the translations.py file, except for the .csv extension. (The MMD translations dictionary download also contains a translations.py file. Do nothing with this file unless you really know what you are doing with Python and mmd_tools. This file is only provided for the benefit of the mmd_tools developers.)
Import a .pmd or .pmx model into Blender. Click File, Import, MikuMikuDance model (.pmd .pmx).
In the mmd_tools panel (whose tab is on the left side of the 3D view), click on the Translate button. This causes a popup dialog box to appear. Ignore the Dictionary dropdown list, unless you have loaded your own translation dictionary into Blender's text editor. Click OK. English names will then be added to your imported MMD model(s) in bulk, within several seconds or within a fraction of a second.
You can then export this model from Blender with mmd_tools with all of its new English names of morphs, bones, display panel groups, physics, and materials. Click File, Export, MikuMikuDance model (.pmx) to export the model.
Here is the python code for model name and for reading the model comment, but I don't know the code for editing a model comment. For the code below to work, an mmd_root empty object must be the active object.
bpy.context.active_object.mmd_root.name
bpy.context.active_object.mmd_root.name_e
bpy.data.texts[bpy.context.active_object.mmd_root.comment_text].as_string()
bpy.data.texts[bpy.context.active_object.mmd_root.comment_e_text].as_string()
I recommend that Rename Bones L/R Suffix be disabled by default. This option interferes with any script that mass renames bones to Japanese MMD bone names, and it caused me to waste about 2 hours a while ago. It interferes with my script that renames the DAZ CMU BVH bones, for example: http://sta.sh/01acutxl5g0i
Hello @powroupi! I cannot make a translations.csv file which converts half-width katakana to full-width katakana, because this needs to happen at the beginning of token replacement. mmd_tools will sort this list and move the half-width to full-width tuples to the bottom of the list.
@Hogarth-MMD I think one way is to use a fixed-order dictionary, but then we would need to adjust the order manually, which is another pain. :cry: Another way is to add a function that converts half-width katakana to full-width katakana, and we can add the table to translations.py. Since the table should not need to change, I think this way might be better. :smile:
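For illustration only, the conversion step can be sketched with Python's standard library instead of a hand-written table. This swaps in Unicode NFKC normalization, which is an assumption and not what mmd_tools or jaconv do internally; NFKC is also broader than a katakana-only table, since it folds other compatibility characters too (for example full-width Latin letters become ASCII). The function name `to_fullwidth_katakana` is hypothetical.

```python
import unicodedata


def to_fullwidth_katakana(name):
    """Normalize `name` with NFKC, which maps half-width katakana to full-width.

    A half-width kana followed by a half-width voiced sound mark is combined
    into a single full-width character.
    """
    return unicodedata.normalize("NFKC", name)


if __name__ == "__main__":
    for sample in ["ﾐｸ", "左ｳﾃﾞ"]:
        print(sample, "->", to_fullwidth_katakana(sample))
```

Running the conversion on each Japanese name before the dictionary lookup means the dictionary itself only needs full-width tokens.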
Okay, I've updated the translation tool and added options for full-width katakana and morph prefixes, as @Hogarth-MMD requested. :smile:
Additional note: the strings in the csv should be double quoted or unquoted. I'm not sure if there is any difference between operating systems. @Hogarth-MMD's csv works fine if I remove the single quote characters. :)
Hello @powroupi! Thank you for so generously giving your time. There should be NO prefix added onto vertex morphs. English names of vertex morphs should not be changed. The prefix for UV morphs should be UV_ .
There should not be any "Full-width Katakana" option. This should be completely invisible to the user. The user should not need to know anything about half-width or full-width Katakana. Both the translation dictionary and the Japanese names of models should be automatically converted to full-width Katakana before translation to English.
> There should not be any "Full-width Katakana" option. This should be completely invisible to the user....
Yes, I agree with that. I added that option just in case, for debugging, and users will know about half-width or full-width katakana anyway, because there are already many models using half-width katakana.
> Both the translation dictionary and the Japanese names of models should be automatically converted to full-width Katakana before translation to English.
No, we just need to convert the Japanese names of models before translation to English, so you will only need to define full-width katakana tokens in the translation dictionary, and you can grab those tokens from 'mmd_tools.translations.fails' in the Blender text editor. :)
Okay, the prefix is updated and the "Full-width Katakana" option is hidden now (only available in the Python console). :)
Hello @powroupi, I see the new Information button. Have you added translations of model names or model comments yet? Or only the button?
"Additional note, the string in csv should be double quoted or non-quoted, I'm not sure if there is any difference between operation systems. @Hogarth-MMD's csv works fine if I remove single quote characters."
Hello @powroupi , I don't understand this issue. Why are single quotes a problem? Is there an operating system which has a problem with single quotes? Can you please explain this?
> I see the new Information button.
It is an option: shift+LMB click to show more options, so users can choose which parts will be translated. The info part is included (but I think it is better to use a real translator to translate comments), and the translation method is the same.
> Why are single quotes a problem?
Python's csv module uses double quotes by default, and I didn't change that behavior, so I can't use your translations.csv unless I remove the single quotes or replace them with double quotes. Does your translations.csv work on your system?
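The quoting behaviour can be checked with the standard csv module on its own. This is a small sketch, not the mmd_tools loader; the sample rows and the `read_rows` helper are made up for the example.

```python
import csv
import io


def read_rows(text, quotechar='"'):
    """Parse csv text and return the rows as lists of strings."""
    return list(csv.reader(io.StringIO(text), quotechar=quotechar))


double_quoted = '"腕","Arm"\n'
single_quoted = "'腕','Arm'\n"

if __name__ == "__main__":
    # Default reader: double quotes are stripped from the fields.
    print(read_rows(double_quoted))
    # Default reader: single quotes are ordinary characters and stay in the fields.
    print(read_rows(single_quoted))
    # Only an explicit quotechar makes the reader treat single quotes as quoting.
    print(read_rows(single_quoted, quotechar="'"))
```

With the default settings, a single-quoted token keeps its quote characters, so it would never match the Japanese name it is meant to translate.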
MMD translations dictionary 1.1 download
Now it has double quotes, all full-width, and no duplicates (863 tokens)
Actually there is still one duplicate: 紫 (can be translated as Yukari or Purple). I don't know any solution to this translation issue.
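Duplicates like this one can at least be listed automatically so that someone can pick the preferred translation by hand. The following is a minimal sketch that assumes the dictionary is available as (Japanese, English) pairs; `find_duplicates` is a hypothetical helper, not an mmd_tools function.

```python
from collections import defaultdict


def find_duplicates(pairs):
    """Return {token: [translations]} for every token defined more than once."""
    translations = defaultdict(list)
    for source, target in pairs:
        translations[source].append(target)
    return {source: targets
            for source, targets in translations.items()
            if len(targets) > 1}


if __name__ == "__main__":
    sample = [("紫", "Yukari"), ("腕", "Arm"), ("紫", "Purple")]
    for token, targets in find_duplicates(sample).items():
        print(token, "->", ", ".join(targets))
```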
Hello @powroupi I am too exhausted from all of this hard work with translation to give you a good thank you. I need 20 hours of sleep followed by intensive psychotherapy from a psychiatrist who specializes in Japanese language stress. 😩 I just downloaded Git.
Thanks for your hard work. :smile: Take it easy, there is no time limit. We can do this slowly, since this task might never have an end point. :worried:
> 紫 (can be translated as Yukari or Purple.)
It is a limitation of our translation method. We can choose whichever translation is more commonly used. :)
Hi @powroupi , I have been dealing with computer problems for the last 10 days. This is my first time back on Github since 10 days ago. Using computers and internet demands so much time and perseverance! So much time and perseverance to troubleshoot computer and internet issues! I am thinking that probably most people just get frustrated and give up. :-( This makes me sad, because people are deprived of the wonderful advantages that the internet could bring to them. :-(
Welcome back, @Hogarth-MMD. :smile: Internet is wonderful, but don't forget to take a rest and do some sports. :yum:
Hello @powroupi ! How many downloads have there been of your version of mmd_tools? I tried to find this information, but I don't see it anywhere.
Sorry, I don't update the version number frequently. Just check the date of the latest commit to see if there is an update. :smile:
Hello @powroupi ! You didn't understand my question. Other people have downloaded the mmd_tools master.zip file. The users of mmd_tools have downloaded this file. How many times have other people downloaded this file? This information tells you approximately how popular mmd_tools is and approximately how many people are using mmd_tools. (the powroupi version of mmd_tools, I mean)
Well, I don't know how many people download my dev_test branch (the master branch is the same as @sugiany's master branch). GitHub only provides basic information, so I only know that there were about 200 unique visitors in the last 2 weeks, and 10 ~ 30 unique visitors per day. (At the beginning, there were only 0 ~ 3 unique visitors per day.) Actually, I don't really care about that; at least I enjoy using/improving this tool. :yum:
Okay, here it is. My very first repository created completely from scratch. I would be thankful if @powroupi or whoever else has time would take a look at it and tell me if I have made any idiot mistakes with it. https://github.com/Hogarth-MMD/mmd_tools_translation
Hello @powroupi ! What do you think? Has the time arrived for you to remove the "!" from the mmd_tools translate button? :smile:
Yeah, I think the major functionality is there, and currently I don't have any good ideas for improving it. :smile:
Translation hits a snag with 2 issues:
1. Half-width katakana and full-width katakana. In the models which I am trying to translate, there are both half-width katakana and full-width katakana. I am not really worried about this issue. I feel sure that this problem can be solved without too much difficulty.
2. Overlapping tokens. This is a much more serious and problematic issue. The same character or the same sequence of characters can exist in more than one token. After I added colors and clothes to my WIP translation dictionary, this became a major problem. If token replacement does not happen in EXACTLY the correct order, the result is an incomplete translation. At this time, I am completely puzzled and groping in the dark about what algorithm is needed to solve this problem.