z1dev / zkanji

Japanese language study suite and dictionary
GNU General Public License v3.0

Future feature requests #10

Open am2del opened 6 years ago

am2del commented 6 years ago

A brief overview - INDEX:

--> Future feature request #1 - Flags.
--> Future feature request #2 - Annotation descriptions.
--> Future feature request #3 - Font size.
--> Future feature request #4 - Examples, names & multi-lingual.
--> Future feature request #5 - Regex.

-- Future feature request #1 --

In terminal, and plausibly at Help-menu: --> Flag/switch information, e.g.: --help or -h brings up what options there are and short info.

-- Future feature request #2 --

Annotation descriptions @ Help-menu or Dictionary-menu. These can be extracted from JMdict:
--> Format: <!ENTITY term "Explanation">
--> In between:

<!-- The following entity codes are used for common elements within the
...
<!-- JMdict created:  

-- Future feature request #3 --

Font size adjustable "by px" or a "Very Large", as so-called "Large" is fairly small - even using a 27" monitor @ 1920x1080. Indeed, there's interface scaling - great, but for those who only wish to increase size of dictionary entries a "Very Large" or freely adjustable "by px" would be perfect.

-- Future feature request #4 --

Browsable & searchable examples dictionary.
Browsable & searchable names dictionary. (Possible to import along with the others? Using something like: zkanji -ien PATH)
Build multi-dictionaries (one for each language in JMdict) at initial import (zkanji -ie PATH).

-- Future feature request #5 --

Simple regex (or import an existing regex library/module perhaps?) @ search, e.g.:
--> "*" (none-or-more-characters),
--> "." (single-character),
--> "|" (this-or-that, plausibly brackets enclosure, "[" and "]", for multichar-this-or-that)
--> "&" (this-and-that, plausibly brackets enclosure, "[" and "]", for multichar-this-and-that)

Example:
--> ご*そうさま,
--> ご.そうさま,
--> ごち|くそうさま,
--> ご[ちそう|くそう]さま
...would all find 御馳走様「ごちそうさま」, and anything else matching the given pattern.

Notes:
--> "|" and "&" for latin letters could plausibly be used in between SPACE(s) instead of brackets.
--> "&" is primarily useful in latin-style lookup and examples.
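
To make the proposed notation concrete, here is a minimal sketch of how it could be mapped onto a standard extended regular expression (ERE). The function name simple_to_ere and the word-list file name are assumptions for illustration only, not anything in zkanji. The unbracketed form (ごち|くそうさま) is deliberately not handled, since a bare "|" in an ERE would split the whole pattern in two rather than just the neighbouring characters.

```bash
#!/bin/bash
# Hypothetical helper (not part of zkanji): translate the simplified
# search notation into a POSIX extended regular expression (ERE).
#   "*"      -> ".*"     (none-or-more characters)
#   "[a|b]"  -> "(a|b)"  (multichar this-or-that)
#   "."      is left alone; it already means "single character" in an ERE.
simple_to_ere() {
    printf '%s\n' "$1" | sed -e 's/\*/.*/g' -e 's/\[/(/g' -e 's/]/)/g'
}

# Usage sketch (wordlist.txt is an assumed plain-text file, one reading per line):
# grep -E -- "$(simple_to_ere 'ご[ちそう|くそう]さま')" wordlist.txt
```

Note that for "." to stand for one whole kana rather than one byte, grep would need to run in a UTF-8 locale.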

z1dev commented 6 years ago

(This is my third post in case you are reading them in different order.)

1 is simple enough. The program is not very command line friendly right now, but a simple help flag is understandable. Just in case, currently only -i or -e (-ie for combined) are accepted flags for importing the dictionary or examples, as I wrote in the install instructions.

2: The annotation descriptions from JMdict wouldn't be very useful, because I renamed them to something I found much more sensible. Many were joined too. I don't know if they were added like this for linguists or for algorithms to handle them better, but changing them was the only option to make them a bit user friendly. The program help is completely missing for now, but it's sensible to have at least the annotations listed somewhere.

3 Adjustable font sizes by pixel might be the best, but it's faster and cheaper to add the "very large" option. If someone needs an even bigger size after that, they probably need interface scaling as well.

4 I always wanted searchable examples, but in the end I'm not sure if that isn't an overkill. The ability to list the examples for a given word in a separate window, and allowing to see them all in a list might be more useful. Adding a name dictionary is among my plans too.

5 When I first designed zkanji I started out with a regexp search, but abandoned the idea after not finding much use for it. It's not difficult to make, since I keep a map of every single word for each separate kana character. I use that for the ?+text-in-middle+? searches. I need real life examples where this might be useful though.

am2del commented 6 years ago

For #1, I'm aware of those being the only ones listed in the instructions - but still, I appreciate that you'd consider adding it so it's always at hand. Thank you.

For #2, thank you for considering adding it - it's a helpful feature which saves users quite some headache.

For #3, seems we're on same page - thank you a lot. Go with whatever's easy to implement.

For #4 & #5, this gets a little bit long and more personal than objective - please bear with me. Searchable examples would be a great feature for those learning Japanese - especially in the early stages of learning, 小学 or pre-N2. This feature request is sort of connected with #5. As mentioned briefly in another post, some of my acquaintances are currently studying Japanese (still early stages) and I get a lot of questions, as I learned Japanese on-site the Japanese way, i.e. no glossary or JLPT-style. Wordings, tips, tricks, tools etc. Looking such things up I realized most of the tools are too basic and therefore not suited for anything other than near-advanced or higher levels. Most study apps aren't useful even for beginners - and many of them contain errors in even the most basic parts.

A good way to learn when to use what wording is per example. Being able to browse and search examples is very helpful - and I can speak for myself too as a fairly advanced foreign speaker who didn't learn Japanese through glossary - those examples are very useful. But the issue is finding the right examples as there are too many. WWWJDIC has a feature for this, but very basic without regex and it's in the smartphone app as well. Can get hundreds of examples where none is what one was looking for due to the non-regex style. (Side note: Personally, for the time being, I open a terminal and run grep -iE PATTERN on the raw examples-file and set up an alias just the other day for acquaintances who use Linux.)

This is where #4 and #5 connect. Having this feature in a cross-platform desktop dictionary which can be used offline would be a true killer feature for those learning Japanese. It would greatly improve their chances of finding how to express themselves properly - and it's great for Japanese natives who study English as well.

Mentioned in the post above, 御馳走様「ごちそうさま」 came up while dining together the other day - as I learned my Japanese in 関西 (Kansai), there's a bit of an issue with me adopting their infamous accent... so, when they heard me they interpreted it as 「ごくそうさま」 where the 「う」 in 「く」 had been dropped. So they opened zKanji on the computer to look it up and got really frustrated as they couldn't find it. Words ending with 「そ」 or 「そう」 are plenty... and they also used the English part to try to find it, but you get a lot when you type "meal", and as they (unlike me) aren't native English speakers (although very good at English) they typed things like "Thank you for the food" etc. but not "What a wonderful meal"/"That was a delicious meal"/"said after meals", which meant they got irritated and eventually gave up. This is what got me going on simple regex: just "*" and "." would be great for the "Japanese to English" and "Browse Japanese" modes - had it been available they would have found it in no time. As for examples and the "English to Japanese" mode, by being able to type keys and options like "meal&[thank|great|delicious|after]" they would have found it in a flash. It's also a great way to narrow down to useful examples. I, who's lazy and wanted to demonstrate this very example to my acquaintances, opened a terminal and ran "grep -i meal | grep -iE '(thank|great|delicious|after)'" on the JMdict file and got the (fewer than 10) potential English translations instantly, for them to look up and check out in zKanji.
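
The chained grep above is in effect the "&" (this-and-that) operator from request #5. As a rough sketch, assuming plain-text input with one entry per line, the chain can be generalized to any number of terms. The function name and_grep is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: keep only those stdin lines that match EVERY term.
# Each term is a case-insensitive extended regex, so 'thank|great' works
# as the "this-or-that" part while separate terms act as "this-and-that".
and_grep() {
    if [ "$#" -eq 0 ]; then
        cat                      # no terms left: pass the remaining lines through
        return
    fi
    local first=$1
    shift
    grep -iE -- "$first" | and_grep "$@"
}

# Usage sketch, equivalent to "meal&[thank|great|delicious|after]":
# and_grep meal 'thank|great|delicious|after' < JMdict
```

Each extra term only narrows the previous result, so the order of the terms does not change what is found.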

To actually look things up oneself is a great way of learning, as that effort triggers the brain to actually remember - whereas asking and just getting an answer may be forgotten in a flash.

This also connects to the final item at #4 - as the source supports other languages as well, it'd be very neat to be able to import these too upon running the -ie flag.

Another occasion when I personally would like a regex feature is when translating texts, to find appropriate synonyms and expressions without doing 10+ individual searches for a single purpose. It's also a great way to narrow down when unsure about spelling or wording.

am2del commented 6 years ago

Would also like to add an additional request, which could ease up learning for people as it's overlooked - yet super useful. NOTE! Adding entirely new features like this should come after the already planned features are implemented and working.

-- Future feature request #6 --

Radical dictionary/browser, new and old system. E.g. along the lines of:
Radical | NumOld | NumNew | Strokes | Reading(s) | Meaning(s)
Radicals make up Kanji. Know the radicals and one can write and figure out most Kanji right off the bat once getting some experience. (On a side note, knowing this also helps with reading Mandarin, Cantonese etc. too.)

Some form of easily browsable mode dedicated to radicals would be a great help for many - and especially for those learning Kanji, as one no longer needs to remember stroke-by-stroke-by-stroke, but simply 人(偏)土寸 to write 持 - which also helps with remembering meaning and (some) readings.

Another benefit of knowing the radicals and using them as base for learning Kanji is when on-site talking with the natives and asking like "Oh, that word - how do you write it?" and if there's an unknown Kanji involved it'll be easy for the native to explain how it's written. I cannot stress enough how useful this has been to me personally. Not sure if there's a datafile on this at the same source as the other dictionary files as I haven't taken a look at it yet, but - if not, I could provide a source which I got when I was there. It's currently in paper-format though, but I can digitalize it.

z1dev commented 6 years ago

The example sentences search means adding an entirely new feature too which would be lengthy to make. No data is generated for this currently. I won't say no for certain, but right now I have no plans to do this. If you want to contribute I can help to integrate it with zkanji.

The regexp search is possible to make reasonably fast, but only for kana readings and not in English at all. This is limited to the dictionary however.

Is the radical search already in the program not adequate for what you suggested in #6? It lacks the meanings for each radical, but that could be included as well. The third method includes the radical names and those can be used for searching. I plan a new type of kanji search that would allow people to select the components and even their placement in the kanji. The stroke order data is made up of parts already, which you can check in the currently crashing kanji information window. (You can see this in the old version.) I don't know if this would be adequate, but I might have misunderstood the feature request.

am2del commented 6 years ago

Indeed, examples and radicals would both require effort and can therefore be left for later. Regex for kanji, English and others I can take a look at how to implement. I may make an outline which you can optimize and implement for these after I've had the time to go through the complete source, but for now I will focus on helping with debugging and solving the install and the auto-update. The upcoming two to three weeks I may have very limited time to spare though... Don't expect no miracles in a flash.

Simple regex for kana readings would be plenty helpful, and a good start. I think regex should have an ON/OFF feature later on (like the "?+"/"+?"), as to ensure the use never interferes with special characters that may be present somewhere. Also, reduce load when not needed on slow/old machines. I will look into this over time, first things first.

As for #6, how much effort would it be to just add a pop-up description at "Parts" & "Radicals" as well, containing reading and meaning for now? And, if there's data, also include first occurrence for JLPT and 常用 respectively? If there's no data available don't bother for now, I'll look at it when I take a look at examples, radicals and general regex if so. Later on I'll see if I can make an outline for the full feature to implement.

As for the radical search mode, there seem to be some irregularities for me at "Parts" take a look at the screenshot: screenshot_2017-12-30_19-43-05_radicalsearch-parts01

...and you should notice there are more strokes than there should be here and there - 10, 15 & 22 among others. And in the "Radicals as parts" mode some are drawn as their full-scale version... In UTF-8 there should be unique characters for each of them (don't got the positions in me head, but I know I got a note on it somewhere...). Would it be reasonable to use those as primary, with an if-clause checking if the corresponding glyph exists in the chosen font, else falling back to the full-scale one?

-- Addition --

I know this is being a bit picky with details, but one small detail caught my attention: --> Windows-users take for granted buttons for Minimize/Maximize/Exit are on the right hand side, while many *NIX-users (BSD/Linux/Mac) take for granted these are on the left hand side. Would it be a lot of work for you to add a setting in interface to swap side of the Lock/Close buttons for kanji detail dialog?

z1dev commented 6 years ago

I have added a Very large dictionary font option. This might still not be enough for big monitors, but it already looks huge compared to the rest of the interface.

6 There's a "names" toggle-button for writing them under the "Radicals and parts" tab, but tooltips were planned too, I just didn't do it yet. Adding meanings apart from the names requires additional data I don't have. If you can provide data, it could be displayed in tooltips or even used for searching between the radicals.

The irregularities you mention in the radicals window are due to default fonts not having the required glyphs on most systems. Even if a UTF-8 code point exists for them, that doesn't mean the font can display them. The characters displayed are the closest match that is likely to appear in fonts. Instead of figuring out which fonts to use here, it would be best to either use the stroke order diagrams I already have to draw these radicals, or make SVG pictures for each of them. My vote goes for the SVG but it might take a little work to do it.

I will answer the other issues and solutions you brought up too maybe tomorrow or after that.

am2del commented 6 years ago

Answer things as you got the time to.

Got some questions/requests/issues from my acquaintances that I'll simply be forwarding, only those which I've confirmed for now - haven't done debugging yet. The questions came after the "zkanji.zks" and "similar.txt" fix for the kanji-dialog had been applied.
--> Clickable buttons/icons do not have popup descriptions, could this be implemented?
--> At the "Kanji"-browser, when docked horizontally at the top:
----> 1) Making it as slim as possible, then closing zKanji and starting back up - it doesn't keep its shape. Is this a design-thing or can it be arranged so it keeps its size?
----> 2) The "From"-DropDown at bottom, would it be reasonable to move it up above the kanji and place it next to the "Order"-DropDown to save space? Whether both are above or below doesn't matter, just being on the same row.
----> 3) Is it reasonable to add an "Enable/Disable All Filters"-button?
--> At the "Japanese to English"-browser, when opening the "draw"-box zKanji simply crashes. After re-starting zKanji after that initial crash, zKanji will hang whenever trying to open the Radical-search window at the "Kanji"-browser. Possibly an infinite loop, as it overloads a CPU-core. Must send a "kill -9" to force exit after this - the standard signal "kill -15" is used on regular close and won't work here - and whenever trying to use the Radical-search after this it will hang the same way until re-installed completely. See NOTE below.

NOTE: The "kanjipen.svg" is the icon for the button at the "Japanese-to-English" browser which causes the initial crash, and will cause a crash every time it's clicked. I was able to reproduce this with the GIT-build from 2017-12-28, however when trying a fresh GIT install a few hours after your response on 2017-12-31 the window will simply crash on click - but the Radical-search will be unaffected afterwards.

Got some more feedback from them, will post here when I checked things out. Ain't got the time now.

am2del commented 6 years ago

I'll move the crash/hang issue to the bug-thread I made earlier.

am2del commented 6 years ago

Regarding data for radicals, I will double-check the data when digitalizing before providing it, as it's a bit old and contains references to 常用 and JLPT levels. Also, I think I got the radicals as either SVG or PNG somewhere... not sure which format, but I'll check on it.

z1dev commented 6 years ago
am2del commented 6 years ago

For #3, the addition of "Very Large" as a font option is great. Looks good on screens ranging from a 15" laptop to a 27" stationary at 1920x1080. Small 4k screens got the interface-scaling option so should be fine. I think this can be crossed off the list now.

Regarding retaining saved size, you probably tried but just in case you haven't: Did you try applying a work-around like adding an EXIT-procedure where the size/position of each is written to a placeholder-file, then on INIT read and restored? The size-restoration part may have to take place as the very last thing during INIT if it really is a Qt issue.

(Wishing I've had the time to go through all the code... but probably gonna be a few more weeks before I've done that.)

-- Future feature request #7 --

I made a mod-proposal yesterday in accordance with the requests I forwarded to you. Saw now you had tried, I'll see what I can do to try to solve this layout thing. A "Show/Hide all filters" is available by menu-click which is nice, though a button in addition would be great. Anyway, I also added a "Toggle All"-button (disabled) just to get a feel for what it could look like. An option could be to turn the "Filters:"-label itself into a button to achieve a "slim" feeling. That is, if this feature is to be available. Had my acquaintances test it out and they all said they like this better, but one of them said it's nice for horizontal use and asked if it is possible to slim it down for vertical use as well, as it's rather wide. Testing it out now in vertical mode, I can see it would be nice if things in this widget could automatically break rows like the filters do, making the widest component, the "From: ", the limiter. Optionally even enable the labels "Order:" and "From:" to automatically end up above their respective drop-downs, making the width of "All:" the limiter... (Just speculating.)

As I'm unfamiliar with GUI-programming in general, is it plausible to fully adjust layout using IF-clauses in QT or just to a limited extent? I'll attach it as it is, please take a look and let me know what you think when you got the time to. kanjisearchwidget.ui.txt (Had to add ".txt" to attach.)

-- Future feature request #8 --

"Filter"-buttons @ the Kanji Widget, as well as the "S.O.D", "Words", "Stroke shadow" etc. buttons @ the Kanji Information Dialog (floating) - would it be reasonable to add an "active" frame or "hover" effect on these when they are active, to contrast them with the inactive ones? Or no frame when inactive and a frame when active? (Like the "Play", "Pause" etc. in the Kanji Information Dialog.) Nice touch that when "Words" is active, some buttons go inactive.

z1dev commented 6 years ago

As I'm unfamiliar with GUI-programming in general, is it plausible to fully adjust layout using IF-clauses in QT or just to a limited extent?

Everything can be changed programmatically. Some changes are simple, others are not. In Qt we have "layouts" that try to automatically place things depending on their default size. The layout system sometimes wants to be too smart for its own good, or the default size is wrong; this is causing trouble with restoring sizes at start-up too. There is no way around this layout system, which is why it can be so difficult sometimes to force on it what we want.

What makes things more difficult is that Qt calculates positions and sizes lazily, only making final calculations when a window is shown. There are workarounds for this but again it's not always easy. What you see in the designer is often not what the final result will look like, but at least it helps us to know what result we want to achieve.

Can you please show what you mean by horizontal and vertical use of the kanji search?

For #8, buttons do have a different appearance depending on whether they are active/pressed or not. I wonder if there's an issue with the theme? Can you experiment in the designer what makes buttons look different? Mainly by changing the "auto-raise" and "checked" properties.

am2del commented 6 years ago

Let's just ignore the docking retain-size thing then, seems too much effort for a small detail. Maybe one day as a "finishing touch", if at all.

I will take a look at #8 using Qt Designer or Qt Creator later; first I'm gonna fully digitalize the radical-info and see if I can find those radical images. I probably got the time for the radicals during the weekend. As I do it I'll make them as easily parsable tables in text files with the new radical index as unique key (252-system); the old radical index (214-system) will be present as well, of course. Readings separated into preferred and optional. The first line will be a header defining the columns. Btw, do you prefer TAB, COLON or SEMICOLON as separator?
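
As a rough illustration of why a header line plus a unique key makes such a table easy to parse, here is a small sketch assuming TAB as the separator and the column names from request #6 (Radical, NumOld, NumNew, Strokes, ...). The function name lookup_radical and the file name are made up for illustration; the actual data file may well differ.

```bash
#!/bin/bash
# Hypothetical sketch: look up one row of a TAB-separated radical table
# by its "NumNew" key. The first line of the file is a header naming the
# columns, so the key column is located by name rather than by position.
lookup_radical() {
    local file=$1 key=$2
    awk -F'\t' -v key="$key" '
        NR == 1 {                          # header line: remember column positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["NumNew"]) == key { print; exit }   # first row with a matching key
    ' "$file"
}

# Usage sketch (radicals.tsv is an assumed file name):
# lookup_radical radicals.tsv 40
```

Because the column is found via the header, columns could later be added or reordered without breaking a lookup like this.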

-- Kanji Widget: Horizontal-/Vertical-mode --

People have different preferences for layout, and some use vertical screens (e.g. 1200x1920) while others use horizontal ones (e.g. 1920x1200), or even flippable ones - and some "split" a single screen into several pieces, keeping multiple applications visible at once - leaving people who want several features docked at once wanting. This is why the topic was brought to my attention in the first place. Thus, the Kanji Widget felt like it couldn't be made "slim" enough - usable, but not oversized, that is. Had some pointing this out as a reason they felt they had to close it and re-open it by need instead of keeping it available full-time. The Kanji Widget aside, everything seems to scale nicely.

Proposal:
--> Using (perhaps dual base?) layouts, triggered by if-clauses like:
----> width < height and width > height
----> currentWidth < value
@ Both modes:
--> Convert the label "Filters" to a button for toggle-filters (in menu called "Show/Hide all filters").
@ Horizontal-mode: (if width>height)
--> Allow filters to fill out the first line, adding additional lines as required, thus allowing a minimum height of down to approx. ~200px with only a few filters visible.
--> For concept proposal, see: 2018...-Horizontal...-Draft.png
@ Vertical-mode: (if width<height)
--> Allow the "From:"-label to jump up above the drop-down when too slim for being on the same row, thus allowing a minimum width of approx. ~275px.
--> For concept proposal, see: 2018...-Vertical...-Draft.png

Attaching various screenshots, and drafts in an archive: Screenshots_2018-01-04.zip

As for Vertical-mode, whether the "From: " is above or below the shown Kanji is no big deal according to the feedback, but perhaps it's logical to keep all filters near each other, thus placing it above? I don't know.

z1dev commented 6 years ago

I made the kanji search widget slimmer, I hope this will be enough. I removed the "From" label completely, added a toggle button to show and hide all filters, and moved the "Reset" button next to the filter controls, to make more space at the top. The combo-box for the "From" field is now moved next to the "Order" box in case there's enough space. I don't want to go slimmer than this but I hope it'll work for everyone. Moving the filter controls next to the toggle buttons is a bit more work, so I'll pass with that for now.

am2del commented 6 years ago

Got some feedback on the "study"-mode... but will get back to that later as I haven't gotten a clear idea yet of what they actually meant, was a bit confusing... Once I've cleared things out and tested I'll let you know, though - prioritizing the radical data atm.

-- Update on #6: Radical dictionary/browser --

The data file is more time-consuming than I had thought, digitalized and verified roughly 50% (maybe a bit more, but just to be on the safe side...). Gonna add a documentation header for the file too before providing it. It will include references to existing kanji/radical stroke diagrams, in order to minimize work on the SVG images and stroke orders later. Regarding SVG, I had them as PNG so will make SVGs for the missing ones once the file is completed, or at least a complete draft of the file is. The actual browser-feature and regex will come after that. (Though, after the radical data, prioritizing debug and feedback for existing features.)

-- Update on Kanji Widget --

I've checked out the new layout and it looks good. Gotten some feedback, and it's all positive - better than before. If, one day, the filters can hop up next to the "toggle"-buttons it'd be perfect, they say. Good job!

-- Pre-release request: #1 "Help"-flag --

Would you mind including it in this release? It's useful not only to see optional flags, but also for generating generic start options etc. - e.g. info, dictionary re-build/update etc. I'm aware there aren't many options at the moment, but still... It's not present in GIT Build (UTC): 2018-01-07 @ 15:43

z1dev commented 6 years ago

I'll add the help flags, and make them shown for either of these flags: --help, -h, -?, /help, /h, /? Usage of / is more of a Windows (DOS?) thing, but it won't hurt if all these flags work.

z1dev commented 6 years ago

I have added #1 help flags. Please test it. It doesn't work on Windows because there it's more complicated to have a GUI application that prints to a console, but the average user there rather reads text files anyway.

am2del commented 6 years ago

Great, I will make sure to check it out during the weekend.

Rather than a Windows-thing, the slash is a legacy from DOS. Remember using that back in the 90's.

On a side note: Well, it's expected - Microsoft does all they can to restrict their users and mold them into the "Microsoft mold". NT just enhances the limitations compared to the old DOS days. While, on the other hand, Linux is freedom - to a point which easily overwhelms people who are used to restricted environments. Many Linux distributions do apply sort-of limitations too, primarily to not scare off people who aren't already super users - making them very beginner-friendly.

am2del commented 6 years ago

Thank you for implementing the "help"-feature. Not to be a smart-arse, but - would you mind making the output more uniform please? (See NOTE(s)-section as to why.) If you don't feel like it, or don't have the time to, I may take a look at this after finishing the radical data and SVGs. CLI is my field after all, even if C++ isn't. Would like an "okay, go for it"-sign before I do it in that case - and if you want the work to be in a specific file, or have any other restrictions, I'd like to be informed of such as well to ensure the project structure is maintained.

Paste the below into a text-editor supporting syntax highlighting, such as kate or gedit, to see the difference more clearly. (Set highlight to Script -> Bash, or SHell-script.)

Also, help section is per default limited to max width: 80 chars When exceeded, row-break is forced. (See line 24-25.)

# -- Current Output --

zkanji v0.0.3-alpha
USAGE: zkanji [option]

  --help, -h, -?, /help, /h, /?
                  show command line flags without starting the program.

  -i [path]       import JMdict dictionary data at startup from files at path.

  Files needed for import are: JMdict in UTF-8 encoding,
                               kanjidic in EUC-JP encoding,
                               radkfile in EUC-JP encoding,
                               kanjiorder.txt,
                               JLPTNData.txt,
                               radkelement.txt,
                               zradfile.txt

  -e [path]       import the Tanaka Corpus example sentences data at startup
                  from files at path. Only works if the dictionary data is
                  already generated.

  Files needed for import are: examples.utf the Tanaka Corpus in UTF-8
                               encoding.

  -ie [path]      can be used when the files are located at the same path.

# -- EXAMPLE:  Uniform structured output --

zkanji v0.0.3-alpha

USAGE:
    zkanji [OPTIONS...]

OPTIONS:
    --help, -h
                Show help dialog and exit.
    --import-dictionaries[=PATH], -i [PATH]
                Import dictionary data at startup from files at PATH.
                Current Working Directory is assumed if PATH is NOT specified.
                Can be combined with:  -e, -n
                Info:  http://ftp.monash.edu/pub/nihongo/
                Required files are:
                    Source:  http://ftp.monash.edu/pub/nihongo/
                        JMdict             (Encoding:  UTF-8)
                        kanjidic           (Encoding:  EUC-JP)
                        radkfile           (Encoding:  EUC-JP)
                    Source:  SOURCE
                        kanjiorder.txt     (Encoding:  X)
                        JLPTNData.txt      (Encoding:  X)
                        radkelement.txt    (Encoding:  X)
                        zradfile.txt       (Encoding:  X)
    --import-examples[=PATH], -e [PATH]
                Import example sentences data at startup from files at PATH.
                Current Working Directory is assumed if PATH is NOT specified.
                Can be combined with:  -i, -n
                Info:  http://www.edrdg.org/wiki/index.php/Tanaka_Corpus
                Required files are:
                    Source:  http://ftp.monash.edu/pub/nihongo/
                        examples.utf       (Encoding:  UTF-8)
    --import-names[=PATH], -n [PATH]
                Import name dictionary at startup from files at PATH.
                Current Working Directory is assumed if PATH is NOT specified.
                Can be combined with:  -e, -i
                Info:  ...
                Required files are:
                    Source:
                        ...                (Encoding:  X)
    --update, -u
                Initialize update handler script.
                NOTE:  *NIX-systems exclusive, requires BASH 4.0+.
    --version, -v
                Show version and exit.

NOTE(s): (Please excuse the somewhat "random" order.)
--> "[...]" is the standard container for an OPTIONAL PARAMETER, i.e. something which may be excluded.
----> No need to make PATH optional; it is optional in the EXAMPLE for illustrative purposes. HOWEVER, if made optional it'd be nice to have a "Browse"-popup for the user to select the directory.
--> Upper-case distinctions, e.g. "PATH" instead of "path" or "OPTIONS" instead of "option".
--> As making this feature available for NT/Windows would take effort, disabling the "/"-FLAG(s) would be beneficial, as many shells (such as standard BASH) MAY misinterpret the "/" as a PATH.
----> Consider adaptation @ compile time for NT/Windows ("/"), others ("--" & "-")?
----> BASH is the standard shell for the majority of *NIX platforms/distributions (BSD/Linux/Mac...).
--> Shells (CLI) in general have built-in parsing capabilities (often referred to as "TAB-completion") to ease things up for users, as long as things are kept in uniform structures which can be identified. However, some shells do NOT have these abilities activated per default - while others have really advanced/complex parsing/glob capabilities (such as ZSH).
--> Generating starter/launcher options for user comfort, dynamic scripts, etc. also requires ease of parsing.
--> Inclusion of LONG FLAG(s) isn't a necessity, included for illustrative purposes.
--> Some additional FLAG(s) are listed for illustrative purposes.
--> The FLAG-description is distinctly, and uniformly, separated into categories: Description, Combination, Reference, Requirement (sub: Source, TABLE) and NOTE.
----> The categories do NOT need to be named for parsing, they just need uniform patterns.
----> "Combination" would not be recognized by a conventional shell-parser, but would be useful when writing an install/update-feature for *NIX-systems.
----> "Source" and "Info" are for reference, and convenience.
--> Above EXAMPLE uses INDENT=4. Double-INDENT separates FLAGS from INFO and single-INDENT elsewhere. COLUMN-size is LongestFileName+INDENT, optionally LongestFileName+\t. However, use of TAB may look very different depending on settings, hence SPACE-indent is preferred.
--> Combination SHORT FLAG(s) are listed in the same format as the FLAG-list.
--> Dual spacing after COLON, before the reference, for easy parsing.
--> File-names and their encodings are in columns for easy parsing.
--> Check if C++ allows passing any "UNRECOGNIZED" OPTION to the "help"-function. Save the below script as a separate file and run it with a mix of recognized & unrecognized FLAG(s) and PATH(s) to see a simplified example of the behaviour:

#!/bin/bash
status=0                    # Exit status; set to 1 when an unknown option is seen.
for i in "$@" ; do          # Taking flags from the command line.
    case "${i,,}" in        # Ignoring case (lower-casing needs Bash 4 or newer).
        -i)  echo "I has been recognized." ;;
        -e)  echo "E has been recognized." ;;
        -ie|-ei)  echo "I and E have been recognized." ;;    # Allows free combination order.
        /*)  echo "PATH has been recognized." ;;
        # Above deals with KNOWN, below deals with UNKNOWN.
        *)  echo "Invalid option:  $i" ; echo "Help:  Only I or E are known flags." ; status=1 ; break ;;
    esac
done
exit "$status"

z1dev commented 6 years ago

I received your mail with the radical example and I have questions if you don't mind.

Which part of this data should be displayed in your opinion? Is it important what level of JLPT or Jouyou kanji they first show up in? I would like to display as little data of radicals in zkanji as possible that's still useful, since zkanji is just a study aid. If more details are necessary (I can imagine people in official Japanese education might be forced to use them,) it will only come at a much later version of zkanji.

Do you have reliable JLPT data using the new 5 ranked system? I had data for the system where there were only 4 ranks, since these were officially published, but I haven't heard of an updated one. The data in zkanji was compiled by me personally, from several textbooks made a year after they introduced the system. They just gave examples of what goes to the N3 rank, and topics around them. My list is as good a guess as any, but it's probably very different from your data.

Since there already is a list of radicals that allow filtering kanji to be displayed by them, isn't it the logical step to use that to show which radical is used in which Jouyou level?

Last one for now! Are the names listed in the third method for radical search incorrect or missing any data? They didn't have translations for sure, which is a welcome addition though.

am2del commented 6 years ago

At the radical search, displayed info should be:
--> READING_PRIMARY
--> RADICAL_ORIENTATION <-- The common orientation, which does NOT always apply; it is for GUIDANCE.
Using a display format like: READING_PRIMARY (RADICAL_ORIENTATION) ...where RADICAL_ORIENTATION is the common orientation as KANJI, KANA or SVG. I can make those SVG's if so. Simplified (please excuse the ASCII-art), something like this for 冠「かんむり、がんむり」、やね:

 _____
|XXXXX|
|     |
|_____|

Optionally include:
--> READING_OPTIONAL
Using a dual-line display like: READING_PRIMARY (RADICAL_ORIENTATION)\nREADING_OPTIONAL, ...

Also consider:
--> A "switch"-button to swap READING for MEANING.
Reason: At the lookup, the pop-up information should be kept brief and in no way overwhelming.


As for the search-by-radicals, for simplicity, set your algorithm to ignore any LIST-item beyond the first. Else you'll end up with some 400 parts, where most are just a slimmer version, instead of the basic 214/252. Those additional ones and the other fields in the data are primarily meant for what I will refer to as the "Radical Dictionary"-widget, an addition I mentioned earlier that I'd like to make and include for those who wish to be able to browse and dig into radicals.

As my file doesn't cover JLPT, I cannot guarantee the reliability of the N-standard data. For the time being, I'll line it up with your data and dig into this later, as it's gonna take plenty of time. Time which I don't have atm, and this data is sort-of irrelevant until the "Radical Dictionary"-widget comes around. I mentioned this earlier among the requests. The DICT_REF and REF_-fields are also for that "Radical Dictionary"-widget, to help bring up stroke order diagrams etc., avoiding some extra work later on. I may, only may, look into the possibility of integrating a study-option for radicals eventually. Knowing the radicals and their stroke orders means one will always write correctly. That's why I'd like to make it for people, even if it'd be a feature which isn't used a lot. Either way, as zKanji works as a dictionary - it'd be a good thing if it contains info about radicals as these are important, especially when using non-digital lookup methods in real life. If you're in Japan, pay the local library a visit and take a look at the dictionaries.

In regards to the file which contains the list of radicals for filtering, it's what I'd like your point of view on if to use generic data - as mentioned in the mail. Also, slightly off-topic, the "zdict.zks"-file - what's the licence on this one? To what extent is it open to take a look at the source, potentially modify or extend?

The problem with N-standard is that it's subject for changes on a year-to-year basis if I'm not mistaken. There are loads of smartphone apps which have data on this, and I speculate they've got the same issue, although that's not confirmed. The WWWJDIC-app is likely to use the same data as the site, and it has data on the five N-level system. Also, if I recall right, there's a collection of test data from previous JLPT tests which could make a great cross-reference for this categorization. However, whether we can use these depends on licensing etc. I did take a look at many smartphone apps on an acquaintance's behalf some months ago when they started studying Japanese, and noted they seem to show the same kanji at the respective N-level, at least up to N2. (Disclaimer: NO! I didn't check every single kanji, just browsed through for swift comparisons, while at the same time checking the apps for false information and errors... which MANY seem to contain already at basic levels.)

However, the data on N-levels is for GUIDANCE, in order to give the users a priority order of sorts for which radicals are important to focus on, and roughly when. It's kind of pointless learning the flute-radical, 214/252, already when being introduced to elementary 1 or N5. It's this sort of thing the data on J- & N-appearance is meant to help with. The three high-importance categories (2,J,N), where J & N are for helping people studying to filter their priority order. I do consider whether or not to extend the grading to 0~5 + J & N, where digits may be combined with J or N when a kanji only fits into one of these categories.

I will check on the names in the 3rd method, but you may have to be patient until next weekend. Gonna have a tough week, so I may not have time to answer or help out in the upcoming few days.

I noted scaling on the radicals disappeared when that bug was fixed... More difficult to see them now, suppose this played a role in resolving the bug. Would it be possible to re-implement it if there's a full set of SVG's for radicals later?

Depending on licence terms, there's a project which could be helpful for radical images & strokes - the Kanji-Alive-project has made a dedicated radical font (Apache v2 licence), not sure about the terms but worth taking a look at perhaps? They've also got 247 radical SVG's and radical animations (730 files) which I think are under the Creative Commons v4 licence. Take a look and see if the conditions allow for a time-saver? I took a look at their project for readings and meanings, and theirs is not as complete as my file in this aspect. There are also some minor differences, which the other sources I've looked at disagree on as well. I guess the differences depend on data origin.

Regarding the RadicalData-file structure, licensing etc. Anything you'd like to change, add or remove? Any data-field missing? Also, may I request your view on the LOCALE-thing?

Random side note: I miss the counters at "Kanji Search"-widget which were present in the old v0.731.

z1dev commented 6 years ago

Showing both the translations and names for radicals at the same time doesn't seem too bad to me. It could be similar to the kanji popups, just with less data. I agree that this should be a setting though, but I'd rather put that in the settings window instead of a button on the radical selector for consistency with common interface designs.

Either way, as zKanji works as a dictionary - it'd be a good thing if it contains info about radicals as these are important, especially when using non-digital lookup methods in real life. If you're in Japan, pay the local library a visit and take a look at the dictionaries.

I don't want to limit what zkanji can do, I just want to bring up my point of view in the matter. To me the program is mainly a study aid for those not in formal education of Japanese, with access to digital resources, who "just want to learn the language." I never had any use for radicals, which doesn't mean others won't find this data useful.

In regards to the file which contains the list of radicals for filtering, it's what I'd like your point of view on if to use generic data - as mentioned in the mail.

The USAGE_IMPORTANCE data can be generated from the dictionary, since I have data for what radical appears in which kanji. Please consider that this data only refers to the 6000+ kanji found in KANJIDIC. I think it's enough, but my take on what the program is for is what I wrote above.

Also, slightly off-topic, the "zdict.zks"-file - what's the licence on this one? To what extent is it open to take a look at the source, potentially modify or extend?

The stroke order data is completely my work, though the classification of elements (something only visible in the stroke order editor in previous zkanji versions) was taken from another project in the early days. I would like the data to stay non-commercial, but that's my only restriction on it. I didn't really publish a license for it so legally it's still private I think, but I guess this doesn't matter regarding zkanji.

The problem with N-standard is that it's subject for changes on a year-to-year basis if I'm not mistaken. [...]

I just did a quick search again on this matter. There is no official N list for ANY of the kanji. They said at the 2010 change of the system that the old JLPT1-4 match N1, N2, N4 and N5. Everyone uses these lists, and the N3 is an estimate. If they use the same data that only means they all took the same list compiled by a single site.

When I passed JLPT2 in 2009 (still the old system, but with official lists) they said at the time that on any level, there are 20% kanji/vocabulary they just randomly take from the more difficult levels. I would guess they still do that, so the changing requirements might be nothing more than the same thing.

I noted scaling on the radicals disappeared when that bug was fixed... More difficult to see them now, suppose this played a role in resolving the bug. Would it be possible to re-implement it if there's a full set of SVG's for radicals later?

I use 0.75*[square height] point size for radical fonts, scaling them down until they fit. My "fix" just lowers the point size until the fonts fit. If they don't fit ever, the last known valid size (or the initial size) is used. I still don't know what caused the error, apart from that the measurement probably gave incorrect results. This is in Qt so I can't do anything with that. If we switch to a different font that's proven to work or to SVG drawings, this problem will be fixed.
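
The shrink-until-it-fits approach described above could be sketched roughly like this. This is only an illustration under assumptions, not the actual zkanji code: `fits` is a hypothetical callback standing in for whatever text measurement the toolkit provides, and the step and minimum size are made-up values.

```python
from typing import Callable, Optional


def fit_point_size(square_height: float,
                   fits: Callable[[float], bool],
                   last_valid: Optional[float] = None,
                   step: float = 0.5,
                   min_size: float = 1.0) -> float:
    """Lower the point size from 0.75 * square_height until `fits` accepts it.

    `fits` is a hypothetical measurement callback (assumption).
    If no size ever fits, return the last known valid size if one is given,
    otherwise the initial size.
    """
    initial = 0.75 * square_height
    size = initial
    while size >= min_size:
        if fits(size):
            return size
        size -= step
    return last_valid if last_valid is not None else initial


# Toy measurement: pretend the glyph only fits at 24pt or below.
print(fit_point_size(40, lambda s: s <= 24))
```

If the measurement itself returns wrong results, as suspected above, the loop still terminates and falls back to a usable size, which matches the described behaviour of the fix.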

Depending on licence terms, there's a project which could be helpful for radical images & strokes - the Kanji-Alive-project has made a dedicated radical font (Apache v2 licence), not sure about the terms but worth taking a look at perhaps?

I would like to use as few outsider resources as possible. The Apache v2 licence (according to their official website) is in theory compatible with GPLv3, but there are restrictions and the wording doesn't make it clear whether this allows or doesn't allow in my specific case the use of their data, and I'm not a lawyer.

Regarding the RadicalData-file structure, licensing etc. Anything you'd like to change, add or remove? Any data-field missing?

The only data needed to be able to show radical names and translations is radical number and the text data for them. The first line which lists "ID_UNIQUE;ID_NEW..." etc. is unnecessary, unless you plan to reorder the data or insert new fields in the middle. But in that case you might want to fix the missing semicolon between "RADICAL_ORIENTATION REF_KANA" for example. The rest depends on what features you need for the radical dictionary.

This line contains a character that I don't have the font to display: 004;4;4;4k;1;ノ;;;;;;bend;bending stroke;の;;;の|のんむり;1;5;2

Also, may I request your view on the LOCALE-thing?

With the current data format, as you write in the description, if they provide a separate file, it can be used later without problems. Although it might be worth changing the format to be extendable in case we want to distribute a single file with all translations. For example, extra translations could come after the last standard data with the format [LANGUAGE CODE]:translation (e.g. "EN:straight line"). There should be a restriction on the translations, as they can't contain some characters, like the semicolon.
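
A minimal sketch of how such trailing `[LANGUAGE CODE]:translation` fields might be split off, assuming the number of standard fields is known. The two-field sample line and the function names are hypothetical and only serve the illustration.

```python
def split_translations(fields, standard_count):
    """Split a semicolon-separated record into standard fields and translations.

    Everything after the first `standard_count` fields is expected to look
    like "EN:straight line". Returns (standard_fields, {code: [texts]}).
    """
    standard = fields[:standard_count]
    extra = {}
    for item in fields[standard_count:]:
        if not item:
            continue  # tolerate empty trailing fields
        code, sep, text = item.partition(":")
        if not sep or not code.isalpha():
            raise ValueError(f"not a LANGUAGE:translation field: {item!r}")
        extra.setdefault(code.upper(), []).append(text)
    return standard, extra


def is_valid_translation(text):
    """A translation must not contain the field separator."""
    return ";" not in text


# Hypothetical line with two standard fields followed by translations.
line = "001;一;EN:straight line;DE:gerade Linie"
print(split_translations(line.split(";"), 2))
```

Only the first colon is treated as the separator, so a translation may itself contain colons; the semicolon stays forbidden because it delimits fields.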

Random side note: I miss the counters at "Kanji Search"-widget which were present in the old v0.731.

This is planned, but wasn't a priority with all the bugs and changes. Adding a "status bar" with Qt is a bit complicated, which made me postpone this feature.

am2del commented 6 years ago

@ Radical-popups: I agree with putting it in settings, it's not like users will need to change this behaviour on-the-fly especially often.

@ Limitations: I think I'm starting to get a good idea of the outline for the project, however - I will bring up any ideas/proposals for discussion before acting of my own accord.

@ Generic data for radicals: I believe all the 3000-ish most frequent kanji are more or less completely covered in the data, and should be enough. Performance-wise I think it'd be best to pre-generate data on 常用 and JLPT 4- & 5-level systems into static parameters at dictionary import, using a cross-reference algorithm that picks the first appearance of each radical in each system, with counters for first appearance level IN_JYOUYOU, IN_JLPT4 & IN_JLPT5 and a TOTAL_ for each of the three plus an overall total. All linked to an ID_UNIQUE factor. The final one is the grand total of appearances in the available data altogether. This could potentially allow for a user-adjusted importance grading in settings. (On dictionary update, ensure this data is re-generated and as accurate as the source allows for.) "Why include the 4-level system?" As the 5-level one isn't set in stone with official data, it may be useful to the user for complementary guidance at blurry levels like the new N3.
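
The cross-reference pass proposed above might look something like the following for one grading system. It is a sketch under assumptions: the input shapes are invented for the example, radicals are keyed by their glyph instead of ID_UNIQUE, and the level numbers are toy values, not real 常用 or JLPT data.

```python
from collections import defaultdict


def radical_stats(kanji_radicals, levels):
    """Compute first appearance and total count per radical for one system.

    kanji_radicals: {kanji: [radical, ...]}
    levels: {kanji: step}, where a smaller step means "introduced earlier";
            kanji missing from `levels` are outside this system.
    Returns {radical: {"first": step, "total": count}}.
    """
    stats = defaultdict(lambda: {"first": None, "total": 0})
    for kanji, radicals in kanji_radicals.items():
        step = levels.get(kanji)
        if step is None:
            continue
        for radical in set(radicals):  # count each radical once per kanji
            entry = stats[radical]
            entry["total"] += 1
            if entry["first"] is None or step < entry["first"]:
                entry["first"] = step
    return dict(stats)


# Toy input; the steps are illustrative only.
kanji_radicals = {"男": ["田", "力"], "鳴": ["口", "鳥"], "鴫": ["田", "鳥"]}
toy_levels = {"男": 1, "鳴": 2}  # 鴫 is outside this toy system
print(radical_stats(kanji_radicals, toy_levels))
```

Running the same function once per system (and once over all kanji with a constant step) would give the per-system and overall totals. For systems where a higher number means easier, such as N5, the caller would first have to map levels to a "smaller is earlier" step.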

@ Fields in radical data: The ID_UNIQUE is required as the traditional system (214) collides with the new system (252) where some radicals which are separate in the new are fused in the traditional and the other way around which means there are duplicate numbers in these fields. Also, there are reading/meaning differences at these particular ones, otherwise the ID_UNIQUE would never have been added. Thanks for pointing out the missing SEMICOLON; I've been changing the structure a couple of times manually through the process. The data in the sample is pasted from a different template than the header, and therefore this occurred. Gonna run the final version through my somewhat-intelligent parse-checker script before submitting, to identify any potential issues and get a visual structure verification. I suspect this file will be ready on Saturday, or maybe the final check will be on Sunday.

@ Locale: Good point. Could make a dual-structure too, like a small file for a single locale with a locale-extension for space-conservation (for those who don't want/need lots of lingual options) and a large file containing all locale(s), which...:
--> ...has a LOCALE-column (making the file more human-friendly), or
--> ...has every translation initialized by LOCALE as you suggested, allowing "infinite" expansion.
The latter is probably best produced by a generation-script from single-locale files? The chosen separators are extraordinarily rare as part of translations and would allow us to skip quotations etc. We could create an ESCAPE-char if necessary. The LIST_ALT, however, is chosen for not needing to be converted to something else upon presentation.

@ zkanji.zks & radical-SVG's and font: Is it reasonable to add complementary stroke-order data for radicals (and potentially kana as well) to existing data, keeping it all in one place and same format? As per your wish to not involve other projects/licences, I'll be making a full set of radical SVG's myself for this project, stroke-by-stroke in separate layers and named/enumerated by a dual digit pattern. I'm starting with this once the data-file is done, so probably Saturday/Sunday. Will see if I can get the half-broken touchscreen (with pen-support) I got lying around to work enough to draw by hand, else I'll use a mouse. To align things with the project and keep things uniform, what default width & height shall I use? Also, for the visual enumeration at the "Kanji Information Dialog" - is this generated? Pre-set SVG? Raw position data? Or what method/format should be used to provide users this for the radicals as well? Anything I need to keep in mind for ease of implementing later into the same, or a similar, dialog as the current "Kanji Information Dialog"?


Side notes @ Use of radicals: Basically, think of each radical as a latin letter, but each with its own meaning. Just like we combine latin letters to make words, they combine radicals to "draw" simplified pictures which have distinct meanings. When studying alone at home in a foreign country, the meanings/readings of radicals may not be very useful - however, when on site interacting with the locals, it's a different story. Just like "bad" combinations of latin letters form gibberish, so do radicals. Hence, knowing them can be plenty helpful at times; it also reduces the things to memorize and ensures one knows how to draw a new kanji at first glance. The one major difference between radicals and latin letters is that a combination of radicals can be figured out without the need of a dictionary in most cases, whereas a combination of latin letters is complete gibberish until explained and memorized; just a pronunciation without meaning.

Off-topic: As I never took any official test - when you did the JLPT N2, did it involve actually writing kanji and writing full sets of readings? Or did they have their usual select/check-options for testing these things? May be good to know as I may need to take an N2 or N1 as proof of knowledge level.

am2del commented 6 years ago

The radical data is ready for a final check tomorrow. Would you mind adding permissions for me to upload and request a pull? Preferably to a separate branch. Otherwise I'll take the long way around, forking and pulling, or optionally mailing. I suggest placing it in the dataimports-directory, does that suit you?

z1dev commented 6 years ago

Please excuse me for not answering sooner but I couldn't find the time. I'll check the radicals you sent me.

Performance-wise I think it'd be best to pre-generate data on 常用 and JLPT 4- & 5-level systems into static parameters [...] "Why include the 4-level system?"

When they introduced the 5 level system, they said the new middle level will just be stuff from the previous JLPT2. To translate this to the levels: JLPT1 = N1, JLPT2 = N2 + N3, JLPT3 = N4, JLPT4 = N5. There really isn't a need to generate a value for both. Users just need to be told that when they say N2 or N3, those were both JLPT2 in old times.
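
The level correspondence stated above can be written down as a small lookup table. The names are made up for the sketch; the mapping itself is just the one given in the paragraph.

```python
# Old 4-level JLPT -> new N-levels, as described above.
OLD_TO_NEW = {
    1: ("N1",),
    2: ("N2", "N3"),  # the new middle level was split off from old JLPT2
    3: ("N4",),
    4: ("N5",),
}

# Reverse lookup: which old level a given N-level came from.
NEW_TO_OLD = {new: old for old, news in OLD_TO_NEW.items() for new in news}

print(NEW_TO_OLD["N3"])
```

With this table only one value per kanji needs to be stored, and the N2/N3 ambiguity can be explained to users in a note.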

The ID_UNIQUE is required as the traditional system (214) collides with the new system (252) where some radicals which are separate in the new are fused in the traditional and the other way around which means there are duplicate numbers in these fields.

I see that you use the ID_UNIQUE in RADICAL_REFERENCE, which answers my previous doubts. It doesn't need to be used as the SVG file names. Please see my idea regarding this below.

A question about duplicates, for example:

026;26;22;22;2;匚;匸;;構;;;;かくしがまえ;;box;enclosure|side-ways box|box-on-side
027;26;23;23;2;匸;匚;081;構;;;;はこがまえ;;conceal;dead|box|enclosure

or

057;56;47;47K;3;川;;058;旁;;;川巛;さんぼんがわ;かわ;river;
058;57;47;47K;3;巛;;057;冠旁;;;巛川;まがりがわ;;curved river;river

What should the program display as radical name and meaning in such cases? Also should we display the optional meaning in the tooltips, or you only mean it to be used in the radical dictionary?

The RADICAL_OPTIONAL almost always contains a character that my system cannot display, just the placeholder box character is shown. As you mention in the TO-DO, if we use SVG images to display them, will there be a separate file that tells us which unicode code-point is which SVG? Maybe after the SVG files are done, we could just name each SVG file after a unicode character as hexadecimal, like 0x4F31.svg. I don't think it's necessary to use the ID_UNIQUE for the file name if we do this.
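
As a sketch of the naming idea above, file names like `0x4F31.svg` could be derived from, and mapped back to, a character as follows. The function names are hypothetical, and the four-digit minimum width is an assumption.

```python
def svg_name(ch):
    """Return a file name such as '0x4F31.svg' for a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"0x{ord(ch):04X}.svg"


def char_from_svg_name(name):
    """Inverse of svg_name: '0x4F31.svg' -> the character U+4F31."""
    stem, dot, ext = name.rpartition(".")
    if ext.lower() != "svg" or not stem.lower().startswith("0x"):
        raise ValueError(f"unexpected file name: {name!r}")
    return chr(int(stem, 16))


print(svg_name("一"))
```

This only works for radicals that have a code point of their own, so any glyph without one would still need some other key.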

Is it reasonable to add complementary stroke-order data for radicals (and potentially kana as well) to existing data, keeping it all in one place and same format?

For each stroke order in the data, it's possible to specify a unicode character, and a name. The data potentially already contains all the radical stroke orders, and I have stroke order data there for the kana. You can see that by checking the kana tests in the study menu. The stroke order diagrams are made up of parts, which are other stroke orders, so all that's left to do is to specify which unicode character they are, and then it'll be possible to show an animation for them. What is more, my original idea was to replace the current display of radicals with the S.O.D. but it doesn't look nearly as good as a font or SVG. Or bluntly put, it can be really ugly.

Also, for the visual enumeration at the "Kanji Information Dialog" - is this generated? Pre-set SVG? Raw position data?

If you mean the stroke number displayed on the animation, this data is generated on the fly when displaying the stroke order diagrams. It tries to pick a fixed position near the starting point, and if all pre-defined positions are already taken, the number might be moved by a small distance. If this is not possible, that number won't be displayed.
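
The placement strategy described above could be approximated like this. It is purely an illustrative sketch and not the zkanji implementation: the candidate offsets, the nudge distances and the minimum gap are all invented values.

```python
import math

OFFSETS = [(-8, -8), (8, -8), (-8, 8), (8, 8)]  # assumed fixed positions around the start
NUDGES = [(5, 0), (-5, 0), (0, 5), (0, -5)]     # assumed small shifts
MIN_GAP = 5.0                                   # assumed minimum distance between labels


def _is_free(pos, taken):
    return all(math.dist(pos, other) >= MIN_GAP for other in taken)


def place_number(start, taken):
    """Pick a label position near `start`, or None if nothing is free.

    First try the fixed candidate positions, then each candidate moved by a
    small distance. Returning None means the number is not displayed.
    """
    candidates = [(start[0] + dx, start[1] + dy) for dx, dy in OFFSETS]
    for pos in candidates:
        if _is_free(pos, taken):
            return pos
    for cx, cy in candidates:
        for nx, ny in NUDGES:
            pos = (cx + nx, cy + ny)
            if _is_free(pos, taken):
                return pos
    return None


print(place_number((0, 0), [(-8, -8)]))
```

The caller would append each returned position to `taken` before placing the next stroke number.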

z1dev commented 6 years ago

When studying alone at home in a foreign country, the meanings/readings of radicals may not be very useful - however, when on site interacting with the locals, it's a different story.

Do you mean you need to study readings and meanings of radicals to be able to talk with Japanese people about your language study? Or maybe you're in formal education where it's one of the requirements?

When I took the JLPT it was, and to my knowledge still is only a multi-option test. Fill in the circle and you're done. Wherever you take it, the test paper is sent to Japan to a central location, and the answers are checked automatically. They don't do it by hand.

am2del commented 6 years ago

It's not mandatory as far as I know, but as previously mentioned - when talking with locals and unsure how to spell something, it really helps if one knows the radicals, as they can explain in a flash in a way which is easy to remember - no need for pen & paper, digital devices etc. See "For example..." below.

As I believe I mentioned earlier, it also helps figuring out potential meanings/readings of kanji one has yet to learn. Radicals also reduce the amount of things one needs to remember when studying, and one will also know the fail-safe way to always draw correctly - even if it's the very first time encountering the specific kanji in question.

For example, most kanji consist of 2-5 radicals, but often 7-20 strokes. So when learning a high-radical low-stroke kanji like "前" (which is an early stage kanji), if going per radicals (253) it's:
--> "丷" (weed),
--> "一" (one),
--> "月" (meat, optional appearance of 肉 which is also drawn ⺼ - referred to as にくづき),
--> "刂" (blade)
We get something like "weed first then meat cut". This helps narrowing down and remembering the usage (meanings), although in this particular case we get no readings from the radicals. "Weed" can also be drawn as a three-stroke combination of "weed" & "one", where "一" crosses "丷", and then exists in the 214-system as an alternate to "屮". (The glyph for this alternate is currently missing in my data file as I haven't found it in the UTF-8 table.) Other advanced kanji like "鴫", which is outside of both 常用 and JLPT, consist of "田" (field) and "鳥" (bird), so we can figure out "it's something birds do over/at a field", like hunting worms or flies? Which leads to the meaning "snipe". That it's pronounced "しぎ" cannot be derived from the radicals. However, we reduced 16 strokes to just 2 radicals and got an idea of the meaning too - what a bargain - making it easier to remember in the long run as well. However, kanji with many readings usually have a couple or more pronunciations which are derived from, or closely related to, the radical readings.

It's very likely this kind of logic which, supposedly, created the characters back in old China. Either way, when reading it's much more vital to understand the meaning than to know its pronunciation. Looking at our latin letters, it's gibberish: combinations of pronunciations without meaning, until you're told what it means or look it up in a dictionary. No way to figure it out by basic logic, and yet many westerners call kanji/hanzi "stupid"... It's just simplified pictures.


Off-topic: Thank you, it's good to know for reference that they use those select-style sheets. (It's always subject to change though.) I don't know if I need to take a certificate yet, nor if I must show an N1 in that case, or if an N2 is sufficient. Grammar is not my speciality, so it'd be my only real concern if they can't pick on my handwriting. Not sure how many of the N1 kanji I know atm as I learnt the 常用 way and not all of the 8~10's, but... if it comes to it, I'll test myself in advance.

am2del commented 6 years ago

Thank you for further clarification on the JLPT. Let's drop the 4-level system info and base things on your data for the 5-level system in combination with 常用 - and with a notice for the users regarding N3 & N2.

@ Radical Data: First, I feel the need to correct myself as I've been stating 252 when it's actually 253 radicals in the NEW system. In regard to the "duplicates" you mention, they are in fact NOT duplicates if you take a closer look. I do agree it's easy to miss this fact, but to make things clear:

ID_UNIQUE   ID_NEW  ID_OLD  RADICAL_REFERENCE   READING_PRIMARY     MEANING
   026        26      22                        かくしがまえ          Box, Enclosure, Side-ways box, Box-on-side
   027        26      23          081           はこがまえ           Conceal, Dead, Box, Enclosure

They look near identical at first glance if using a small font, however they are drawn differently and have different meanings. 026 is drawn either left-to-right then top-to-bottom-to-right "一" "L", or right-to-left-to-bottom then left-to-right "「" "_". BUT pay extra attention when drawing the top-left corner, which must be a corner or have a slight horizontal gap. If there's a vertical gap or the vertical line "|" appears to be "hanging down" from the top line, it'll be interpreted by the reader as 027. 027 is a simplified version of the 3-stroke radical 081 "亡", and when drawing it, it's important that the bottom "L" appears to be "hanging down". Also note the difference in MEANING(s), the first two at 027. Further, in the traditional system with 214 radicals, they have different ID's, while in the new system with 253 radicals they were fused as "alternate appearances" due to the similarities.

It's similar with "川" and "巛", they have different usages - I got comments on this in my data which I decided to not put in the file for the time being as that data refers to general guidance and train of thought as to when which is used. I will include this later in a separate file as it may be useful for those who wish to study, where I use the ID_UNIQUE as KEY for linking. Another reason to not include this in the same file is that some have extensive info while others got none at all as it's not needed. Here, note the ID_NEW. While they are the same in the traditional system (ID_OLD) - in the new one (ID_NEW) they are treated as separate due to their usage.

ID_UNIQUE   ID_NEW  ID_OLD
   057        56      47
   058        57      47

RADICAL_OPTIONAL field got the correct glyph in 99~100% of the cases, and majority of fonts does NOT contain these even though they are in the UTF-8 definition. Most of them are squeezed versions of the primary appearance, some are completely different - I will make SVG's for all. There's also the annoying 038 which doesn't exist in the UTF-8 definition (as far as I know, please correct me if I'm mistaken), it's the "ノ一" in "毎". I've got little or no data on this particular one; refer to the NOTE @ RADICAL_DATA-section in the file.

@ SVG & Stroke order: Using the unicode point may prove troublesome as some (like UNIQUE 038) seem to NOT EXIST in that format. As far as I know, all data uses a placeholder for that one... but as that placeholder is a kanji in itself, I would prefer to not link 038 to a placeholder's point. Otherwise this would have been a good idea. As long as we use a parser which supports identification of COLUMN-BY-HEADER, or a separate file of pairs, we can add a pointer-column in the data for irregulars later and use the unicode-pointer for irregulars such as the previously mentioned 038. (I do not wish to hard-code references unless strictly necessary.) From what I've seen of the S.O.D., I agree with you on this. It would be preferred to extend it if possible - that is, if needed. Great thing that stroke numbers are generated, much less work in this aspect. As I make the SVG's, I will put strokes on separate layers, and although I enumerate layers, the first stroke is the bottom layer and the last stroke the top layer. Also, for consistency, I'll use the Inkscape icon-template @ 128 as base for each. Haven't checked if I can use that touchscreen yet... a few things came up. Either way, as I recall, the screen's right-hand side had no sensitivity at all (refused to respond to touch) while the left-hand side had full depth sensitivity (64 levels?). Depending on where the breakpoint between left and right is, it may or may not be useful. I will try to get started with this before the weekend, but no promises.
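
A parser that identifies COLUMN-BY-HEADER, as mentioned above, could be sketched with a standard CSV reader. The header below is a hypothetical, shortened one (the real file has more columns, which are not reproduced here); the two rows reuse values from the 川/巛 example earlier in the thread.

```python
import csv
import io

# Hypothetical, shortened header and sample rows for illustration only.
SAMPLE = """ID_UNIQUE;ID_NEW;ID_OLD;READING_PRIMARY;MEANING
057;56;47;さんぼんがわ;river
058;57;47;まがりがわ;curved river
"""


def read_radicals(text):
    """Read semicolon-separated rows and key them by ID_UNIQUE.

    Columns are looked up by header name, so adding a column later
    (e.g. a pointer-column for irregulars) does not break existing lookups.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";",
                            quoting=csv.QUOTE_NONE)
    return {row["ID_UNIQUE"]: row for row in reader}


radicals = read_radicals(SAMPLE)
print(radicals["058"]["READING_PRIMARY"])
```

Quoting is switched off because the chosen separators are said to be rare in the data, so quote characters are treated as ordinary text.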

PS: If you're busy then you are, no worries. EDIT: Just forgot to format text for easier reading.

z1dev commented 6 years ago

Thank you for your explanation regarding radicals, though I'm aware of it. My personal experience with learning kanji without consciously studying radicals, is that it's enough to know what radicals generally are, and to recognize which parts might be radicals. I can guess the reading of many unknown kanji, because the brain is built to recognize patterns, and radicals are exactly that. If a student is told that radicals can have the property of holding the reading to a kanji, they can figure out the rest when they expand their vocabulary.

The same is true for stroke order. There are some special cases when the order is not easy to find, like the well known 左 and 右, because they evolved from different sources, but in general stroke order is the easiest part of kanji learning. It is true though, that during my years of reading/writing Japanese, many radicals with their meanings and stroke order stuck in my head.

I myself enjoy studying kanji, and reading is fun too, so I'm always disappointed when an author uses hiragana too much, but an alphabet has its own advantages. It might just be random symbols stuck together without meaning, but if you learn a language with an alphabet from native speakers via listening, you don't have to learn to specifically read that language to be able to read. (Apart from strange languages like English, because the spelling doesn't make any sense...) In many languages (which are not English,) letters can be read without knowledge of the word, to find out the pronunciation.

When it comes to Japanese and kanji, it is undeniably daunting for a beginner. It's only the first step to get over the thought that it's "impossible" (many people I knew who were learning Japanese felt that way.) It takes some time to read about the writing system, and to understand what that information actually means. I heard conflicting opinions from people who have on-site experience in Japan. Someone told me that even the Japanese he met have no idea how to write, many can only write a few hundred kanji at best and read maybe a thousand (those must be the uneducated people I guess.) It is true though, that even intelligent and educated people need to look up how to write an obscure kanji sometimes, like the word 薔薇.

I'm not sure our essays are great to write in "GIT issues." Is there a forum feature here somewhere?

In regard to the "duplicates" you mention, they are in fact NOT duplicates if you take a closer look. I do agree it's easy to miss this fact, but to make things clear:

By duplicate I meant either the "old radical number" is the same in two separate lines or the "new radical number" is. When I hover on an "old radical" in the radicals window, what should be displayed? My guess is "both," but I asked just to make sure.

The RADICAL_OPTIONAL field has the correct glyph in 99~100% of the cases, and the majority of fonts do NOT contain these even though they are in the UTF-8 definition.

I see. If you can make the SVG images, we won't need the current "wrong" characters, and only show the optional (or rather real?) ones. I think optional should be renamed as "real" or "correct" radical, and instead of including the character itself, there could be just a code-point in hexadecimal if available. If the editing is done by hand, I can fix it with a script. Can the description line CHAR: Alternate appearance(s) of the radical. be altered, to say it's the code-point for the actual radical, while the RADICAL_PRIMARY is just a replacement when fonts can't display the real one?

There's also the annoying 038 which doesn't exist in the UTF-8 definition (as far as I know, please correct me if I'm mistaken), it's the "ノ一" in "毎". I've got little or no data on this particular one, refer to NOTE @ RADICAL_DATA-section in file.

It's very likely that this specific radical is not in the unicode standard, because the non-classical radicals used in zkanji are not an official radicals listing. They were created by someone for his Japanese text editor. The page for radkfile gives some details about this. It is useful because it is more detailed than the classical radicals, but it is most probably never used in Japanese-published dictionaries.

As I make the SVGs, I will put strokes on separate layers - and although I enumerate layers - the first stroke is the bottom layer and the last stroke the top layer.

The only support in Qt for SVG is displaying them by the standards. It doesn't even have common implemented features not in the strict standard. I don't know if it even supports layers, but hopefully it does. I will test this. I use Inkscape for the icons too, and none of the fancy effects show up in Qt, so I was forced to not use anything else but normal fill and outline, or the gradients (with transparency.) Outline or stroke width works at least. Keeping text in an SVG is in general a bad idea since the target OS might not have the font installed. They must be converted to path first. This is unrelated to radicals, but can be good to know for future reference.

I forgot to answer about the document size for the SVG. The size doesn't matter, but please make sure the radical touches or at least is close to the edges of the document, because if I specify which rectangle the SVG should be painted in, Qt will use this rectangle as the document edges. It also distorts the sides if they are not the same ratio in the editor as the destination rectangle.

am2del commented 6 years ago

It is true though, that during my years of reading/writing Japanese, many radicals with their meanings and stroke order stuck in my head.

With experience, developing this type of recognition ability is inevitable, I believe - and my guess is it's part of the reason learning radicals at an early stage isn't mandatory for natives. Well, apart from enough to use dictionaries.

When it comes to Japanese and kanji, it is undeniably daunting for a beginner.

I can confirm this, beginner-level people (understanding fewer than 100 kanji) in my surroundings told me - after I informed them about radicals and how to use them by example - that they saw the light at the end of the tunnel, no longer overwhelmed. And they asked why I didn't tell them this earlier. I told them it's best to know, or at least be able to recognize/read, some 50+ kanji before starting with radicals. Why? Learning the radicals can be boring, or feel unrewarding, without knowing some kanji beforehand. Thus, to see their use when learning them, focus on the important ones for your current language level. Which ones are important? Well, I'd prefer to add a user-defined parameter for this, hence some data needs to be generated for commonness in the dictionary files used in zKanji.

In many languages (which are not English,) letters can be read without knowledge of the word, to find out the pronunciation.

It's the sad truth for most languages using Latin letters. At least the whole Germanic family (Danish, Norse, Swedish etc.), German-related ones (German, Dutch etc.), French- & Finnish-families and so on... included. The only plausible exception that comes to mind is the Spanish-family (Spanish, Portuguese etc.), second only to the Chinese-family (Mandarin, Cantonese etc.) in terms of speakers.

I'm not sure our essays are great to write in "GIT issues." Is there a forum feature here somewhere?

Good point. If we carry on with these things, let's move to mail conversation.

@ Radical display info For display, I think you only need to show READING(s) and MEANING(s) in the pop-up, maybe include RADICAL_ORIENTATION (as image, not kanji) and possibly RADICAL_OPTIONAL. I do think it should have a setting (File -> Settings -> Interface -> "Radical pop-up" or "Kanji Search Widget" perhaps?) where the user can select which things to show in the pop-up. Strokes should be shown the way they are categorized now, however - there are irregularities in order and stroke count! DO NOTE that some radicals' stroke counts do NOT match the stroke count of - what may appear as - identical kanji. Like うり (瓜), where the radical has one stroke LESS than the corresponding kanji - it's a very common mistake when people list it, as its ID is in the +1 stroke range. きば (牙) is another which also falls into this category of mistakes. These irregulars may have an optional notice in the pop-up, like "Display warnings regarding irregularities" - or as a colour-shade?

I think optional should be renamed as "real" or "correct" radical, and instead of including the character itself, there could be just a code-point in hexadecimal if available.

I think you've misunderstood this one... There's no "correct" nor "real" appearance for the radicals. There is a dictionary-form of each radical, which will be the RADICAL_PRIMARY corresponding SVG (see note below) - this form is the most common visual appearance, especially for lookup. However, radicals can be drawn differently for various reasons - such as depending on orientation and if they are an actual kanji etc. Consider 王: when in 偏-form, the final stroke tilts upwards towards the right corner (especially if drawn by hand) - such can be a reason to end up in the RADICAL_OPTIONAL column. Sometimes differences are major, sometimes minor. But these are good to know - or at least good to be able to look up when unsure. As for the code-point, should it be the UTF-8 or Unicode point? (These are not the same thing I believe, though I'm lacking in experience for these things.)

NOTE: Exception for ID_UNIQUE 011, where optional is the dictionary-form as it is never drawn the way it's displayed in dictionaries. The dictionary-form of this particular one is to NOT confuse it with the optional appearance of ID_UNIQUE 013.

@ SVG <--> Qt-relation Good to know those things, will keep it in mind. QUESTION: Does Qt distort due to EMPTY AREA in SVG within the document frame/limits? (E.g. 128x128px frame/limit and only top-centre 96x48px has content, leaving the rest BLANK/EMPTY.)

The reason for the layers - and their naming - is for later reference, as well as for anyone wishing to re-use the source: to split out a stroke order diagram which (maybe) is missing, or for whoever wishes to verify something to have an additional pointer for strokes, although direction is not shown here. It may possibly enable using a parser to extract stroke-by-stroke data for creating stroke order diagrams which fit the same pattern as the current data, e.g. extend to cover missing ones - unless the licence prevents this. I may end up with some ~400 SVGs in the end to cover alternative ones, so gonna keep them simple and not stylized.

z1dev commented 6 years ago

I will definitely add a setting for the popup contents.

I think you've misunderstood this one... There's no "correct" nor "real" appearance for the radicals.

There are multiple radicals in the radical search represented by the "wrong" kanji (or kanji containing more than just the radical.) My understanding was that the RADICAL_PRIMARY meant those when there was no closer match in fonts. In my understanding those are not correct, just placeholders. Although your list might hold closer representations.

As for the code-point, should it be the UTF-8 or Unicode point?

Unicode is a character set, while UTF-8 is an encoding of Unicode. This means that when there's a code-point, which is a number if we simplify things, it's the same number in any encoding of Unicode. Be it UTF-8, UTF-16 etc. If a document is labelled as simply "Unicode," it's an error. (Or comes from a tradition of the operating system, but actually means a specific encoding with a different name.) The encoding determines what the actual bytes in a document or in the memory are, that make up a code point. UTF-8 is perfect for our goals, as it is the most widely supported encoding.
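A minimal Python sketch of the distinction described above: the code point of a character is one fixed number, while the bytes depend on the chosen encoding. The function names `describe` and `code_points` are made up for this illustration and are not part of zkanji.

```python
def describe(char: str) -> dict:
    """Return the code point of one character and its bytes in two encodings."""
    return {
        "code_point": f"U+{ord(char):04X}",            # same number in every encoding
        "utf8": char.encode("utf-8").hex(),            # bytes as stored in a UTF-8 file
        "utf16be": char.encode("utf-16-be").hex(),     # bytes in big-endian UTF-16
    }


def code_points(text: str) -> list:
    """List the code points of every character in an already decoded string."""
    return [f"U+{ord(c):04X}" for c in text]


if __name__ == "__main__":
    # The kanji quoted earlier in the thread, plus a simple one for comparison.
    for ch in "毎一":
        print(ch, describe(ch))
```

Reading a UTF-8 file and decoding it yields the characters, and `code_points` then recovers the numbers directly, so the encoding never changes which code point a character has.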

QUESTION: Does Qt distort due to EMPTY AREA in SVG within the document frame/limits? (E.g. 128x128px frame/limit and only top-centre 96x48px has content, leaving the rest BLANK/EMPTY.)

Qt only distorts the SVG if the document was a square for example, while the rectangle drawn into is not. In this case the image will be stretched. The empty areas are left alone, they will be empty in the painted image.
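The behaviour described above can be modelled with plain arithmetic. The sketch below is not Qt code and does not call any Qt API; it only assumes that the whole document rectangle is mapped onto the target rectangle, with independent horizontal and vertical scale factors. All function names are hypothetical.

```python
def stretch_factors(doc_w, doc_h, target_w, target_h):
    """Scale factors when the whole document is mapped onto the target rectangle."""
    return target_w / doc_w, target_h / doc_h


def is_distorted(doc_w, doc_h, target_w, target_h, tol=1e-9):
    """True when horizontal and vertical scale differ, i.e. the image is stretched."""
    sx, sy = stretch_factors(doc_w, doc_h, target_w, target_h)
    return abs(sx - sy) > tol


def map_rect(rect, doc_w, doc_h, target_w, target_h):
    """Map a content box (x, y, w, h) from document to target coordinates."""
    sx, sy = stretch_factors(doc_w, doc_h, target_w, target_h)
    x, y, w, h = rect
    return (x * sx, y * sy, w * sx, h * sy)


if __name__ == "__main__":
    # 128x128 document, content only in the top-centre 96x48 area.
    print(is_distorted(128, 128, 130, 130))            # square into square
    print(is_distorted(128, 128, 130, 65))             # square into wide rectangle
    print(map_rect((16, 0, 96, 48), 128, 128, 130, 130))
```

Under this model the empty margins scale along with the content, so a 128x128 document painted into a 130x130 rectangle keeps its proportions and the blank area stays blank.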

am2del commented 6 years ago

@ Radical Data & Unicode/Encoding Great explanation of the difference between points and encodings, thank you. I have found the majority of the UNICODE-points, I will add a UNICODE_POINT-column to the data - starting each identifier with 0x, or could be U+ if preferred. This column may contain a list using LIST_ALT where the first item is RADICAL_PRIMARY, and the following correspond to RADICAL_OPTIONAL. I still have no idea as to how to find the point for that annoying 038 though... Some others are missing too, haven't put the data together yet in this aspect. I will add the SVG-name for non-existent UNICODE-radicals - the format will start with ?x followed by ID_UNIQUE (3 digits) and position (1 digit), where 0 is RADICAL_PRIMARY and 1+ is the corresponding RADICAL_OPTIONAL. So for the annoying 038 where PRIMARY is non-existent: ?x0380 (Or wait, is Microsoft still being a b*tch having no support for "?" in file names and symlinks in their junk-systems? FATn/NTFS/exFAT? Real file systems like EXTn/BTRFS etc. support this. If you don't know what symlinks are, on any *NIX-system - refer to man ln and flag -s or the documentation for in-depth details. *NIX-systems are any BSD/Linux/Mac etc., in short: UNIX-related.) If you've got any look-up trick up your sleeve you're willing to share, I'd appreciate it. ID_UNIQUE will stay as identifier for a line, as the info forms unique groups - the data from a single line should not be split in any other way to stay logical.

That touchscreen proved too defective, so gonna use the mouse to draw the SVGs. Had that screen for design once upon a time... but it had an accident and now its responsive area is a rectangle of 50% of the height and 67% of the width starting from the top-left corner. It only responds to movement and doesn't recognize "click", "hold", "erase" etc. of any sort, not even using the stylus buttons. Opened it, but couldn't fix it - nor improve it, sadly enough. Not a single dead pixel though... so it's still nice for colour-representation of digital art before printing.

The RADICAL_PRIMARY is meant as the single most common way to draw it, or the DICTIONARY STANDARD APPEARANCE - with the exception mentioned in the previous post. The ones listed currently at RADICAL_PRIMARY are, most likely, as close as it can get to this. Though a few are rarely supported by fonts, and 038 seems non-existent in the digital world - if there's any hope whatsoever it's probably UTF-16 or a UNICODE-point which is unrepresented in any standard encoding. The RADICAL_OPTIONAL is supposed to contain any optional way to draw it - and some of these seem non-existent in the digital world too. An example of an optional way to draw something, which is currently not listed in the data as there's (for what I know) no glyph for it, is a version of 羊垂 with an extra stroke. Most will probably mistake it for a combination of 草冠+王頭+の偏, or 羊頭+の偏 - it's present in, among others, "着" - which is actually just 羊垂+目脚. Noted in the data we use, 羊 in these cases is only listed among "similar" and not among "parts", where it should be present too - check ID_UNIQUE 161 for the three appearances which do have glyphs. (Speculation of origin: sheep + eye = wool & seen, probably related to what clothes are made of plus that they are made/worn to please the eye - and as cotton looks like wool in raw form, probably no distinction was made in old times. The extra stroke is likely to emphasize "king"/"ruler" as these materials were luxury items.)

@ SVG & Qt Realized my question was prone to misunderstandings as I read your answer, entirely my fault - but I think I got it anyways, let me just verify - is this true? Using a Qt-rectangle of 130x130px and an SVG with document boundaries of 128x128px where centre-top 96x48px has content and the remaining 16x48px+16x48px at the top corners and the bottom 80x128px is unused/EMPTY, there will be no deformed stretch. Correct?

*** ASCII-representation ***
128x128  -->  130x130     (In theory)
SVG      -->   IN QT
 _____         _____
| XXX |       | XXX |
|     |  -->  |     |  ==  No deform?
|_____|       |_____|

         OR?
 _____         _____
| XXX |       |XXXXX|
|     |  -->  |XXXXX|  ==  Deformed?
|_____|       |XXXXX|

am2del commented 6 years ago

I've started adding UNICODE_POINT, and it's gonna take me a couple of days. The new column may contain multiple pointers for each glyph, so there are two LIST delimiters ("|" ":") used in this column to align things and define in which order the glyphs for each appearance are preferred. I've also made several other changes, so I added a CHANGE_LOG and updated the file-version, as the data in the current file doesn't align with the version I mailed you. Managed to find some data on that elusive 038, but still no PRIMARY glyph - found an OPTIONAL appearance glyph though.
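A rough Python sketch of how a parser might read such a column by header name and split it on the two delimiters. Everything about the layout here is an assumption made for illustration: the sample rows are invented and not taken from the actual data file, the file is assumed to be tab-separated with a header row, "|" is assumed to separate appearances (first PRIMARY, then OPTIONAL ones), and ":" is assumed to separate alternative code points for one appearance in order of preference. The real file may assign the delimiters the other way round.

```python
import csv
import io

# Invented sample data; only the column names ID_UNIQUE and UNICODE_POINT
# come from the thread, the rows and the MEANING column are hypothetical.
SAMPLE = (
    "ID_UNIQUE\tUNICODE_POINT\tMEANING\n"
    "001\t0x4E00:0x2F00\tone\n"
    "009\t0x4EBA:0x2F08|0x4EBB\tperson\n"
)


def parse_points(cell: str) -> list:
    """Split a cell into appearances ('|'), each a preference-ordered list (':')."""
    return [[int(p, 16) for p in group.split(":")] for group in cell.split("|")]


def load(text: str) -> dict:
    """Read rows by header name, so column order and extra columns do not matter."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["ID_UNIQUE"]: parse_points(row["UNICODE_POINT"]) for row in reader}


if __name__ == "__main__":
    for key, appearances in load(SAMPLE).items():
        print(key, [[f"U+{cp:04X}" for cp in alt] for alt in appearances])
```

Because the lookup goes through the header names, a parser written this way can ignore columns it does not need, such as the human-readable ones.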

z1dev commented 6 years ago

I have found the majority of the UNICODE-points, I will add a UNICODE_POINT-column to the data - starting each identifier with 0x, or could be U+ if preferred.

It should be "Unicode code point" to be precise, but if you just call it UNICODE, it'll be clear. It doesn't matter how it's prefixed. 0x is a notation specific to C and derived languages for hexadecimal, but programmers generally understand it. You could also just replace any Japanese character with the Unicode code point. Providing both the code point separately and the characters in the file doesn't make sense, because just by reading the file (which is already in UTF-8) you get the code points anyway.

is Microsoft still being a b*tch having no support for "?" in file names and symlinks in their junk-systems?

There are many file name restrictions in Windows, and those were never clearly stated, so if you look for a list of forbidden characters you are out of luck. It truly is junk but I'm not a fan of Linux either, so I don't want to stand on either side. Let's leave this outside of GIT please. Some people who get triggered by it might come to join in. Instead of ?x, you could use 0z or anything else.

If you've got any look-up trick up your sleeve you're willing to share, I'd appreciate it.

The input method editor for Japanese in Windows allows you to draw the characters like the recognition in zkanji. If there's some obscure kanji or radical present in Unicode, you can draw it too, but drawing 038 doesn't give anything matching. It just doesn't exist in that form.

There's no deformation of the SVG. It's fine if you place the radical at the position and size where it actually should be.

am2del commented 6 years ago

I renamed the column to UNICODE, originally set it to UNICODE_POINT to ease distinguishing for the inexperienced - but there's a note in the column description regarding this anyways.

0x is a notation specific to C and derived languages for hexadecimal, but programmers generally understand it.

I'll change the prefix to Vx, the "V" for "Vector-graphic" - using a capital letter for the human eye - and the "x" as delimiter for consistency before the ID-reference. This eases organizing and distinguishing files without reading the data file, and makes it easy for any parser to determine the difference.
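A small Python sketch of the naming scheme as it reads from this thread: the prefix, then ID_UNIQUE as 3 digits, then the position as 1 digit (0 for RADICAL_PRIMARY, 1+ for RADICAL_OPTIONAL). The helper names are hypothetical. The portability check uses the characters that Microsoft's file naming documentation lists as reserved in Windows file names (< > : " / \ | ? *), which is why the earlier "?x" prefix would not work there.

```python
# Characters reserved in Windows file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def svg_placeholder(id_unique: int, position: int, prefix: str = "Vx") -> str:
    """Build a placeholder name: prefix + 3-digit ID_UNIQUE + 1-digit position."""
    if not 0 <= id_unique <= 999:
        raise ValueError("ID_UNIQUE must fit in 3 digits")
    if not 0 <= position <= 9:
        raise ValueError("position must fit in 1 digit")
    return f"{prefix}{id_unique:03d}{position}"


def is_portable(name: str) -> bool:
    """True when the name contains none of the Windows-reserved characters."""
    return not (set(name) & WINDOWS_RESERVED)


if __name__ == "__main__":
    name = svg_placeholder(38, 0)          # PRIMARY appearance of ID_UNIQUE 038
    print(name, is_portable(name))
    print("?x0380", is_portable("?x0380"))
```

A parser can tell such a name apart from a hexadecimal code point by its prefix alone, since "Vx" never starts a 0x or U+ value.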

Providing both the code point separately and the characters in the file doesn't make sense, because just by reading the file (which is already in UTF-8) you get the code points anyway.

I agree with this. However, I still decided to add a column instead in order to keep the file "human-friendly". Added a notation in the column description, and for a parser it's easy to simply skip a column or two. Sure, not the most space-efficient solution, but - I wonder who cares about a few KB nowadays? Also, for the same reason, I will keep REF_-columns human-readable - these don't have multi-glyphs anyways. Unless, of course, they refer to a radical which is also used as a KANJI - in which case data on which UNICODE(s) belong to the radical exists in the table already. Also, the code-points used in those human-friendly columns are chosen per common font-support. All the traditional (214 KangXi) radicals have dual code-points, and usually fonts only support one of these. While digging into this topic I noted issues caused by this as there, logically, should be only one point for the same character - while there are two to support potential local/regional variations... although the font-makers don't seem to care for this, simply using the table they find first - skipping the other, whichever it may be of the two. Possibly sometimes cross-linking the two in the font...
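The dual code points can be inspected with Python's standard `unicodedata` module. In Unicode, the Kangxi Radicals block starts at U+2F00, and each character in it carries a compatibility mapping to an ordinary CJK ideograph, which NFKC normalization applies. The sketch below only illustrates that relationship; the function name `unified_form` is made up, and whether the data file should store one or both points remains a design choice.

```python
import unicodedata


def unified_form(ch: str) -> str:
    """Map a Kangxi-radical code point to its ordinary ideograph via NFKC."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    for radical in ("\u2F00", "\u2F08"):   # KANGXI RADICAL ONE, KANGXI RADICAL MAN
        unified = unified_form(radical)
        print(
            f"U+{ord(radical):04X}", unicodedata.name(radical),
            "->",
            f"U+{ord(unified):04X}", unicodedata.name(unified),
        )
```

A font that covers only one of the two points will show a missing glyph for the other, which matches the font-support issue noted above.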

There are many file name restrictions in Windows, and those were never clearly stated, so if you look for a list of forbidden characters you are out of luck. It truly is junk but I'm not a fan of Linux either, so I don't want to stand on either side. Let's leave this outside of GIT please. Some people who get triggered by it might come to join in.

The only reason I brought it up was to ensure cross-platform compatibility, I don't want to use "illegal" characters. Also, I was thinking of using symlinks for some references as it'd save space and make things more convenient... but that's pointless without cross-platform support. Either way: Whoever claims there's such a thing as a PERFECT SYSTEM, or filesystem for that matter, doesn't know what they are talking about, as ALL - without exception - have flaws and/or limitations of some sort. System, and filesystem, should be chosen per USER NEEDS and NOT per NAME/BRAND. Period.

There's no deformation of the SVG. It's fine if you place the radical at the position and size where it actually should be.

Great, saves me effort.

am2del commented 6 years ago

@ Radical Data / Browser: Opening a separate thread in relation to this, referring back to this thread.