openstenoproject / plover

Open source stenotype engine
http://opensteno.org/plover
GNU General Public License v2.0

Japanese on Plover #625

Closed Scoubines closed 1 year ago

Scoubines commented 8 years ago

Hi everyone,

I am a French person living in Japan, and I am thinking of using Plover for Japanese chord writing. The method I have in mind was developed from 1995 and is only used by Japanese companies that do real-time subtitle editing for TV broadcasts. After long research, I managed to get hold of their system, called StenoWord (which is now out of patent). It consists of a remap of 10 keys of the keyboard (5 Japanese consonants and 5 Japanese vowels) plus a list of 20,000 combinations that form the most basic chords. Both are necessary to use it. They also use a special dedicated keyboard (made by Canon, but after several attempts they don't want to sell it at all), which can easily be reproduced with any N-key rollover keyboard. The system allows writing at up to 5 characters per second (not words), targeting 330 characters/min (which is damn fast, given that Japanese people speak at about 300 characters/min), using, as you guessed, several keys struck at once, just like the English version. Attachments: Some basic chords of Stenoworlds 2.pdf, Some other more complex chords.pdf

Here is a demo in Japanese: Japanese real-time captioning for live television.

I managed to get it working on Windows with a Japanese program called DvorakJ, which is roughly the equivalent of Plover (maybe you can check that?). It's all in Japanese, but tell me if you want to see how to load layout files into it to test it. It imported the layout perfectly and triggered the chords when writing, but there is no Mac version at all and it is kind of tiresome...

Here is the Excel file containing the database that would need to be imported into Plover. The database itself is only the first column of the Config tab (the rest are links for making changes to the dictionary if required). For reasons I don't know yet, each chord line is written with these weird |||||||||| signs, which might need to be removed (by some automation, I hope) to get something clean for Plover. However, these lines were read correctly by DvorakJ, so I guess whoever compiled the database from the code had a good reason to put them there. I should say that I have absolutely no skills in coding or HTML, so I am not sure how this could work, but I am willing to collaborate with someone to make it succeed. I speak and write Japanese fluently now, so I can help with the linguistic part.

Please tell me if you need any more information to make this happen! Meanwhile, I currently can't use Plover at all on my Mac (I posted an issue about that) due to a bug that others seem to be experiencing.

Arigato! Sam Stenoword_database.xls.zip

p2635 commented 8 years ago

I've seen that video ages ago and was wondering about this! I'm learning Japanese so I'm going to follow this post :)

benoit-pierre commented 8 years ago

Note: I took the liberty of fixing some of the links in the original description.

Scoubines commented 8 years ago

Thank you Benoit!

I just got my hands on the Plover dictionary editor, and technically speaking it seems we can add any personal dictionary, good start! I face 2 issues now: Japanese characters don't seem to be recognised, and I can't possibly add all the dictionary lines one by one (it would take ages)… A direct import feature from a .txt or Excel file would be best from my beginner perspective.
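For the bulk-import question, a minimal sketch in Python might help. Plover's JSON dictionaries are a flat object mapping strokes to translations, so a text export can be converted in one pass. The input format here (one `stroke<TAB>translation` pair per line) and the function name `tsv_to_plover_json` are assumptions for illustration, not the actual layout of the StenoWord spreadsheet.

```python
import json


def tsv_to_plover_json(lines):
    """Convert 'stroke<TAB>translation' lines into a JSON dictionary string.

    The tab-separated input format is an assumption made for this sketch.
    """
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        stroke, translation = line.split("\t", 1)
        entries[stroke.strip()] = translation.strip()
    # ensure_ascii=False keeps the kana readable in the output file
    return json.dumps(entries, ensure_ascii=False, indent=2)


sample = ["KA\tか", "LHA\tな", "", "PKA\tが"]
print(tsv_to_plover_json(sample))
```

The resulting text could be saved as a .json file and added through the dictionary list, assuming a steno system that understands these chords is loaded.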

benoit-pierre commented 8 years ago

It looks to me like the Stenoworlds PDF is easier to make sense of. You should come to Plover's Discord server so we can discuss how supporting this in Plover could be done.

nexaitch commented 7 years ago

There seems to be a theory for a more traditional steno layout here, but it doesn't seem to be well documented. I'll see if I can do a translation of either theory.

na4zagin3 commented 7 years ago

I am just interested in stenography, and I would be glad if Plover supported Japanese stenography. The layout described at http://caneco.my.coocan.jp/001c/index.html, which @nexaitch mentioned, is called “Sokutaipu,” which is (was) used in courts. I am searching for other documents, but they are difficult to find online because the school for court stenographers has been closed for 20 years.

Sokutaipu is based on phonological orthography, i.e. without kanji, because the originator was a strong supporter of romanizing Japanese; so it is not suited to directly inputting text in the current orthography, which uses hiragana, katakana, and kanji. It seems that professional stenographers now use Hayatokun, a computer-aided transcription program. By the way, although Saeki (1970) proposed a new method for processing kanji with Sokutaipu, I don’t know if anyone is using his system now.

The key layout is like this:

              left <-|   |-> right

left/right   Y H S I O   O I S H Y
             Y T K A O   O A K T Y
middle             T K * I N

There are three groups: left, right, and middle. Y and O are each a two-row key, like S on a stenotype. The middle group has five keys. Does Plover support this?

Basically, each of the left, middle (only K ku, KI ki, T tsu, TI chi, I i and N n in Chinese-origin words, I guess), and right groups represents one syllable. For example,

Y   S  O  K        KH Y

(hereinafter, I use <left>|<middle>|<right> notation, so this example becomes YSO|K|KHY) has YSO sho in the left group, K ku in the middle, and YHK gyō in the right; therefore it represents shokugyō “職業” ‘occupation’. KA|N|OATY represents kantē, but it may be “官邸” ‘official residence’, “艦艇” ‘naval vessel’, “鑑定” ‘to appraise’, “桓帝” ‘Emperor Huan’, and so on.
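The notation above can be read mechanically. The following Python sketch splits a stroke into its three groups and reverses the right-hand group, since the example writes it in physical key order (KHY), which mirrors the left-hand reading order (YHK). The tables contain only the syllables from the two examples in this comment; the YTAO entry is inferred from KA|N|OATY = kantē rather than taken from a reference, and the function names are made up for illustration.

```python
# Tiny lookup tables built only from the examples above; a real theory needs far more.
LEFT = {"YSO": "sho", "KA": "ka"}
MIDDLE = {"K": "ku", "N": "n"}
RIGHT = {"YHK": "gyō", "YTAO": "tē"}  # YTAO is inferred from KA|N|OATY


def parse_stroke(notation):
    """Split '<left>|<middle>|<right>' and un-mirror the right-hand group."""
    left, middle, right = notation.split("|")
    # The right hand is written in physical key order, so reverse it
    # to get the same reading order as the left hand.
    return left, middle, right[::-1]


def read_stroke(notation):
    """Return the romanized reading of one stroke, group by group."""
    left, middle, right = parse_stroke(notation)
    parts = []
    if left:
        parts.append(LEFT[left])
    if middle:
        parts.append(MIDDLE[middle])
    if right:
        parts.append(RIGHT[right])
    return "".join(parts)


print(read_stroke("YSO|K|KHY"))
print(read_stroke("KA|N|OATY"))
```

This only covers the phonetic layer; the kanji ambiguity of kantē described above is untouched by it.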

There are special codes like |T| ni (used only for the postposition ni “に” ‘in’), YHKA|TK and IN|AKHY kangae(ru) “考える” ‘to think’, and so on. Most one-handed codes, e.g. YHKA|TK and IN|AKHY kangae(ru) “考える” ‘to think’ and YTHKS|TK and IN|SKHTY (koto ga) deki(ru) “ことができる” ‘to be able to (do)’, can be combined with another syllable or one-handed code. For example, YHKA|TKIN|SKHTY represents kangaeru koto ga deki(ru) “考えることができる” ‘to be able to think’, and I think O|IN|AKHY represents o-kangae “お考え” ‘thought’. Two-handed codes include |T| ni (postposition), YIO|TK|SKHTY to iu fū na “というふうな” ‘something like’, and so on.

Implementing Sokutaipu has other difficulties. Some codes are contextually ambiguous. Tejima (1995) gives a few examples: YKI|TK (kita when independent, tekita after a verb, shitekita after a noun; ‘to come’ or an inchoative aspect auxiliary verb), YKI (kyu [this syllable appears only in loanwords] or ikenai ‘is bad’ used after the postposition wa), and YTO (cho in a noun or tosuru ‘to assume’ after a noun).

I am sorry for posting such a long text. I would be glad if I could help you.

morinted commented 6 years ago

I'm concerned that without the word processor made by the company, we will not be able to get far. This all seems very proprietary.

Here's what I've discovered so far:

PTHKS UAOIE
   LM NR

Steno Order
===========

American-style order:
PTHKS LM NR UAOIE
Thumbs-first (used in the spreadsheet):
LM NR PTHKS UAOIE

I think Thumbs-first is better since LMNR all act as "shifts" in a way.
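To make the two orderings concrete, here is a small Python sketch that rewrites any set of pressed keys in a chosen steno order. The order strings are copied from the two lines above; the function name `normalize` is just an illustration.

```python
THUMBS_FIRST = "LMNRPTHKSUAOIE"    # order used in the spreadsheet
AMERICAN_STYLE = "PTHKSLMNRUAOIE"  # fingers first, then thumbs, then vowels


def normalize(keys, order=THUMBS_FIRST):
    """Return the pressed keys rewritten in the given steno order."""
    unknown = set(keys) - set(order)
    if unknown:
        raise ValueError(f"keys not in layout: {sorted(unknown)}")
    return "".join(key for key in order if key in keys)


print(normalize("AKP"))                  # thumbs-first spelling
print(normalize("LTA", AMERICAN_STYLE))  # American-style spelling
```

Whichever order is chosen, dictionary entries have to be spelled consistently in it, so a helper like this could be run over the spreadsheet data before import.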

Hiragana cheatsheet:

Basic, without combination
==========================

A → あ
KA → か
SA  → さ
TA → た
HA → は
PA → ぱ

Combinations
============
LH=N, LK=M, LS=Y, LT=R, LP=W

LHA → な
LKA → ま
LSA → や
LTA → ら
LPA → わ

Use P to add sounding, e.g.
PKA → が
Applies to PK → G, PS → Z, PT → D, PH → B

Little symbols (小書き文字):
PHKA → ぁ
LA → ゃ
LPTKA → ヵ (Katakana)
LPU → っ (wu is little tsu)
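The basic rows of the cheatsheet can be expressed as a lookup. This Python sketch decodes a thumbs-first chord into a raw consonant-plus-vowel romanization using only the rules listed above (plain consonants, the L combinations, and P voicing). It is a simplification: it assumes exactly one vowel key at the end of the chord, it produces raw spellings such as "ti" rather than Hepburn "chi", and it does not cover the little symbols, which raise an error here.

```python
VOWELS = {"A": "a", "I": "i", "U": "u", "E": "e", "O": "o"}

# Consonant key groups taken from the cheatsheet above.
CONSONANTS = {
    "": "", "K": "k", "S": "s", "T": "t", "H": "h", "P": "p",
    "LH": "n", "LK": "m", "LS": "y", "LT": "r", "LP": "w",  # L combinations
    "PK": "g", "PS": "z", "PT": "d", "PH": "b",             # P adds voicing
}


def chord_to_romaji(chord):
    """Decode e.g. 'LHA' into 'na'; raise KeyError for chords not covered."""
    if not chord:
        raise ValueError("empty chord")
    consonant_keys, vowel_key = chord[:-1], chord[-1]
    if vowel_key not in VOWELS or consonant_keys not in CONSONANTS:
        raise KeyError(chord)
    return CONSONANTS[consonant_keys] + VOWELS[vowel_key]


for chord in ("A", "KA", "LHA", "PKA"):
    print(chord, chord_to_romaji(chord))
```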

Some questions/concerns:

Finally, the StenoWord database is actually very small. It only contains 2648 entries, which I've uploaded in JSON format here: https://gist.github.com/morinted/3e75604ee17917420933b03e9bfa1cab

I'm curious what is the source of the StenoWord database? Who created it? Why?


I see, also, that Speed WP still runs classes for the skill: https://www.speed-wp.co.jp/

It would be really cool to have someone who's gone through the class direct our development in the right direction.


ステノワード is the search term if you are looking for anything about the company or its theory books.

morinted commented 6 years ago

@na4zagin3 do you think StenoWord or Sokutaipu would be better to pursue? To me, it seems like you end up at the same place: you can write phonetically, and someone else has to worry about the kanji conversion.

I'd be curious to know if there are realtime writers using Sokutaipu still, as well, instead of just StenoWord which seems to own Japanese captioning.

na4zagin3 commented 6 years ago

Although I know little about StenoWord, it sounds like StenoWord might be the easier way to provide one-person phonetic kanji conversion with a steno layout. In the simplest approach, Plover can behave as just a sophisticated converter from steno strokes to key sequences and let the user’s IME handle kanji conversion. Otherwise, Plover can be a real IME that provides kanji conversion using some kanji conversion engine. The only problem is that it is difficult to get information because StenoWord is proprietary. Although the community must be active (there are five schools and one of them opened in 2015!), there is little information on the Internet.

On the other hand, no methods are known for kanji conversion with Sokutaipu, except for a two-person system called Hayato-kun and a direct method (the user writes kanji directly as if they were briefs). It looks like there is a company that still uses Sokutaipu and runs a course. (BTW, I am making a dictionary file of basic strokes of Sokutaipu in my free time. Once it is ready, I will file a ticket.)

Kanji conversion of StenoWord

The patent covering the basics of StenoWord is 特開昭63-010220 and is now public. It has information about inputting kanji. It says “……入力された文字に漢字を含む場合には句読点入力時点での同時変換による連文節変換または全文一括変換法等によって、印字装置3へ出力するように演算記憶装置4が構成されたものが使用される。” That is, when the user inputs any punctuation, the system converts the composition string into text with kanji, using the Ren-bunsetsu-henkan (multiple word conversion) method or something like it. Here, Ren-bunsetsu-henkan is contrasted with Tan-kanji-henkan (single kanji conversion), which uses a simple dictionary lookup as SKK does. Ren-bunsetsu-henkan requires more sophisticated algorithms, like those in Anthy, Mozc, nksnd, and so on.

According to some sources, in practice StenoWord uses a two-person system: one person writes phonetically and the other corrects the kanji conversion.

IMO, there are two ways for Plover to support kanji conversion: a) embed a kanji conversion engine in Plover and have Plover behave as an IME, or b) rely on the user’s IME, so that Plover only converts a stroke into a sequence of hiragana/katakana or a special key sequence that controls the IME (e.g., “Shift + left arrow” to shorten the current word).
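The "convert when punctuation arrives" behaviour can be sketched in a few lines of Python. This is not the patent's algorithm: the converter here is a toy dictionary lookup standing in for a real engine, and the names `convert_at_punctuation` and `toy_convert` are made up for the illustration.

```python
def convert_at_punctuation(kana, convert, punctuation="、。"):
    """Buffer kana and run `convert` on each segment when punctuation is typed."""
    output = []
    pending = []
    for ch in kana:
        if ch in punctuation:
            # Punctuation triggers conversion of everything typed since the last one.
            output.append(convert("".join(pending)) + ch)
            pending = []
        else:
            pending.append(ch)
    # Text after the last punctuation mark stays unconverted for now.
    output.append("".join(pending))
    return "".join(output)


# Toy stand-in for a real conversion engine (an assumption for this sketch).
TOY_TABLE = {"しょくぎょう": "職業"}


def toy_convert(segment):
    return TOY_TABLE.get(segment, segment)


print(convert_at_punctuation("しょくぎょう。かんてい", toy_convert))
```

A real engine would replace `toy_convert`, and would have to deal with segmentation and homophones such as the kantē example discussed earlier in the thread.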
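As a rough illustration of option b), the Python sketch below builds a small dictionary in which ordinary strokes send romaji for the IME to convert and a few extra strokes send control keys. The `{^...^}` attach syntax and `{#...}` key-combo syntax are Plover's dictionary notation; the control strokes (NR, LM, LMNR) are hypothetical placeholders and not part of any existing theory, and which keys an IME actually uses for conversion depends on the IME.

```python
import json

# Kana chords come from the StenoWord cheatsheet in this thread;
# the three control strokes are hypothetical placeholders.
ime_dictionary = {
    "KA": "{^ka^}",              # romaji with no surrounding spaces; the IME makes kana
    "LHA": "{^na^}",
    "NR": "{#space}",            # ask the IME to start conversion / next candidate
    "LM": "{#Return}",           # commit the current candidate
    "LMNR": "{#Shift_L(Left)}",  # Shift + left arrow: shorten the current segment
}

print(json.dumps(ime_dictionary, ensure_ascii=False, indent=2))
```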

M key

M is never used in the spreadsheet. What is it used for?

I am not sure about it, but M must be used in briefs. The video shows HKMIE or HKMRIE assigned to “こみゅにけーしょん”, which means the data is incomplete.

M might also be used for the Latin alphabet, although this is my guess. L looks like a shift (up) key, so M might be a shift (right) key.

morinted commented 6 years ago

It seems like kanji is the biggest issue with these systems, because if you want to write alone you'll have to take breaks to solve kanji ambiguity issues, which seems tedious.

I've quickly mocked up a dictionary that lets me use the StenoWord dictionary on my American steno machine, and it seems to work, but it only lets me write hiragana and Katakana.

I tried turning on my Mac's IME and I had some interesting experiences:

I wonder if StenoWord could be adapted to support kanji as briefs, as well, at least for personal use where you might have a more limited vocabulary than a captioner.


Sokutaipu's layout is interesting. I've been looking at it, and I noticed that it has too many keys to fit on the American steno layout, since the O keys are distinct, and there's a center thumb key. The Tréal TR steno machine could handle the layout, but I'm not aware of others.

Are there still Sokutaipu-compatible machines as far as you can tell? It is something I could implement if I had more information on the machines that interface with software…unless you are just planning to use a keyboard, which is fine, too.


I'm now considering just reusing the English steno layout for Japanese. I know it's not one-to-one, because English is not at all like Japanese, but for writing phonetically without involving any IME in kanji input, I imagine it might actually be a good candidate.

The vowels could largely be left as they are, there's a natural distinction in the layout between a, o, e, u, i, ai, oo, ou. There's an obvious way to add syllabic N to any single syllable, there's lots of potential. Not to mention that with the right brief, you could map, say, NAEN (close to "nanny" in the English theory) to 何.

You could take rules that exist in Japanese phonetics and combine them with the Ireland layout. By default you would only get 1 or 2 syllables out at a time (comparable to StenoWord, not like the 2-3 you get with Sokutaipu), but with briefs you could potentially get out entire phrases and honorifics, while leaving lots of chord room for direct kanji conversion.

The largest caveat with this method is that I don't speak Japanese and probably if I were to try and create a layout for a language I don't know, I would fail (or at least go through many iterations).

The other thing I'm wondering: do native speakers know their kanji well enough to be able to think in terms of kanji-briefs, or is a lot of the time when inputting with an IME spent on recognition, and reading the definitions of kanji?

na4zagin3 commented 6 years ago

Plover could only activate the IME when it was outputting the Latin alphabet. If Plover was sending actual Japanese characters, the IME didn't do anything. This kind of sucks, because it doesn't let the writer use Katakana as easily as the StenoWord system should.

Ah, an IME usually does not expect to receive hiragana or katakana directly. To input katakana as distinct from hiragana, we can set an option for the behavior of the Shift key. To let the IME process input consisting of hiragana, katakana, and kanji, we can use Google 日本語入力 (Google Japanese Input), which allows a flexible romaji conversion table. For example, we can add entries like “&#x6F22;” → “漢” as well as “a” → “あ”.


Are there still Sokutaipu-compatible machines as far as you can tell? It is something I could implement if I had more information on the machines that interface with software…unless you are just planning to use a keyboard, which is fine, too.

Stenograph L.L.C. made (or is still making) machines with the Sokutaipu layout (http://steno2000.ehoh.net/hoshiki04.html). I haven’t seen one yet, though. I am trying to contact WardWarp K.K.

I am now teaching myself English stenotype with a Tréal TR. As you said, apart from keyboards, the Tréal TR looks like the only machine that can support Sokutaipu and that I can easily buy.


Applying the English layout to Japanese sounds interesting. FYI, I found a system called Caver (http://ci.nii.ac.jp/naid/110003278046), which uses an English stenotype machine. I also have an idea for a kanji layout based on the Middle Chinese pronunciation system.

BTW, to try a new layout, can I start by creating a new file under system/, copying english_stenotype.py?


The other thing I'm wondering: do native speakers know their kanji well enough to be able to think in terms of kanji-briefs, or is a lot of the time when inputting with an IME spent on recognition, and reading the definitions of kanji?

It depends. a) Most people use phonetic conversion with multiple word conversion (連文節変換): after inputting a few words, they typically hit space one to four times to start conversion and select the correct candidate. Except when inputting names or unusual words, we don’t read the definitions. This video (https://youtu.be/svz9di77uiI?t=45s) demonstrates typing Japanese at slow speed.

b) There are still users of direct kanji input methods (漢字直接入力 or 漢直), although they are now less popular than single kanji conversion methods (単漢字変換) like SKK. Some of my friends and I use TUT-Code, and another uses T-Code. These systems provide a special sequence to compose a kanji from two kanji, which reduces the number of kanji strokes we have to remember. For example, in TUT-Code, the stroke ala/.k; is converted to 拭: ala is the composing prefix, /. means 才, and k; means 式 (拭 also has its own stroke, ;na).
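The composition mechanism in b) can be sketched as two lookups in Python. The tables contain only the TUT-Code example given above, and the sketch assumes that each component stroke after the prefix is exactly two characters long, as in that example; it is not a description of the full TUT-Code tables.

```python
COMPOSE_PREFIX = "ala"

# Direct strokes, taken only from the example above.
KANJI_STROKES = {"/.": "才", "k;": "式", ";na": "拭"}

# Pairs of component kanji and the kanji they compose into.
COMPOSITIONS = {("才", "式"): "拭"}


def decode(sequence):
    """Decode a direct stroke, or a prefix + two component strokes."""
    if sequence.startswith(COMPOSE_PREFIX):
        rest = sequence[len(COMPOSE_PREFIX):]
        # Assumption: each component stroke is two characters long.
        first, second = rest[:2], rest[2:]
        return COMPOSITIONS[(KANJI_STROKES[first], KANJI_STROKES[second])]
    return KANJI_STROKES[sequence]


print(decode("ala/.k;"))  # composed from 才 and 式
print(decode(";na"))      # the same kanji via its own stroke
```

The same idea could carry over to steno: a composing chord followed by two known kanji briefs, so that rare kanji need no brief of their own.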

I think Sokutaipu with a direct kanji input method, proposed by Saeki (1970), is doable too.

morinted commented 6 years ago

I'm on my phone so I will be brief, but I really appreciate the research you've done.

I would recommend cloning a system plugin, like morinted/plover_russian_trillo, modifying it to your liking, then using ./setup.py develop, which will install the plugin for when you run Plover with ./launch.sh

If you'd like, I can do an initial implementation of Sokutaipu in a system to get you started, in case you run into trouble with the above.
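For orientation, here is a minimal sketch of what the system module inside such a plugin might contain, written for the StenoWord keys from earlier in the thread rather than Sokutaipu. The attribute names follow the constants defined in Plover's english_stenotype.py; every value is an assumption for illustration, including the left/right split of the keys, the keyboard mapping, the undo stroke, and the package name `plover_japanese_example`.

```python
# Hypothetical system definition; only the attribute names mirror english_stenotype.py.

KEYS = (
    "L-", "M-", "N-", "R-",        # thumb keys first ("thumbs-first" order)
    "P-", "T-", "H-", "K-", "S-",  # consonant keys
    "-U", "-A", "-O", "-I", "-E",  # vowel keys
)

# Keys that can be written without a hyphen in steno notation.
IMPLICIT_HYPHEN_KEYS = ("-U", "-A", "-O", "-I", "-E")

SUFFIX_KEYS = ()
NUMBER_KEY = None
NUMBERS = {}

UNDO_STROKE_STENO = "LMNR"  # hypothetical choice of undo stroke

# No English-style spelling rules are needed for kana output.
ORTHOGRAPHY_RULES = []
ORTHOGRAPHY_RULES_ALIASES = {}
ORTHOGRAPHY_WORDLIST = None

KEYMAPS = {
    "Keyboard": {
        # Hypothetical placement on a QWERTY keyboard.
        "P-": "a", "T-": "s", "H-": "d", "K-": "f", "S-": "g",
        "-U": "h", "-A": "j", "-O": "k", "-I": "l", "-E": ";",
        "L-": "c", "M-": "v", "N-": "n", "R-": "m",
    },
}

DICTIONARIES_ROOT = "asset:plover_japanese_example:dictionaries"
DEFAULT_DICTIONARIES = ("stenoword.json",)
```

The real plugin also needs packaging metadata that registers this module as a Plover system, which is easiest to copy from an existing system plugin such as the Russian one mentioned above.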

Is Caver viable for kanji too? I will research it a little more later.

Do you agree that ideally the writer wouldn't have to stop for a conversion step? Changing word boundaries seems especially tedious to me 😃

na4zagin3 commented 6 years ago

Thank you for suggestions!

I would recommend cloning a system plugin, like morinted/plover_russian_trillo, modifying it to your liking, then using ./setup.py develop, which will install the plugin for when you run Plover with ./launch.sh

If you'd like, I can do an initial implementation of Sokutaipu in a system to get you started, in case you run into trouble with the above.

I will try it this weekend. I have already set up a Python environment and succeeded in running master.

Is Caver viable for kanji too? I will research it a little more later.

I found the system overview, without briefs, described here: http://steno.web.fc2.com/index.html (in Japanese). Although kanji conversion from kana is still required, it has systematized strokes for briefs, including nouns and verbs. So it allows kanji briefs.

Do you agree that ideally the writer wouldn't have to stop for a conversion step? Changing word boundaries seems especially tedious to me

You’re right. Changing word boundaries is the most frustrating part of Japanese typing. Most people with good typing speed tend to hit the convert key just after typing a word written in kanji. Repeatedly hitting the conversion key to choose the correct candidate is also time-consuming, but it takes less time than correcting word boundaries.

na2hiro commented 6 years ago

Hi, I'm also interested in Japanese steno. I hooked up the StenoWord JSON and made it a plugin. It doesn't do anything about kanji yet, and I'm currently thinking about letting the IME deal with the conversion part (I created an issue there). https://github.com/na2hiro/plover-japanese-stenoword

@morinted I based it on your Russian plugin, as you recommended. Could you advise me on how to publish it as a Plover plugin? I'm a software developer, but I don't know the Python ecosystem well.

olejika commented 3 years ago

I can clearly see the steno-girl in the video input ろしあのぷーだいは. But how did it become ロシアのプーチン大統領は、? It seems the secret of the speed is not only knowing the key combinations but also the abbreviations of words.

SpongebobSquamirez commented 2 years ago

What's the status of this?

greenwyrt commented 1 year ago

See the wiki for existing Japanese steno systems.

della-rosa commented 9 months ago

In case anyone is interested in using this system, I cobbled together a quick and dirty cheat sheet for myself that shows the basic patterns I could figure out from the PDF above, Some other more complex chords.pdf. Be warned: it is not thorough and likely contains errors. The latter half of the list is also a bit misrepresentative, because at some point there seem to be fewer patterns and mostly briefs. Still, if you want to try the system with Plover, this might be better than starting from absolute zero.

Combined Cheat-Sheet