mw8 / white_keyboard_layout

An optimized personal keyboard layout and the tools to create your own
MIT License
100 stars 13 forks

care to share bigram stats? #4

Open matey-jack opened 7 years ago

matey-jack commented 7 years ago

Hey Michael, I am also running a self-designed keyboard layout with the right hand shifted right. (With the special feature of using the key right of SPACE as a letter key, too.) I have been using it for three years now, with only a few letters changed from QWERTY, so that the XCVZ shortcuts as well as Ctrl+S and Ctrl+W stay in place and letters that are in decent positions anyway don't get shuffled around. (ASD, for example, is just fine already, as are the German PÜÄ.)

To my point: I am now thinking of optimizing non-letter characters, too, because many of them moved from the far right to the middle anyway, and I also get a brand new blank key from the one letter that moved to the thumb.

Obviously you don't want to share all of the text which you typed in the last 5 years, but maybe you would share the bigram stats extracted from those texts? I haven't checked if your optimizer counts these up internally. Even if it doesn't, maybe you could run a counting tool on your corpus and share the result? I think preparing a good corpus is a lot of effort, and by sharing just the letter frequencies and bigram counts you could pass the benefit of that work on to us.

Thanks by the way for the long intro write-up! Always interesting how people approach this problem of assigning weights and which part of typing they see as painful vs joyful.

all the best, Robert

mw8 commented 7 years ago

Hi Robert,

Using the key to the right of the space bar as a letter key is a very interesting idea. I almost never use that key and it's in a very useful spot. In fact, it reminds me of the RSTHD layout (for the ergodox) that places the E key in one of the thumb groups.

As for my text corpus, it's true that I use large text files containing most of the things I've typed over a 5-year period, but for efficiency I convert these to a word frequency list and throw away all the words that don't show up more than 20 times over those 5 years. So I think I will go ahead and proofread my word frequency list, to make sure it doesn't contain any uniquely identifying information, and then I will add it to the repository.
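To illustrate the conversion described above, here is a minimal Python sketch. It is not the code in the repository: the function name `word_frequencies`, the `threshold` parameter, and the whitespace-based tokenization are all assumptions made for the example.

```python
from collections import Counter

def word_frequencies(text, threshold=20):
    """Count whitespace-separated words and keep only the frequent ones.

    Hypothetical helper: a word is kept only if it occurs MORE than
    `threshold` times, matching the "more than 20 times" rule above.
    Splitting on whitespace is an assumption; a real corpus may need
    smarter tokenization.
    """
    counts = Counter(text.split())
    return {word: n for word, n in counts.items() if n > threshold}
```

Dropping rare words keeps the list small and also removes most one-off strings such as names or passwords, although the result still needs a manual read-through before publishing.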

This will allow anyone to extract n-gram stats and also serve the purpose of giving default data to test the code.
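As a rough sketch of how n-gram stats could be extracted from such a list (again, not the repository's code; `ngram_counts` and the dict-of-counts input format are assumptions for the example):

```python
from collections import Counter

def ngram_counts(word_freq, n=2):
    """Derive character n-gram counts from a word frequency mapping.

    Hypothetical helper: `word_freq` maps each word to how often it
    occurred. Every n-gram inside a word is weighted by that word's
    frequency. Only n-grams within a word are counted, because a word
    list does not record which words followed each other.
    """
    counts = Counter()
    for word, freq in word_freq.items():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += freq
    return counts

# Usage: letter frequencies are just the n=1 case.
word_freq = {"the": 10, "then": 3}
bigrams = ngram_counts(word_freq, n=2)
letters = ngram_counts(word_freq, n=1)
```

The trade-off is that letter pairs spanning a word boundary are not recoverable from a word list alone.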

Cheers, Michael

matey-jack commented 7 years ago

great idea! I'm looking forward to it!