michaeldickens / Typing

https://github.com/michaeldickens/Typing
Licence #19

Closed iandoug closed 8 years ago

iandoug commented 8 years ago

hi

Can you please provide a licence file?

I am contemplating whether it would be feasible to add this functionality to the Keyboard Layout Editor, but it would need to be rewritten in JavaScript and be able to adapt itself dynamically to a given physical layout. http://www.keyboard-layout-editor.com/ https://github.com/ijprest/keyboard-layout-editor

In general I'm looking for a way to evaluate one layout against another to see which is better. For my own design (Programmers Keyboard, in KLE above; the design has moved on a bit since that version) I picked Workman-P, but Xah Lee (http://xahlee.info/kbd/keyboarding.html), who knows much more about design than I do, has some issues with it.

Thanks, Ian

HughP commented 8 years ago

@iandoug If you are talking about rewriting this code, please be aware that the current code does not support Unicode. It only supports ASCII. If you rewrite the code in JavaScript, please make it Unicode compatible. (The current code already takes a while to process, so I imagine that it will take even longer in JavaScript.) I have been using the current code to work with minority languages in Africa and to create keyboard layouts for them. However, I have to sub in ASCII characters all the time to do the processing. It would be a lot easier to use actual Unicode values.
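
For readers unfamiliar with the substitution workaround described above, here is a rough Python sketch of the idea: each non-ASCII character is swapped for an ASCII placeholder that does not otherwise occur in the text, and swapped back afterwards. The function names (`build_substitution`, `encode`, `decode`), the placeholder set and the sample string are all assumptions made for illustration; none of this is part of the Typing code.

```python
def build_substitution(text, placeholders):
    """Map each non-ASCII character in text to an unused ASCII placeholder."""
    used = set(text)
    # Only placeholders that never occur in the text are safe to use.
    free = [p for p in placeholders if p not in used]
    non_ascii = sorted({ch for ch in text if ord(ch) > 127})
    if len(non_ascii) > len(free):
        raise ValueError("not enough free ASCII placeholders")
    return dict(zip(non_ascii, free))


def encode(text, table):
    """Replace non-ASCII characters with their placeholders."""
    return "".join(table.get(ch, ch) for ch in text)


def decode(text, table):
    """Restore the original characters from the placeholders."""
    reverse = {v: k for k, v in table.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


# Hypothetical sample text and placeholder set.
sample = "sample text with \u025b, \u014b and \u0254"
table = build_substitution(sample, placeholders="~`^|{}")
ascii_text = encode(sample, table)
print(ascii_text.isascii())                 # the encoded text is pure ASCII
print(decode(ascii_text, table) == sample)  # and the round trip is lossless
```

The scheme breaks down once a language needs more special characters than there are unused ASCII characters, which is one reason native Unicode support would be easier.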

iandoug commented 8 years ago

@HughP Re the JavaScript, I mentioned that because the keyboard-layout-editor is a single-page web app that does everything in JavaScript. Personally I hate JavaScript. I also figured that this function would be computationally intensive and thus JS would be a poor choice. Ideally it should run on the server in a better (compiled) language and send the result back to the browser. However, I don't think the current KLE would like that: at the moment it's very cross-platform and stand-alone, and adding a server component puts it in a different category. I was thinking of doing an extension to KLE on a different site, including a gallery of user designs, as well as linking to or integrating with Swill's case builder http://builder.swillkb.com/ . Having a layout analysis component would fit with this. KLE has different types of users ... some, like me, use it to design from scratch; others just use it to rearrange ANSI/ISO layouts with pretty colours etc. My own design is heavily Unicode-based, so yeah, I would use Unicode rather than ASCII :-)

HughP commented 8 years ago

@iandoug FYI I have been following your work for at least a year now too. You do good work.

michaeldickens commented 8 years ago

I added an MIT license in doc/LICENSE.

iandoug commented 8 years ago

@michaeldickens thanks for the generous licence :-) @HughP Thanks, are you sure you are not confusing me with Ian J Prest, who wrote the Keyboard Layout Editor? I'm just a contributor to it. Also, do you have some sort of body of books / whatever to analyse for your layouts?

michaeldickens commented 8 years ago

@iandoug, here I have characters and digraphs from a large corpus and here I have separate characters and digraphs for different types of text in case you want to weight them differently. I can't publicly release the corpus since it contains a lot of private information (e.g., all my emails from 2006-2009).

You could use something like the Corpus of Contemporary American English, which is much bigger, although I believe you see diminishing returns for corpora that big. Giant corpora are mostly only useful for word/phrase frequency. (Google Ngrams is the best source for that, since AFAIK it's the largest corpus ever. Even just the frequency data is 30GB.)
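
As a rough illustration of the per-text-type character and digraph counts mentioned above, and of weighting them differently, here is a minimal Python sketch. The function names, the text types (`email`, `code`), the sample strings and the weights are hypothetical, and this is not how the Typing code computes its frequency data. Note that this version counts digraphs across spaces, which is itself an assumption.

```python
from collections import Counter


def char_and_digraph_counts(text):
    """Count single characters and adjacent character pairs (digraphs)."""
    chars = Counter(text)
    # Python strings iterate by code point, so this also works for non-ASCII text.
    digraphs = Counter(a + b for a, b in zip(text, text[1:]))
    return chars, digraphs


def weighted_merge(counts_by_type, weights):
    """Combine per-type counters into one, scaling each by its weight."""
    merged = Counter()
    for text_type, counts in counts_by_type.items():
        w = weights.get(text_type, 1.0)
        for key, n in counts.items():
            merged[key] += w * n
    return merged


# Hypothetical text types and weights.
texts = {"email": "hello there", "code": "if (x) { y(); }"}
digraphs_by_type = {name: char_and_digraph_counts(t)[1] for name, t in texts.items()}
combined = weighted_merge(digraphs_by_type, {"email": 1.0, "code": 2.0})
print(combined.most_common(5))
```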

HughP commented 8 years ago

@iandoug I guess I did mistake you for IJP. To answer your question though, I have been downloading and cleaning up Wikipedia dump files. For much smaller languages I have been using the book of James from the Christian New Testament, though I suspect that James alone is not a large enough corpus, as even in English some character/case combinations do not appear. For instance, there is no upper case "Q". When I do analysis I am often looking at press-and-hold combinations like QWERTY + Shift and combo combinations like OS X's default location for "é". This means that there are a lot more modifier keys. If we only count "characters" (in the orthographic sense, not in the Unicode sense) then we are missing the actual key touches required. For English this is less of a problem because QWERTY and the size of the keyboard fit English quite well. However, for other languages this is not the case, and there is quite a bit more cramping.
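
To make the gap between characters and key touches concrete, here is a small Python sketch. The `KEY_SEQUENCES` table and both function names are hypothetical, and treating a modifier such as Shift or Option as one key touch is a modelling assumption, not something the Typing code does.

```python
# Hypothetical table: characters that need more than one key touch.
KEY_SEQUENCES = {
    "\u00e9": ["Option", "e", "e"],  # é on the OS X US layout: Option+e, then e
}


def keystrokes_for(ch):
    """Return the list of key touches assumed necessary to type ch."""
    if ch in KEY_SEQUENCES:
        return KEY_SEQUENCES[ch]
    if ch.isupper():
        # Assumption: every upper-case letter costs Shift plus the base key.
        return ["Shift", ch.lower()]
    return [ch]


def count_keystrokes(text):
    """Total key touches for text, counting modifiers as touches."""
    return sum(len(keystrokes_for(ch)) for ch in text)


word = "Qu\u00e9bec"
print(len(word), count_keystrokes(word))  # 6 characters, 9 key touches
```

The same table could hold Linux compose sequences or dead-key sequences; only the entries change.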

iandoug commented 8 years ago

@HughP : The Linux "compose" key is much more elegant (it seems) than the Mac or Windows solutions. The downside is that for frequent usage it is a few more keystrokes on those characters. E.g. for é I type the compose key (user-settable), then e, then '. The alternative is to try to put every possible combination of the vowels, c and n with acute, grave, circumflex, umlaut, cedilla, caron and tilde on the keyboard, and you quickly run out of keys and sensible places to put them. Thus Linux's elegant solution does all those and more (ß ¢ © etc...) with little extra effort. But modelling that in a keyboard layout analyser is another story, I guess.

Have you considered the Unifon route at all? It takes a different approach, but may require custom characters for the languages you are interested in. I actually had the English glyphs on an earlier version of my keyboard. There is a font that supports the English glyphs. https://en.wikipedia.org/wiki/Unifon http://www.unifon.org/ https://en.wikipedia.org/wiki/Everson_Mono (Michael is very involved in Unicode).

iandoug commented 8 years ago

@HughP : I stumbled across this site: http://corpora2.informatik.uni-leipzig.de/download.html Fill in their captcha and they'll let you access large corpora in different languages (txt and MySQL). Perhaps there is one that is close to what you need. I only recognise Xhosa and Zulu as being African, but I'm not up to speed on the three-letter language codes. AFAIK they don't use diacritics (well, I've never seen them in Xhosa).

iandoug commented 8 years ago

@HughP Stumbled across this idea for doing most of the diacritics. Interesting approach, but it will need enhancements for handling rings, carons, tildes etc. I shall ponder upon this. :-) http://marin.jb.free.fr/qwerty-fr/

HughP commented 8 years ago

@iandoug Thanks for these links. I did not know about the Leipzig corpora. Looking at the sources for their corpora, some of the content is Bible translation material. It is also not immediately clear how much orthographic regularization has occurred.

Funny thing is, I am a bit more familiar with the three-letter codes than the average Joe... I do a lot with ISO 639-3 in my work. I usually use this website to look up the ones I don't know: http://www-01.sil.org/iso639-3/codes.asp You might notice that some were three-letter codes followed by a dash and another code; these are localization codes according to BCP 47 (more here too). The best tool I have found for selecting localization tags is by Richard Ishida, here.

In a way I think we should be adding localization tags to our optimized keyboard layouts so that people know which language(s) they were optimized for. I have some code and thoughts on keyboards in a very scattered order here https://github.com/HughP/MLKA-Bash-data and https://github.com/HughP/MLKA

If you want to talk further about accessing diacritics from the keyboard, feel free to open an issue in https://github.com/HughP/MLKA . I am not sure it is best to keep adding to this thread, as the issue is closed and the topic is not directly related to @michaeldickens's software.

However, I should note that because his software only processes ASCII, it does not approach the issue of counting diacritics. That is, are they digraphs or not? And just when you think you have come to an answer about this, you realize that different languages treat them differently when they teach them to students. In some languages, diacritics are separate letters from the base characters they hover over, and the two represent two ideas; in other languages, the diacritic and the base character form a single letter, or idea, in the language user's mind.

Roman scripts are not the only ones with diacritics either; Arabic and Indic scripts use diacritics.
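
The "digraph or single letter" question has a direct counterpart in Unicode normalization, which a Unicode-aware analyser would have to choose between. The following is a minimal Python sketch using the standard `unicodedata` module; the `units` helper is a hypothetical name, and which form suits a given language is a per-language decision that this sketch does not answer.

```python
import unicodedata


def units(text, decompose):
    """Split text into code points, either composed (NFC) or decomposed (NFD)."""
    form = "NFD" if decompose else "NFC"
    return list(unicodedata.normalize(form, text))


word = "caf\u00e9"  # "café" with a precomposed é

composed = units(word, decompose=False)
decomposed = units(word, decompose=True)

# NFC treats é as one unit; NFD treats it as e followed by a combining acute,
# which a digraph counter would then see as a pair.
print(len(composed))    # 4
print(len(decomposed))  # 5
print(unicodedata.name(decomposed[-1]))  # COMBINING ACUTE ACCENT
```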