Closed rgbkrk closed 7 years ago
I'm just playing around with the font now, but did you say you'd contacted a designer to look at this? I've never done font stuff, so I won't try to do it if a professional will be fixing it soon.
Also, I think Randall said he had a version with lowercase lettering as well - maybe we should wait for that before we change anything. It doesn't look like an otf file will be great for merging changes in later. Speaking of which, is there a better font format for version control?
It looks like Fontforge's SFD file format is text based, and thus can be diffed.
Yes, I reached out to some designers. I don't think it's a bad thing to mess with it, especially since we can always revert.
Hey! I don't have a version with lowercase glyphs. What I was awkwardly referring to was this: there is an old and a new version of the font, and one of the distinguishing features was that the old version had no glyphs in the lowercase slots (so if you typed lowercase text, it appeared in a fallback font), but the new version had copies of the uppercase glyphs in those slots. The one used in the April 1st comic, which was previously the only version available, was the old version.
At the risk of some mild nerd sniping, there's a tangentially font-related task that someone here might find interesting.
I've long wanted to come up with a "quick brown fox"-style sentence for digraphs—that is, a string of words that covers as many common English letter pairs as possible. It seems like it would be useful for testing kerning, since it would show the letters matched up with each other as they're most likely to appear.
For example, "unauthoritativeness leatherbark intracolic microcheilia offsider glassweed rottolo albertite hematorrhachis organometallic segregationist unevangelic campstool" is pretty good.
I worked on some algorithms to generate wordlists (by testing coverage against some random corpus) but wasn't able to make the search efficient or clever enough to get a satisfyingly compact wordlist.
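As a rough illustration of "testing coverage against some corpus", here is a minimal sketch that scores a wordlist by the share of corpus digraph occurrences it covers. The function names (`digraphs`, `digraph_frequencies`, `weighted_coverage`) are made up for this example and are not taken from any code mentioned in this thread.

```python
from collections import Counter


def digraphs(word):
    """Return the adjacent letter pairs of a word, in order."""
    w = word.lower()
    # Skip pairs that contain non-letters (hyphens, apostrophes, digits).
    return [w[i:i + 2] for i in range(len(w) - 1) if w[i:i + 2].isalpha()]


def digraph_frequencies(corpus_words):
    """Count how often each digraph occurs across the corpus words."""
    counts = Counter()
    for word in corpus_words:
        counts.update(digraphs(word))
    return counts


def weighted_coverage(wordlist, corpus_counts):
    """Fraction of corpus digraph occurrences whose digraph appears in wordlist."""
    covered = {d for word in wordlist for d in digraphs(word)}
    total = sum(corpus_counts.values())
    if total == 0:
        return 0.0
    hit = sum(n for d, n in corpus_counts.items() if d in covered)
    return hit / total
```

Weighting by frequency means a list is rewarded for covering the common pairs first, which seems closer to the kerning use case than counting every pair equally.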
Anyway, if you get involved in working on this font, I'd be happy to write out more symbols at some point, and/or also provide higher-resolution raster images of my handwriting.
Hmm, what about using pattern recognition to extract a font from all your comics? It should be able to extract all the letters and pairs and automatically infer the correct kerning, right? It could also pick up alternative glyphs for each character.
> I've long wanted to come up with a "quick brown fox"-style sentence for digraphs, that is, a string of words that covers as many common English letter pairs as possible. It seems like it would be useful for testing kerning, since it would show the letters matched up with each other as they're most likely to appear.
Neat!
> I worked on some algorithms to generate wordlists (by testing coverage against some random corpus) but wasn't able to make the search efficient or clever enough to get a satisfyingly compact wordlist.
@randallmunroe -- do you have code (or algorithms) posted somewhere for this? I'm sure people would jump in to work on it and make runs (myself included).
Hi all,
I wrote a notebook that calculates some useful digraph information for debugging this (see the output summary below). The original notebook can be viewed at http://nbviewer.ipython.org/6371184 .
There are 676 different digraphs. There are 109583 words with two or more characters in the English dictionary installed on my machine.
The single English word containing the most digraphs is antidisestablishmentarianism, with a total of 27 digraphs.
The following string of 702 characters contains all 676 digraphs:
aaabacadaeafagahaiajakalamanaoapaqarasatauavawaxayazbbbcbdbebfbgbhbibjbkblbmbnbo
bpbqbrbsbtbubvbwbxbybzcccdcecfcgchcicjckclcmcncocpcqcrcsctcucvcwcxcyczdddedfdgdh
didjdkdldmdndodpdqdrdsdtdudvdwdxdydzeeefegeheiejekelemeneoepeqereseteuevewexeyez
fffgfhfifjfkflfmfnfofpfqfrfsftfufvfwfxfyfzggghgigjgkglgmgngogpgqgrgsgtgugvgwgxgy
gzhhhihjhkhlhmhnhohphqhrhshthuhvhwhxhyhziiijikiliminioipiqirisitiuiviwixiyizjjjk
jljmjnjojpjqjrjsjtjujvjwjxjyjzkkklkmknkokpkqkrksktkukvkwkxkykzlllmlnlolplqlrlslt
lulvlwlxlylzmmmnmompmqmrmsmtmumvmwmxmymznnnonpnqnrnsntnunvnwnxnynzooopoqorosotou
ovowoxoyozpppqprpsptpupvpwpxpypzqqqrqsqtquqvqwqxqyqzrrrsrtrurvrwrxryrzssstsusvsw
sxsysztttutvtwtxtytzuuuvuwuxuyuzvvvwvxvyvzwwwxwywzxxxyxzyyyzza
There are 25 extra digraphs in the above string. Splitting at those positions yields these words (of no language):
aa abacadaeafagahaiajakalamanaoapaqarasatauavawaxayazbb
bcbdbebfbgbhbibjbkblbmbnbobpbqbrbsbtbubvbwbxbybzcc
cdcecfcgchcicjckclcmcncocpcqcrcsctcucvcwcxcyczdd
dedfdgdhdidjdkdldmdndodpdqdrdsdtdudvdwdxdydzee
efegeheiejekelemeneoepeqereseteuevewexeyezff
fgfhfifjfkflfmfnfofpfqfrfsftfufvfwfxfyfzgg
ghgigjgkglgmgngogpgqgrgsgtgugvgwgxgygzhh hihjhkhlhmhnhohphqhrhshthuhvhwhxhyhzii
ijikiliminioipiqirisitiuiviwixiyizjj jkjljmjnjojpjqjrjsjtjujvjwjxjyjzkk
klkmknkokpkqkrksktkukvkwkxkykzll lmlnlolplqlrlsltlulvlwlxlylzmm
mnmompmqmrmsmtmumvmwmxmymznn nonpnqnrnsntnunvnwnxnynzoo opoqorosotouovowoxoyozpp
pqprpsptpupvpwpxpypzqq qrqsqtquqvqwqxqyqzrr rsrtrurvrwrxryrzss stsusvswsxsysztt
tutvtwtxtytzuu uvuwuxuyuzvv vwvxvyvzww wxwywzxx xyxzyy yzza
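The 702-character string above follows a simple pattern: for each letter x from a to y, write x followed by every letter from x to z, then finish with "za". A minimal sketch that rebuilds it and checks the counts (the function name `all_digraph_string` is made up for this example and is not from the notebook):

```python
from string import ascii_lowercase


def all_digraph_string():
    """Build a string whose adjacent pairs include all 26 * 26 digraphs."""
    parts = []
    for i, x in enumerate(ascii_lowercase[:-1]):  # x runs from 'a' to 'y'
        for y in ascii_lowercase[i:]:             # y runs from x to 'z'
            parts.append(x + y)
    parts.append("za")  # closes the sequence with the zz and za pairs
    return "".join(parts)


s = all_digraph_string()
pairs = [s[i:i + 2] for i in range(len(s) - 1)]
print(len(s), len(pairs), len(set(pairs)))
```

The string has 701 adjacent pairs, of which 676 are distinct; the 25 extras are the repeated doubles (aa through yy) at the start of each row.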
There are 83 digraphs missing from the installed English dictionary:
bq, bz, cf, cj, cv, cx, fq, fv, fx, fz, gq, gv, gx, hx, hz, jb, jd, jf, jg, jh,
jl, jm, jp, jq, jr, js, jt, jv, jw, jx, jy, jz, kq, kx, kz, mx, mz, pq, pv, px,
qb, qc, qd, qf, qg, qh, qj, qk, ql, qm, qn, qp, qq, qv, qw, qx, qy, qz, sx, tq,
vb, vf, vh, vj, vk, vm, vp, vq, vw, vx, wq, wv, wx, xd, xj, xk, xr, xz, yq, yy,
zf, zr, zx
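A list like the one above can be produced by comparing the digraphs seen in a word list against all 676 possibilities. This is only a sketch: `missing_digraphs` is a made-up name, and which dictionary file was used is not stated here, so the word source is left to the caller.

```python
from itertools import product
from string import ascii_lowercase


def missing_digraphs(words):
    """Return the sorted a-z digraphs that occur in none of the given words."""
    seen = set()
    for word in words:
        w = word.lower()
        seen.update(w[i:i + 2] for i in range(len(w) - 1))
    every = ("".join(p) for p in product(ascii_lowercase, repeat=2))
    return sorted(d for d in every if d not in seen)
```

For example, passing the stripped lines of a system word file (the exact path is an assumption and varies by machine) would give the missing set for that dictionary.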
The following string of English words contains every digraph in the English dictionary:
spectrometric brilliantly bilious xerography furrieries imbued disparages
federating prejudgment welched addressees survival disciplining wheedlers
moussaka crockery ferrules unready pylori mantlings heartbroken pollywog rasping
unequivocally vermont arointed whereby hurriers supervisal humble vocalizers
client focusers casbah sullenly entwist tabling wops bereave daguerreotypes
trawleys taipei suttees newton cesiums poteens calypsos extol beautifying
blowback similarities corniche wardens cowhand slouchingly deponents monotheists
evertor roentgenogram argonauts totipotency percussional balmiest hemorrhoidal
designers umbels eroded axels surgical bhangs freshens temporally nighs punkier
fluoride cirrus mazer hypnic dualities brutishness refastens finitudes
eluviating stickums zanies bobbinets dolce overgrazes slipways bandmasters
rechristens subclassified barfly swervers skidooing jaggeder disarrayed catchall
candlesticks obligate overblown plummiest overblows algicides snoopiest
illegitimating nonbeings emulous divisibleness hautboys kennelled ragtag aorta
postscript xiii foamily nimrods uglify vaultier schmaltziest jodhpur infielder
authoritarianism biorhythmicities skinny chairperson banquets gonococcic
bunkhouse silverware pachyderm wholeheartedness murkest flyby puppyish subahdars
ophthalmologist dismalness gyving standpat kludge alfa gonofs neuropsychiatry
rondeaux pigeonholing alertness knickknacks quixotes sleepwalkers workwoman
barbwire puffer wraths fbi yoni ibm hobnailed syntactically shaftings gymnosperm
exclude snapdragon peerless songbook heterosexuals endemics barnyard thumbprint
wreckful hardcovers maxwell hallways reconveys paroquets advantaged nonmalicious
methaqualone nonprofit outdistanced overexpands reinvestment fleshly carboxyl
fishhooks mawkishness nuthouse shrikes snowdrop unneighborly circumventions
telekinesis clipboards muzzling overjoy tramlines midwived aery reacquired
czechoslovakian outpayment rubaiyat skoaled squatted jujitsu bagpiper
copyrighting oxygenated folktales skycap thoroughgoing bazaar benzol albinism
chevrons unlawfulness pooh thoughtful handball unsubtly mezuzahs kishkas
substantialness exactest disjointedly reject comfortless liquefies dozenth
dogwatch disdaining heavy subjectiveness cajaputs dishwasher picnicking
spoonsful swimmy handfasted wallpapered songfully upchucks ballyhoo chowchow
vulnerable unadjudicated obviously thewy leafhoppers subfloors healthful
foolhardier jimsonweed cowgirl zemstvos sackcloth abductors israelis beachcomber
evzone weltanschauung blackbird oxgall hardihood uncircumcised folkmotes boxlike
cowpoke blancmanges fizgig jnana cavalryman kvetch auk gulfweed fjord dogcart
clerkdom exhorter blowzy ebcdic campfires fellowmen hafniums bootjacks hijacker
iraqi jellyfish dojo marquess asphyxy thanksgivings knackwursts scherzos xxv
hundredth yules thruways blackguards injector blowjob inkpots dummkopfs qaid
bowwow foxskin taiwanese exfoliate dykes overbuy handkerchief zigzags
pocketknives dogdom highjacked syzygial muckrakers dumdums farmhouses skyjackers
skivvies huh propjet gazpacho gizmo bumpkinish escarpment jinxing filmgoer
subkingdoms boomtown afghanis xmas stockjobber cpi halfpenny offcut blvd
pavlovian gingko complexness engulfment transverse ramjets mpg gumwood
exquisiteness subgenera oxbow serfdom grosz calxes tuque cwt hindquarters
pirozhki schmalz zwieback blitzkrieged mezcals cumquats gadzooks hajj czarevnas
muzjiks mitzvahs marxists sacbuts qts qophs kafka gjetost mezquites cgs whizbang
aztecan avg earthquakes dostoevsky tx pbx pulque qed hdqrs dx vt killjoys vc zn
leipzig iqs nashville samizdat nietzsche jct pirojki
I forgot to mention: if you want to generate a different set of English words that spans all of the English digraphs, change the number 4037 in the following line in the last cell of the notebook.
print(wrap_paragraphs(" ".join(get_words(4037)), 80)[0])
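The body of `get_words` is not shown in this thread, so the following is not the notebook's code. It is a rough sketch of one common approach, a greedy set cover, where the `seed` parameter (an assumption of this example, as are the names `word_digraphs` and `greedy_cover`) shuffles the words so that ties break differently and a different list comes out.

```python
import random


def word_digraphs(word):
    """Return the set of adjacent letter pairs in a word."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1) if w[i:i + 2].isalpha()}


def greedy_cover(words, seed=0):
    """Pick words until every digraph found in `words` is covered."""
    rng = random.Random(seed)
    pool = sorted(set(words))
    rng.shuffle(pool)  # the seed only changes how ties are broken
    pairs = {w: word_digraphs(w) for w in pool}
    uncovered = set().union(*pairs.values())
    chosen = []
    while uncovered:
        # Take the word that covers the most still-uncovered digraphs.
        best = max(pool, key=lambda w: len(pairs[w] & uncovered))
        chosen.append(best)
        uncovered -= pairs[best]
    return chosen
```

Greedy selection does not guarantee the shortest possible list, and this version rescans every word on each pass, so it is slow on a full dictionary.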
Here is a link to my notebook
Do we want to include these notebooks in this repo? We could use them to create standalone .txt files with the text.
@ellisonbg Including the notebooks sounds like a great idea!
@jdfreder Want to make a PR?
@jdfreder My code was nothing special; it just took a set of words, measured their digraph coverage, and then altered the list by a series of changes, additions, or deletions, then compared the new set against the old. It had no way to escape local maxima and the evaluation function was a huge kludge of arbitrary parameters. It'd definitely be better to write something from scratch than try to work with my code :)
It came up with lists like this:
proboscidiferous thinning duodenopancreatectomy madrigaletto turnsheet introversion pseudobulbar quakerbird tambookie salpingopharyngeal rammelsbergite sowens accumulativ reviewage cephalohumeralis uncollegiate unapperceived foxhound plenipotentiaryship reaffirm unlimitedness spearwort bluejack daystreak autodestruction siliceofelspathic arrival aerocartograph clairvoyantly purchasable surdity hydrohemothorax
and this:
perpendicularity anthecological semivitrified radiosurgery scarabaeiform necessarily ultrainclusive safeblowing dactyliography tautophony nonmarriage tiddley probituminous threadway castigative afreet unworshipping jaragua slumpproof weelfard adsbud microcentrosome yestermorn uppluck championship plenipotence goodheartedness doff belight dryworker bradyteleokinesis jovialty narcissistic regatta kahuna abouts semirattlesnake rescrub millennialist quinquenary epispastic simulation deceivability
which manage pretty good digraph coverage but definitely have a lot of room for improvement. I'd love to see someone else throw some more sophisticated algorithms at it.
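For anyone who wants a starting point, here is a rough sketch of that kind of local search (add, delete, or swap a word, then compare scores). It is not the original code. The annealing-style acceptance of worse moves is a swapped-in technique meant to help with local maxima, and the names `pairs_of`, `score`, `local_search` and the `length_penalty` and `temperature` values are arbitrary assumptions.

```python
import math
import random


def pairs_of(word):
    """Return the set of adjacent letter pairs in a word."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def score(wordlist, length_penalty=0.05):
    """Distinct digraphs covered, minus a small penalty per character."""
    covered = set().union(*(pairs_of(w) for w in wordlist))
    return len(covered) - length_penalty * sum(len(w) for w in wordlist)


def local_search(vocab, steps=2000, temperature=0.5, seed=0):
    """Randomly edit a wordlist, sometimes accepting worse lists early on."""
    rng = random.Random(seed)
    vocab = sorted(set(vocab))
    current = [rng.choice(vocab)]
    cur_score = score(current)
    best, best_score = list(current), cur_score
    for step in range(steps):
        candidate = list(current)
        move = rng.choice(("add", "delete", "swap"))
        if move == "add":
            candidate.append(rng.choice(vocab))
        elif move == "delete" and len(candidate) > 1:
            candidate.pop(rng.randrange(len(candidate)))
        else:  # swap one word for a random one (also used if delete is not possible)
            candidate[rng.randrange(len(candidate))] = rng.choice(vocab)
        new_score = score(candidate)
        delta = new_score - cur_score
        t = temperature * (1 - step / steps)  # cools toward zero
        if delta >= 0 or (t > 0 and rng.random() < math.exp(delta / t)):
            current, cur_score = candidate, new_score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
    return best, best_score
```

Setting `temperature=0` reduces this to plain hill climbing, which is closer to the behaviour described above.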
Since @pelson went whole hog on this, I think we're in a good state.
As mentioned by @randallmunroe in an IPython pull request comment, the kerning is off on some of the letters.