Status: Closed (closed by 1ec5 2 months ago)
@1ec5 Thanks! I'll make a little script to add these in so we don't have to do it by hand.
Do you happen to know how much performance cost there is for supporting a low-coverage language? I'm inclined to support as many as we can, no matter how few speakers, assuming that there's no significant penalty. People will be more likely to add language data if they can see it on a map.
Here's the script if anyone is interested.
let langs = "ab,ace,af,als,am,an,ar,arz,as,ast,az,az-Arab,az-cyr,azb,ba,bar,bat-smg,be,be-tarask,ber,bg,bm,bn,bo,bpy,br,bs,bxr,ca,cdo,ce,ceb,cho,chr,chy,ckb,co,cr,crh,crh-cyr,crk,cs,csb,cv,cy,da,dak,de,dsb,dv,dz,ee,egl,el,en,eo,es,et,eu,fa,fi,fil,fit,fo,fr,frr,full,fur,fy,ga,gag,gan,gcf,gd,gl,gn,gr,grc,gsw,gu,gv,ha,hak,hak-HJ,haw,he,hi,hif,hr,hsb,ht,hu,hur,hy,ia,id,ie,ilo,int,io,is,it,iu,ja,ja_kana,ja_rm,ja-Hira,ja-Latn,jv,ka,kab,kbd,ki,kk,kk-Arab,kl,km,kn,ko,ko-Hani,ko-Latn,krc,krl,ks,ku,kv,kw,ky,la,lb,left,lez,li,lij,lld,lmo,ln,lo,lrc,lt,lv,lzh,md,mdf,mez,mg,mhr,mi,mia,mk,ml,mn,mo,moh,mr,mrj,ms,ms-Arab,mt,mwl,my,myv,mzn,nah,nan,nan-HJ,nan-POJ,nan-TL,nds,ne,nl,nn,no,nov,nv,oc,oj,old,or,os,ota,pa,pam,pcd,pfl,pl,pms,pnb,pot,ps,pt,pt-BR,pt-PT,qu,right,rm,ro,ru,rue,rw,sah,sat,sc,scn,sco,sd,se,sh,si,sju,sk,sl,sma,smj,so,sq,sr,sr-Latn,su,sv,sw,syc,szl,ta,te,TEC,tg,th,th-Latn,ti,tk,tl,tr,tt,tt-lat,udm,ug,uk,ur,uz,uz-Arab,uz-cyr,uz-Cyrl,uz-Latn,vec,vi,vls,vo,wa,war,win,wiy,wo,wuu,xmf,yi,yo,yue,yue-Hant,yue-Latn,za,zgh,zh,zh_pinyin,zh_zhuyin,zh-Hans,zh-Hant,zh-Latn-pinyin,zu,zza";
let langs2 = "ypk,ood,apw,apm,apj,kee,kjq,zun,otw,hop,ik,esk,tew,cic,mus,akz,cku,cro,shh,tix,twf,tow,bla,uma,yak,pao,mnr,ute,aht,tfn,ing,hoi,koy,kuu,tcb,tau,gwi,haa,hup,see,kio,ale,sal,lut,str,cow,oka,fla,sac,kic,arp,tli,ess,pqm,nez,hid,kyh,one";
console.log(langs.split(',').concat(langs2.split(',')).sort(function (a, b) {
  return a.toLowerCase().localeCompare(b.toLowerCase());
}).join(","));
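If the two lists ever overlap, the merged output would contain duplicates. A minimal sketch of a variant that also deduplicates, assuming plain comma-separated input; `mergeLangLists` is a hypothetical helper name and the short lists are illustrative only:

```javascript
// Hypothetical helper: merge comma-separated code lists, drop exact
// duplicates, and sort case-insensitively (same comparison as the script above).
function mergeLangLists(...lists) {
  const codes = lists
    .flatMap((list) => list.split(","))
    .filter((code) => code !== ""); // ignore empty entries from stray commas
  // Set removes exact (case-sensitive) duplicates, so "TEC" and "tec"
  // would both survive.
  return [...new Set(codes)].sort((a, b) =>
    a.toLowerCase().localeCompare(b.toLowerCase())
  );
}

// Illustrative input; "en" appears in both lists and is kept once.
const merged = mergeLangLists("en,fr,ood", "apw,en,zun");
console.log(merged.join(","));
```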
> Do you happen to know how much performance cost there is for supporting a low-coverage language?
If no feature in a tile has a name in a particular language, then that language's attribute shouldn't get written to the tile at all, so it should be "free" to add as many languages as you want. As soon as one feature in a tile has data for one of these languages, the tile gains a few bytes for the attribute name and a few bytes for the value.
It might be a good idea to double-check that the code used to generate the tiles is actually skipping blanks/nulls.
> It might be a good idea to double-check that the code used to generate the tiles is actually skipping blanks/nulls.
Yes, Planetiler skips unset name fields; this is what makes OSM Americana’s bilingual labeling work. So it should have negligible impact on tile size realistically.
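To illustrate the skipping behavior in general terms, here is a rough sketch. It is not Planetiler's actual code; `localizedNameAttributes`, the `tags` object shape, and the sample values are all assumptions made up for the example:

```javascript
// Hypothetical sketch, NOT Planetiler's implementation: copy only the
// name:<lang> tags that are actually set, so unset languages add nothing
// to the output.
function localizedNameAttributes(tags, langs) {
  const attrs = {};
  for (const lang of langs) {
    const key = `name:${lang}`;
    const value = tags[key];
    // Skip missing, null, and blank values.
    if (typeof value === "string" && value.trim() !== "") {
      attrs[key] = value;
    }
  }
  return attrs;
}

// Placeholder tag values, for illustration only.
const tags = { name: "Example", "name:ood": "example-ood-name", "name:es": "" };
const attrs = localizedNameAttributes(tags, ["en", "es", "ood"]);
console.log(attrs); // only the name:ood entry remains
```

In this sketch, a feature with no value for a requested language contributes no key at all, which is why the number of configured languages alone would not affect the output size.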
Interestingly, there doesn’t seem to be a very good correspondence between language speakers and number of tagged features for indigenous languages in general. I guess it would be easier to maintain a language list more automatically based on whatever is currently tagged in OSM. But at the same time, invalid language codes should be filtered out, like TEC and left, which are currently in the list somehow.
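One possible way to catch entries like these is a purely syntactic check. The sketch below assumes a heuristic rule (the part before the first `-` or `_` must be two or three lowercase ASCII letters); `isPlausibleLangCode` is a hypothetical name, and the rule is an assumption rather than an official validation:

```javascript
// Heuristic, assumed rule: the primary subtag (text before the first "-"
// or "_") must be 2 or 3 lowercase ASCII letters.
function isPlausibleLangCode(code) {
  const primary = code.split(/[-_]/)[0];
  return /^[a-z]{2,3}$/.test(primary);
}

// Sample drawn from the list in the script above.
const sample = ["en", "TEC", "left", "right", "full", "zh-Hant", "ja_kana", "ood"];
const kept = sample.filter(isPlausibleLangCode);
console.log(kept.join(","));
```

A check like this only looks at the shape of the code. A well-formed two- or three-letter entry that is not really a language would still pass, so a lookup against a language code registry would be needed to catch those.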
We include a lot of languages in the vector tiles, but many indigenous languages of the United States are missing:
https://github.com/osmus/tileservice/blob/edcdedfe078b5e25560e863b96687e3b72a51bd1/renderer/render_once.sh#L51
Here’s a list I cobbled together of extant indigenous languages and their dialects that are spoken in the U.S., that are spoken by more than 500 people as of 2008, and that have any coverage at all in OSM:
ypk
ood
apw
apm
apj
kee
kjq
zun
otw
hop
ik
esk
tew
cic
mus
akz
cku
cro
shh
tix
twf
tow
bla
uma
yak
pao
mnr
ute
aht
tfn
ing
hoi
koy
kuu
tcb
tau
gwi
haa
hup
see
kio
ale
sal
lut
str
cow
oka
fla
sac
kic
arp
tli
ess
pqm
nez
hid
kyh
one