Open IvanUkhov opened 8 years ago
What do you mean by "the superscript in the second alternative is not properly formatted"?
@hashier, there are two possible pronunciations, but accent 2 (grave) is denoted correctly only in the first one. So, it should be like this (just as on Folkets lexikon):
[2sj'o:r_ta el. 2sj'or_t:a]
Note the second “2”. While we’re on it, why have you decided to denote stress by capitalizing letters instead of using the traditional notation? Thanks!
@hashier, sorry for picking on details. I just think that pronunciation is the most important part of the language, and it’s also the one that is the most difficult to master. It’s of great help to be able to clearly see how to pronounce words. I wish Dictionary had sound.
Ah, I see what you mean with the 2.
I didn't pick anything; I just used what was in the dataset that I got from Folkets lexikon. Since I always find it hard to read anyway, I never realised that it is completely wrong (:
I checked what the "data" is for the pronunciation of skjorta:
<word class="nn" lang="sv" value="skjorta">
<translation value="shirt" />
<phonetic soundFile="skjorta.swf" value="²$O:r+ta el. 2$Or+t:a" />
[...]
Seems like it's already broken in that file, so I guess there is nothing we can do to fix it :(
Hmm, the interesting thing is that their web interface pulls data from the same database, and this “broken” representation is exactly what it gets to work with. For instance, here is the server’s response for “skjorta”:
//OK[6,0,0,1,5,4,2,3,0,0,1,2,2,0,0,1,["se.algoritmica.folkets.client.LookUpResult/1089098233","[I/2970817851","[Ljava.lang.String;/2600011424","<word class=\"nn\" date=\"2011-03-03\" id=\"158400\" lang=\"sv\" lexinid=\"15841\" origin=\"lexin\" value=\"skjorta\"><translation date=\"2011-03-03\" id=\"15559\" value=\"shirt\"></translation><phonetic date=\"2011-03-03\" soundFile=\"skjorta.swf\" value=\"²$O:r+ta el. 2$Or+t:a\"></phonetic><paradigm date=\"2011-03-03\" id=\"13806\" origin=\"lexin\"><inflection value=\"skjortan\"></inflection><inflection value=\"skjortor\"></inflection></paradigm><see date=\"2011-03-03\" origin=\"saldo\" type=\"saldo\" value=\"skjorta||skjorta..1||skjorta..nn.1\"></see><compound date=\"2011-03-03\" id=\"5537\" value=\"bomullsskjorta\"><translation value=\"cotton shirt\"></translation></compound><compound date=\"2011-03-03\" id=\"5538\" inflection=\"skjort|kragen\" value=\"skjort|krage\"><translation value=\"shirt collar\"></translation></compound><idiom date=\"2011-03-03\" id=\"1358\" value=\"kosta skjortan (&quot;kosta väldigt mycket&quot;)\"><translation value=\"cost a packet (&quot;cost very much&quot;)\"></translation></idiom><definition date=\"2011-03-03\" id=\"15341\" value=\"ett tunnare klädesplagg med krage, ärmar och knäppning fram\"></definition><url date=\"2011-03-03\" origin=\"lexin\" type=\"any\" value=\"8/herr.swf\"></url></word>","<word value=\"shirt\" lang=\"en\" class=\"nn\" id=\"379721\" origin=\"lexin\" date=\"2009-02-24\"><translation id=\"379721-1\" value=\"skjorta\" origin=\"lexin\" date=\"2009-02-24\"></translation><example id=\"379721-2\" value=\"Tom put on a clean white shirt and a tie.\" origin=\"lexin\" date=\"2009-02-24\"><translation value=\"Tom satte på sig en ren vit skjorta och en slips.\" origin=\"lexin\" date=\"2009-02-24\"></translation></example><explanation value=\"A piece of clothing with collar, sleeves and buttons down the front.\" origin=\"lexin\" 
date=\"2009-02-24\"></explanation></word>","skjorta"],0,7]
If you scroll to the right, you’ll see exactly what you wrote above. So, I guess, there’s some post-processing on the client side that makes it look pretty.
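For anyone who wants to pull the raw phonetic string out of such a response, here is a minimal sketch. It assumes that everything after the `//OK` prefix parses as JSON, which holds for the sample above but may not hold for every response. The `response` constant is a hand-shortened copy of that sample, and `extractPhonetic` is a hypothetical helper name, not something from the Folkets lexikon code.

```javascript
// Hand-shortened copy of the response quoted above: only the <phonetic>
// element is kept, and the leading numbers are reduced to a single value.
const response =
  '//OK[1,["<word value=\\"skjorta\\"><phonetic soundFile=\\"skjorta.swf\\" ' +
  'value=\\"²$O:r+ta el. 2$Or+t:a\\"></phonetic></word>","skjorta"],0,7]';

// Hypothetical helper: returns the value attribute of the first <phonetic>
// element found in the response, or null if there is none.
function extractPhonetic(raw) {
  // Assumption: the payload after "//OK" is a plain JSON array.
  const payload = JSON.parse(raw.replace(/^\/\/OK/, ''));
  // The XML fragments live in a nested array of strings.
  const strings = payload.flat(Infinity).filter((x) => typeof x === 'string');
  for (const s of strings) {
    const match = /<phonetic\b[^>]*\bvalue="([^"]*)"/.exec(s);
    if (match) {
      return match[1];
    }
  }
  return null;
}

console.log(extractPhonetic(response)); // prints: ²$O:r+ta el. 2$Or+t:a
```

The output is the same unprocessed string as in the XML file, so whatever makes it look pretty has to happen after this step.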
How did you make the request?
I don't think they do post-processing. I assume their DB -> XML export is "broken" and their homepage is not using DB -> XML but something else. Of course, if they use the same interface that you used, then I have no idea how they fix it.
If you have Chrome,
I’m not claiming that that’s how they do it; I really have no idea. Maybe there’s some other mechanism, which is intentionally hidden.
nah, that's just us talking to their web server, that's not how they talk internally to their DB.
But interesting that it shows the correct stuff on the homepage in the end... maybe reading the JavaScript that handles the response would solve this problem, but who's got time to do that (:
I’ve found the code that seems to be doing the translation. Unfortunately, it’s heavily obfuscated and pretty much useless:
function Rbb(b) {
var i, j;
Arb[i = ++Brb] = Rbb;
Crb[i] = KXb + sMb, Fbb();
var c, d, e, f, g;
f = new(Crb[i] = KXb + ZRb, pZ)(utb);
g = OY((Crb[i] = KXb + '547', b));
for (Crb[i] = KXb + SEb, d = 0, e = g.length;
(Crb[i] = KXb + SEb, d) < e; Crb[i] = KXb + SEb, ++d) {
c = g[d];
switch (Crb[i] = KXb + TEb, c) {
case 50:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + tCb, f).a).a += '\xB2';
break;
case 43:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + WEb, f).a).a += '_';
break;
case 64:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + wCb, f).a).a += 'ng';
break;
case 99:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + yCb, f).a).a += 'tj';
break;
case 36:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + XEb, f).a).a += 'sj';
break;
case 65:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + jZb, f).a).a += "'a";
break;
case 69:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + zCb, f).a).a += "'e";
break;
case 73:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + aFb, f).a).a += "'i";
break;
case 79:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + aGb, f).a).a += "'o";
break;
case 85:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + xOb, f).a).a += "'u";
break;
case 89:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + NEb, f).a).a += "'y";
break;
case 197:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + _Rb, f).a).a += "'\xE5";
break;
case 196:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + dMb, f).a).a += "'\xE4";
break;
case 214:
Crb[i] = dvb + ttb, (Crb[i] = tTb + Vwb, (Crb[i] = KXb + PRb, f).a).a += "'\xF6";
break;
default:
Crb[i] = dvb + ptb, (Crb[i] = tTb + Lwb, (Crb[i] = KXb + fGb, f).a).a += (Crb[i] = xub + kBb, (Crb[i] = xub + kBb, String).fromCharCode((Crb[i] = KXb + fGb, c)));
}
}
j = (Crb[i] = dvb + Gtb, (Crb[i] = tTb + _vb, (Crb[i] = KXb + ePb, f).a).a);
Brb = i - 1;
return j
}
That code indeed works. Here is a more human-friendly version:
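// Character code of the raw symbol -> string shown to the user.
var mapping = {
  50: '\xB2',     // '2' -> superscript two (accent 2)
  43: '_',        // '+'
  64: 'ng',       // '@'
  99: 'tj',       // 'c'
  36: 'sj',       // '$'
  65: "'a",       // 'A'
  69: "'e",       // 'E'
  73: "'i",       // 'I'
  79: "'o",       // 'O'
  85: "'u",       // 'U'
  89: "'y",       // 'Y'
  197: "'\xE5",   // 'Å' -> 'å
  196: "'\xE4",   // 'Ä' -> 'ä
  214: "'\xF6",   // 'Ö' -> 'ö
};

function translate(text) {
  var buffer = "";
  for (var i = 0, length = text.length, next; i < length; i++) {
    next = mapping[text.charCodeAt(i)];
    // Characters without a mapping are copied through unchanged.
    if (next === undefined) {
      next = text[i];
    }
    buffer += next;
  }
  return buffer;
}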
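As a quick check, the same table can be run against the raw value from the XML for skjorta. This is a self-contained sketch that restates the mapping above; the names `phoneticMap` and `decodePhonetic` are made up for this example and do not come from the original site.

```javascript
// Same table as above, restated so that this snippet runs on its own.
const phoneticMap = {
  50: '\u00B2', // '2' -> superscript two
  43: '_',      // '+'
  64: 'ng',     // '@'
  99: 'tj',     // 'c'
  36: 'sj',     // '$'
  65: "'a", 69: "'e", 73: "'i", 79: "'o", 85: "'u", 89: "'y",
  197: "'\u00E5", 196: "'\u00E4", 214: "'\u00F6",
};

// Hypothetical name; behaves like translate() above.
function decodePhonetic(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const replacement = phoneticMap[text.charCodeAt(i)];
    // Unmapped characters (including an existing '²') pass through unchanged.
    out += replacement === undefined ? text[i] : replacement;
  }
  return out;
}

console.log(decodePhonetic('²$O:r+ta el. 2$Or+t:a'));
// prints: ²sj'o:r_ta el. ²sj'or_t:a
```

Both alternatives now carry the superscript 2, which matches what the Folkets lexikon web interface shows. The table also turns a raw `c` (code 99) into `tj`, so a raw string that is displayed without this step would show the tj sound as the letter c.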
wow! Batshit crazy! This was really something to get the obfuscated code to something like this simple! <3 love it
Hello,
The following screenshot demonstrates a number of issues with pronunciation:
Regarding the first problem, try to look up words with the tj sound like tjugo; the sound will be erroneously represented by the letter c.
Regards, Ivan