Open perklet opened 6 years ago
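I scanned more of the results. Out of several million records, only 360+ have an f entry at all, and only 20+ have an f that differs from s. I also found b and z, which appear in only about 10 records in total (every record with b also has z, and vice versa). So f/b/z are probably plural/comparative/superlative markers that were used in an early version of the data.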
sqlite> select id, word,exchange from stardict where exchange like "%f:%" limit 1000;
390693|behoove|f:behooves
519724|brown-nose|f:brown-noses
535544|bunco|f:buncos/s:buncoes
654777|cha-cha|f:cha-chas/s:chas-chas
700802|Christmas|f:Christmases/s:christmass
895392|court-martial|s:court-martials/f:courts-martial
903955|crawfish|f:crawfishes
1118930|DJ|s:djs/f:DJs
1513202|French|f:Frenches/s:frenches
1515606|fresco|s:frescoes/f:frescos
1660077|green|s:greens/b:greener/z:greenest/f:greens
2191094|lighter|s:lighters/f:lighters
2250169|low|s:lows/b:lower/f:lows/z:lowest
2418547|microcopy|f:microcopies
3121976|purple|s:purples/b:purpler/f:purples/z:purplest
3223854|reecho|f:reechoes
3257245|replete|s:repletes/b:more replete/z:most replete/f:repletes
3347847|rumpus|f:rumpuses/s:rumpuss
3561695|sled|s:sleds/f:sleds
3723990|strip mine|f:strip mines/s:strip ours
3761122|Sunday|s:sundays/f:Sundays
4077529|underdress|f:underdresses
4249331|whiz|f:whizzes/s:whizs
4301327|yen|s:yens/f:yen
sqlite>
sqlite> select id, word,exchange from stardict where exchange like "%b:%" limit 1000;
337385|bald|b:balder/z:baldest/
510810|brisk|b:brisker/z:briskest/s:brisks/
987166|deaf|b:deafer/z:deafest/s:deafs/
1531169|full|b:fuller/z:fullest/s:fulls/
1660077|green|s:greens/b:greener/z:greenest/f:greens/
2250169|low|s:lows/b:lower/f:lows/z:lowest/
2492816|motley|b:motlier/z:motliest/f:motleys/s:motleys
3048602|prim|b:primmer/z:primmest/
3121976|purple|s:purples/b:purpler/f:purples/z:purplest
3257245|replete|s:repletes/b:more replete/z:most replete/f:repletes
3771014|supple|b:suppler/z:supplest/
4228031|wee|s:wees/b:weer/z:weest
sqlite> select id, word,exchange from stardict where exchange like "%z:%" limit 1000;
337385|bald|b:balder/z:baldest/
510810|brisk|b:brisker/z:briskest/s:brisks/
987166|deaf|b:deafer/z:deafest/s:deafs/
1531169|full|b:fuller/z:fullest/s:fulls/
1660077|green|s:greens/b:greener/z:greenest/f:greens/
2250169|low|s:lows/b:lower/f:lows/z:lowest/
2492816|motley|b:motlier/z:motliest/f:motleys/s:motleys
3048602|prim|b:primmer/z:primmest/
3121976|purple|s:purples/b:purpler/f:purples/z:purplest
3257245|replete|s:repletes/b:more replete/z:most replete/f:repletes
3771014|supple|b:suppler/z:supplest/
4228031|wee|s:wees/b:weer/z:weest
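The comparison of f against s described above can also be scripted rather than read off by eye. The following is a minimal sketch, assuming the `stardict` table and the `key:value/key:value` layout of `exchange` seen in the query output; `parse_exchange`, `rows_where_f_differs` and `demo_connection` are hypothetical helper names, and the demo rows are copied from the output above.

```python
import sqlite3


def parse_exchange(exchange):
    """Split 'k:v/k:v' into a dict; tolerates a trailing '/' and empty input."""
    forms = {}
    for item in (exchange or "").split("/"):
        key, sep, value = item.partition(":")
        if sep and key:
            forms[key] = value
    return forms


def rows_where_f_differs(conn):
    """Return (word, s, f) for rows that have f and whose s is missing or different."""
    cur = conn.execute(
        "SELECT word, exchange FROM stardict "
        "WHERE exchange LIKE '%f:%' ORDER BY id"
    )
    result = []
    for word, exchange in cur:
        forms = parse_exchange(exchange)
        if "f" in forms and forms.get("s") != forms["f"]:
            result.append((word, forms.get("s"), forms["f"]))
    return result


def demo_connection():
    """In-memory stand-in for the real database, with a few rows from the thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stardict (id INTEGER, word TEXT, exchange TEXT)")
    conn.executemany(
        "INSERT INTO stardict VALUES (?, ?, ?)",
        [
            (337385, "bald", "b:balder/z:baldest/"),
            (390693, "behoove", "f:behooves"),
            (535544, "bunco", "f:buncos/s:buncoes"),
            (3561695, "sled", "s:sleds/f:sleds"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in rows_where_f_differs(demo_connection()):
        print(row)
```

Against the real database, `sqlite3.connect` would be pointed at the actual file instead of `:memory:`. Rows with only f and no s are reported with `None` in the s position.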
That seems to be the case.
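I was about to ask the same question when the system suggested this similar issue. I looked into it, and my guess is as follows:
I pulled the first 100 entries whose exchange field contains f. They are rare: it took more than 770,000 entries to collect 100. Comparing only s and f within those 100 entries, f and s agree in most cases. Of the 5 that differ, 2 have only f and no s, and in the other 3 f is more accurate than s. So my guess is that f is a fix for the plural form of some nouns: when f is present, use f and discard s.
Pending confirmation from @skywind3000.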
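If that guess holds (it is unconfirmed), the "use f, otherwise fall back to s" rule could look like the sketch below. `preferred_plural` is a hypothetical helper, not part of the project, and it assumes the same `key:value/key:value` layout of the exchange field.

```python
def preferred_plural(exchange):
    """Return the f form if present, else the s form, else None.

    Assumes f is a corrected plural, which is only a guess from the thread.
    """
    forms = {}
    for item in (exchange or "").split("/"):
        key, sep, value = item.partition(":")
        if sep and key:
            forms[key] = value
    # An empty f value is treated as missing, so s is used instead.
    return forms.get("f") or forms.get("s")


if __name__ == "__main__":
    print(preferred_plural("s:court-martials/f:courts-martial"))
```

For entries without f this simply returns s, so the existing behavior is unchanged for the vast majority of words.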
The entries where s and f differ: