lexibank / halenepal

CLDF dataset derived from Hale's "Wordlists in Selected Languages of Nepal" from 1973
Creative Commons Attribution 4.0 International

non-matched srcids in stedt digitization #3

Closed LinguList closed 5 years ago

LinguList commented 5 years ago

The reason why we have only some 7000 instead of 10000 forms in the data now is that there are srcids that are in STEDT but not in Hale:

1 , early , XIID
2 , bark , 01.027
3 , leaf (small) , 01.025
4 , can / be able to do something , XIIIA1
5 , eighty , 09.20
6 , old (of objects) , XIID
7 , we (pl) , 01.003
8 , up , XIIC
9 , nineteen (inanimate) , 09.10
10 , some (solids) , 09.54
11 , pair , 09.41
12 , thirty-nine , 09.15
13 , fly , 01.064
14 , bark (of tree) , 01.027
15 , frequently , XIID
16 , hundred (inanimate) , 09.24
17 , name , 01.100
19 , hot , 01.093
20 , round , 01.098
21 , bird , 01.020
22 , person , 01.018
23 , away from , XIIC
24 , all (for things) , 01.009;12d
25 , toward , XIIC
26 , know , 01.058
27 , until / as long as , XIID
28 , leaf (large) , 01.025
29 , twenty-two , 09.12
30 , know (fact / person) , XIIIA2
31 , you, honorific singular , 01.002
32 , man (masc) , 01.017
33 , during , XIID
34 , neither , 09.43
35 , round (sphere) , 01.098
36 , someone , 09.05
37 , long , 01.014
38 , hither and thither , XIIC
39 , last , 09.34
40 , hear , 01.058
41 , that , 01.005
42 , during / in the midle , XIID
43 , old , XIID
44 , frequent , XIID
45 , forty , 09.16
46 , every , 09.45
47 , never , XIID
48 , drink , 01.054
49 , cold , 01.094
50 , know (sthg) , XIIIA2
51 , this , 01.004
52 , not , 01.008
53 , dry , 01.099
54 , four , 09.02
55 , can / be able to do something or know something , XIIIA1
56 , nothing , 09.49
57 , more , XIID7
58 , we (excl) , 01.003
59 , eight , 09.06
60 , white , 01.090
61 , less , XIID
62 , big , 01.013
63 , above (directly) , XIIC
64 , armpit , 66
65 , where , XIIC
66 , more , XIID
67 , behind , XIIC
68 , through , XIIC
69 , all , XIID
70 , full , 01.095
71 , between , XIIC
72 , unmixed / without condiment , XIID
73 , late , XIID
74 , third, one- , 09.37
75 , red , 01.087
76 , all (for things) , 01.009,12d
77 , new , XIID
78 , five , 09.03
79 , stone , 01.077
80 , we (pl exclu) , 01.003
81 , place , 06a.0406a.
82 , moon , 01.073
83 , let (smn do sthg) , XIIIA4
84 , know , 01.059
85 , something , 09.48
86 , behind (not visible) , XIIC
87 , no one , 09.51
88 , earth , 01.079
89 , get up , 0 2b1.59
90 , cloud , 01.080
91 , daily , XIID
92 , fish , 01.019
93 , ninety-nine (inanimate) , 09.23
94 , eighth , 09.33
95 , where (w. to) , XIIC
96 , hundred, two , 09.28
97 , die , 01.061
98 , quarter, one- , 09.38
99 , we (incl) , 01.003
100 , false , XIID14
101 , one , 01.011
102 , after / at last , XIID
103 , unmixed / pure , XIID
104 , root , 01.026
105 , ninety , 09.21
106 , give , 01.070
107 , some (fluids) , 09.54
108 , eat , 01.055
109 , come , 01.066
110 , sand , 01.078
111 , seven , 09.05
112 , we (du) , 01.003
113 , good , 01.097
114 , man , 01.017
115 , hundred and ninety (inanimate) , 09.27
116 , both , 09.44
117 , seventy , 09.19
118 , jackal , Hale 73 CSD
119 , up (straight up) , XIIC
120 , together , 09.47
121 , total , 09.46
122 , beyond , XIIC
123 , mixed , XIID
124 , hundred , 09.24
125 , half way , 09.36
126 , say , 01.071
127 , where , XIIC64
128 , new , 01.096
129 , sun , 01.072
130 , quarters, three- , 09.40
131 , sixty , 09.18
132 , bite , 01.056
133 , another , 09.52
134 , gloss , srcid
135 , fifty , 09.17
136 , third , 09.32
137 , lie , 01.067
138 , around , XIIC
139 , I , 01.001
140 , real , XIID
141 , tree , 01.023
142 , whole , XIID
143 , burn , 01.084
144 , all , I.9
145 , both (inanimate) , 09.44
146 , hundred and two (inanimate) , 09.25
147 , see , 01.057
148 , under (below) , XIIC
149 , water , 01.075
150 , in front of , XIIC56
151 , none , 09.42
152 , ash , 01.083
153 , beneath , XIIC
154 , second , 09.31
155 , half , 09.35
156 , hand , 
157 , smoke , 01.081
158 , woman , 01.016
159 , first , 09.30
160 , yellow , 01.089
161 , under , XIIC
162 , three , 09.01
163 , we (pl incl) , 01.003
164 , seed , 01.024
165 , dog , 01.021
166 , twenty-nine , 09.13
167 , thou , 01.002
168 , hundred and thirty , 09.26
169 , beside , XIIC
170 , some (grain) , 09.54
171 , until , XIID
172 , ten , 09.08
173 , some , 09.54
174 , nineteen , 09.10
175 , stand , 01.069
176 , thirteen , 09.09
177 , all , 01.009,12.
178 , something (unknown thing) , 09.48
179 , both (animate) , 09.44
180 , none (inanimate) , 09.42
181 , louse (head) , 01.022
182 , where (w. at) , XIIC
183 , when , XIID
184 , all , 01.009,12d
185 , hundred and two , 09.25
186 , too much , 09.55
187 , sleep , 01.060
188 , over , XIIC
189 , root / tuber , 01.026
190 , who , 01.006
191 , bite (past) , 01.056
192 , frequently / sometimes , XIID
193 , after , XIID
194 , sit , 01.068
195 , out of , XIIC
196 , rain , 01.076
197 , what , 01.007
198 , green , 01.088
199 , wing , 93
200 , partial , XIID
201 , nine , 09.07
202 , hundred and ninety , 09.27
203 , many , 01.010
204 , arm , 66
205 , path , 01.085
206 , two , 01.012
207 , most , XIID
208 , tickle , 158
209 , mountain , 01.086
210 , palm of hand , 66
211 , thirds, two- , 09.39
212 , none (animate) , 09.42
213 , thirty-one , 09.14
214 , twenty , 09.11
215 , louse , 01.022
216 , hundred and thirty (inanimate) , 09.26
217 , in / inside of , XIIC
218 , all , 01.009
219 , someone , 09.50
220 , far , XIIC1
221 , forty (inanimate) , 09.16
222 , over (above) , XIIC
223 , small , 01.015
224 , we , 01.003
225 , we (du incl) , 01.003
226 , behind (visible) , XIIC
227 , across , XIIC
228 , swim , 01.063
229 , until , XIID31
230 , old (of clothing) , XIID
231 , this , 01.100
232 , seed (includes fruit) , 01.024
233 , walk , 01.065
234 , cause (smn to do sthg) , XIIIA5
235 , down , XIIC
236 , before , XIID
237 , ninety-eight (inanimate) , 09.22
238 , all (for people) , 01.009
239 , ninety-eight , 09.22
240 , up (up country) , XIIC
241 , ninety (inanimate) , 09.21
242 , four (inanimate) , 09.02
243 , least , XIID
244 , star , 01.074
245 , night , 01.092
246 , black , 01.091
247 , ninety-nine , 09.23
248 , thousand, one , 09.29
249 , six , 09.04
250 , under (beneath) , XIIC
251 , we (du excl) , 01.003
252 , unmixed , XIID
253 , new one; one which is new , XIID
254 , kill , 01.062
255 , leaf , 01.025
256 , fire , 01.082
257 , let (smn do sthg) / permit , XIIIA4
258 , cold (wet) , 01.094
259 , infrequent , XIID
260 , above , XIIC

If those are identified (and ideally corrected in some json or whatever), we should have the full account of the data.
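
As a rough illustration of the check described above (not the code used in this repository), the non-matched srcids can be collected with a set lookup and written out as a JSON template for manual corrections. The function name `unmatched_srcids`, the `00.000` row, and the contents of `hale_ids` are invented placeholders; only the `XIID` and `66` rows are taken from the list above.

```python
import json


def unmatched_srcids(stedt_rows, hale_ids):
    """Return unique (gloss, srcid) pairs whose srcid is absent from hale_ids."""
    seen = set()
    missing = []
    for gloss, srcid in stedt_rows:
        key = (gloss, srcid)
        if srcid not in hale_ids and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


# Two rows from the list above, one duplicate, and one invented matching row.
stedt_rows = [
    ("early", "XIID"),
    ("armpit", "66"),
    ("early", "XIID"),      # duplicate, reported only once
    ("example", "00.000"),  # invented placeholder that does match
]
hale_ids = {"00.000"}  # invented placeholder, not the real Hale ID set

missing = unmatched_srcids(stedt_rows, hale_ids)

# Template for manual corrections: srcid -> corrected ID (None until filled in).
corrections = {srcid: None for _, srcid in missing}
print(json.dumps(corrections, indent=2))
```

Filling in the `None` values by hand would give the kind of JSON correction table mentioned above.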

LinguList commented 5 years ago

Given that there are 260 of those, it is definitely worth doing this.

LinguList commented 5 years ago

It is systematic, so I can fix many of these without going through them manually. I will send an update soon.

LinguList commented 5 years ago

So, here are the ones that are really missing. They should be added (should be quick, by comparing concepts):

number short ID (autoconvert) id-in-stedt concept-in-stedt
1 13a.5 XIIIA5 cause (smn to do sthg)
2 12c. XIIC under (below)
3 12c. XIIC under
4 12d. XIID old (of clothing)
5 12c. XIIC where (w. to)
6 12c. XIIC behind (not visible)
8 12d. XIID never
9 I.9 I.9 all
10 Hale 73 c.Sd. Hale 73 CSD jackal
11 12d. XIID unmixed
12 12c. XIIC hither and thither
13 13a.1 XIIIA1 can / be able to do something or know something
14 12d. XIID unmixed / pure
15 12d. XIID least
16 12c. XIIC through
17 13a.4 XIIIA4 let (smn do sthg) / permit
18 12d. XIID real
19 12c. XIIC around
20 12d. XIID new one; one which is new
21 0 2b1.59 0 2b1.59 get up
22 12d. XIID new
23 12c. XIIC over (above)
24 12c. XIIC toward
25 66 66 armpit
26 01.009;12d 01.009;12d all (for things)
27 12c. XIIC in / inside of
28 12d. XIID mixed
29 12c. XIIC behind (visible)
30 12d. XIID before
31 12c. XIIC away from
32 12c. XIIC under (beneath)
33 93 93 wing
34 12c. XIIC out of
35 srcid srcid gloss
36 12d. XIID early
37 12c. XIIC above (directly)
38 12d. XIID partial
39 12c. XIIC between
40 12c. XIIC down
41 12d. XIID frequently
42 12c. XIIC beneath
43 01.009,12d 01.009,12d all
44 12d. XIID until / as long as
45 12c. XIIC across
46 12d. XIID less
47 12d. XIID during / in the midle
48 12d. XIID old (of objects)
49 12d. XIID all
50 12c. XIIC beyond
51 12c. XIIC where (w. at)
52 12c. XIIC beside
53 12d. XIID during
54 12d. XIID old
55 hand
56 13a.4 XIIIA4 let (smn do sthg)
57 01.009,12d 01.009,12d all (for things)
58 12d.7 XIID7 more
59 12d. XIID most
60 12c. XIIC up (up country)
61 12d. XIID more
62 12d. XIID frequently / sometimes
63 13a.2 XIIIA2 know (fact / person)
64 12c. XIIC where
65 12d. XIID when
66 12c. XIIC behind
67 66 66 palm of hand
68 12d. XIID after
69 12c. XIIC up
70 12d. XIID unmixed / without condiment
71 12d. XIID infrequent
72 12d. XIID frequent
73 06a.0406a. 06a.0406a. place
74 158 158 tickle
75 12c.1 XIIC1 far
76 12d. XIID until
77 12d. XIID late
78 12d. XIID daily
79 12c. XIIC up (straight up)
80 13a.1 XIIIA1 can / be able to do something
81 66 66 arm
82 13a.2 XIIIA2 know (sthg)
83 12c. XIIC above
84 01.009,12. 01.009,12. all
85 12d. XIID whole
86 12d. XIID after / at last
87 12c. XIIC over
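
The "autoconvert" column above follows a visible pattern for the Roman-numeral IDs: `XIID` becomes `12d.`, `XIIIA5` becomes `13a.5`, and IDs such as `I.9` or `66` stay unchanged. A minimal sketch of such a conversion, assuming exactly that pattern, could look like the following. It is a reconstruction from the table, not the script actually used, and the names `autoconvert` and `roman_to_int` are made up for illustration.

```python
import re

ROMAN = {"I": 1, "V": 5, "X": 10}
# Roman numeral (I, V, X only), one trailing section letter, optional item number.
PATTERN = re.compile(r"^([IVX]+)([A-Z])(\d*)$")


def roman_to_int(numeral):
    """Convert a Roman numeral made of I, V and X to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # A smaller value before a larger one is subtracted (e.g. IX = 9).
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total


def autoconvert(srcid):
    """Rewrite IDs like 'XIIIA5' as '13a.5'; return anything else unchanged."""
    match = PATTERN.match(srcid)
    if match is None:
        return srcid
    numeral, letter, digits = match.groups()
    return f"{roman_to_int(numeral)}{letter.lower()}.{digits}"


for srcid in ["XIIIA5", "XIIC", "XIID7", "I.9", "66"]:
    print(srcid, "->", autoconvert(srcid))
```

Restricting the numeral part to I, V and X keeps the trailing `C` and `D` section letters from being read as Roman digits. IDs with an irregular form, such as `Hale 73 CSD`, are not covered by this sketch and would still need manual handling.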

LinguList commented 5 years ago

Could you look into this, @chrzyki ?

I'll soon push the orthography profile (yes, this is actually working!)

chrzyki commented 5 years ago

Cool that the orthography profile is working! Will have a look at the SRCIDS.

chrzyki commented 5 years ago

Thanks @natalia-morozova & see here @LinguList:


LinguList commented 5 years ago

Four are still left if you re-run the code; otherwise it's fine.