LinguList closed this issue 4 years ago
Ah, I see, some spaces are missing: +s, +ʔ, e+, u+
Identifiers? Their IDs?
emae1030-1086-1, mangareva239-1432-1, rarotongan58-1243-2, tuvalu753-1424-2
Their ID in the beginning of the text.
1510, 2010, 1224, 7247 would be the IDs in the original file.
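To locate such outliers without reading every row by hand, one option is to scan the segmented forms for tokens in which `+` is fused to a neighbouring symbol, as in `+s` or `e+`. The following is only a minimal sketch: the `rows` dictionary, its IDs, and its segment values are made up for illustration and are not taken from the dataset.

```python
def fused_markers(segments):
    """Return tokens where '+' is attached to another symbol, e.g. '+s' or 'e+'."""
    return [tok for tok in segments if "+" in tok and tok != "+"]


# Hypothetical rows (ID -> segments); real data would come from the wordlist.
rows = {
    1: "t a +s a".split(),   # missing space after '+'
    2: "t a + s a".split(),  # correctly spaced
}

# Keep only the rows that contain at least one fused token.
outliers = {}
for idx, segs in rows.items():
    fused = fused_markers(segs)
    if fused:
        outliers[idx] = fused
```

Here `outliers` keeps only the rows that need attention, together with the offending tokens.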
The easiest way to fix this is to make a Python dictionary that maps each ID to a recommended better form. Then you would use the get() method to insert the segments:
Segments={1510: "b l a + b l u".split()}.get(idx) or wl[idx, 'segments'],
This is not nice. Alternatively, you could make a CORRECTED version of the data, place it in the repository as a copy of the original file, and we ask Mary to correct it in a new version (which is trivial). But please try it once, so you learn a bit about what happens in the CLDF creation procedure.
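The dictionary-override idea can be sketched in isolation as follows. This is only an illustration under stated assumptions: `wl` is a plain dictionary standing in for the real wordlist object, and all segment values are placeholders; only the lookup pattern with `get()` is taken from the comment above.

```python
# Hypothetical stand-in for the wordlist: (row ID, column) -> value.
wl = {
    (1510, "segments"): "b l a +b l u".split(),  # placeholder with a fused '+'
    (2010, "segments"): "k a + t a".split(),     # placeholder, already fine
}

# Manually corrected segmentations, keyed by the ID in the original file.
corrections = {1510: "b l a + b l u".split()}


def segments_for(idx):
    """Return the corrected segments for idx if listed, else the original ones."""
    return corrections.get(idx) or wl[idx, "segments"]
```

Rows without an entry in `corrections` fall through to the original segments, so only the listed outliers change.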
These are fixed in the corrected version of the data.
If you check TRANSCRIPTION.md, you will find a couple of problems due to manual work, which can easily be refined. Just place a list called `lexemes.tsv` into `etc/` and show how each value should be modified. Then you can use that to refine the segmentation. But as a first task, identify only the four outliers, i.e., their identifiers, so we can list them here.
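As a rough illustration of how such a replacement list could be applied, the sketch below reads a two-column TSV and maps each listed value to its replacement. The column names `LEXEME` and `REPLACEMENT` and the sample row are assumptions made for this example; check the format your tooling expects before relying on them.

```python
import csv
import io

# Illustrative file content; the column names and the row are assumptions.
LEXEMES_TSV = "LEXEME\tREPLACEMENT\nbla+blu\tbla + blu\n"


def load_replacements(handle):
    """Read a tab-separated replacement list into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["LEXEME"]: row["REPLACEMENT"] for row in reader}


replacements = load_replacements(io.StringIO(LEXEMES_TSV))


def refine(value):
    """Return the listed replacement for value, or value unchanged."""
    return replacements.get(value, value)
```

In practice the handle would come from opening `etc/lexemes.tsv` rather than from an in-memory string.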