Open xrotwang opened 3 years ago
So far, the raw data has not been curated in EDICTOR but in the folder raw/LSI_text, which is not the best format for digitization and was difficult to parse at first. But since this seems to be fine now, despite the problems with tones, I think this is a good example of the flexibility of CLDF(bench), right?
Ah, ok. So if forms for Lao are empty (see https://github.com/lexibank/lsi/issues/25#issuecomment-803533236), that's just how things are in the source data? If so, should we keep the language - and placeholder forms?
It points to an error in the digitization, I think. @PhyloStar should check the original source sheet. So we had best leave the issue open until that has been done.
Later, it may be useful to think of more consistency checks. I am thinking of checks for word length, or comparisons with other datasets.
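A word-length check of the kind suggested here could look roughly like the sketch below. It is only an illustration: it assumes the forms are available as plain dictionaries with the CLDF column names `Language_ID` and `Form`, and the function name `flag_length_outliers` and the z-score threshold are hypothetical choices, not part of the dataset or of cldfbench.

```python
from statistics import mean, stdev


def flag_length_outliers(forms, threshold=3.0):
    """Return forms whose length is unusual for their language.

    `forms` is an iterable of dicts with (assumed) keys "Language_ID" and
    "Form". A form is flagged when its length lies more than `threshold`
    sample standard deviations from the mean length in its language.
    """
    # Group the rows by language so each language has its own baseline.
    by_language = {}
    for row in forms:
        by_language.setdefault(row["Language_ID"], []).append(row)

    flagged = []
    for rows in by_language.values():
        lengths = [len(row["Form"]) for row in rows]
        if len(lengths) < 2:
            continue  # stdev needs at least two data points
        mu, sigma = mean(lengths), stdev(lengths)
        if sigma == 0:
            continue  # all forms have the same length, nothing to flag
        for row, n in zip(rows, lengths):
            if abs(n - mu) / sigma > threshold:
                flagged.append(row)
    return flagged
```

Flagged forms would still need a manual look at the source, since a long form can be perfectly legitimate.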
Yes, I think such checks would also be good content for a "Validation" section in a paper.
Exactly!
Lao is empty for all the instances in the original pdf. An empty form is depicted by "…".
Ok, but we'd want to load it into the web app nevertheless? I guess, yes, because the web app highlights that our digitization is true to the source, right?
What can we do here? When showing on the webapp, should "..." be depicted as empty?
Best, Taraka
Hmm. I think the downloadable data can have "..." but the webapp won't load the placeholder forms. How does that sound?
If the PDF does not show anything, I'd opt for not including the language and mentioning this in the paper.
Ok, let's do that.
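The decision above (leave out a language whose forms are all placeholders) could be sketched as follows. This is a minimal illustration under assumptions: forms are plain dictionaries with the CLDF column names `Language_ID` and `Form`, both "…" and "..." count as placeholders as in this thread, and the names `is_placeholder` and `split_languages` are hypothetical helpers, not existing cldfbench functions.

```python
# Strings treated as "no form given"; both spellings appear in this thread.
PLACEHOLDERS = {"...", "…", ""}


def is_placeholder(form):
    """True if the form is missing or only a placeholder such as '…'."""
    return form is None or form.strip() in PLACEHOLDERS


def split_languages(forms):
    """Return (kept_forms, dropped_language_ids).

    A language is dropped only if every one of its forms is a placeholder.
    Languages with at least one real form are kept unchanged, including
    any placeholder forms they may contain.
    """
    has_real_form = {}
    for row in forms:
        lang = row["Language_ID"]
        real = not is_placeholder(row["Form"])
        has_real_form[lang] = has_real_form.get(lang, False) or real

    dropped = sorted(lang for lang, ok in has_real_form.items() if not ok)
    kept = [row for row in forms if has_real_form[row["Language_ID"]]]
    return kept, dropped
```

The list of dropped language IDs could then be reported in the paper, as suggested.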
I don't know where the raw data for LSI is kept/curated. But if it is in EDICTOR, maybe we should implement a download command to sync it here?