lexibank / lsi

CLDF dataset derived from Grierson's "Linguistic Survey of India" from 1928
https://lsi.clld.org
Creative Commons Attribution 4.0 International

Download of raw data #27

Open xrotwang opened 3 years ago

xrotwang commented 3 years ago

I don't know where the raw data for LSI is kept/curated. But if it is in EDICTOR, maybe we should implement a download command to sync it here?

LinguList commented 3 years ago

The raw data has not been curated in EDICTOR so far, but rather in the folder raw/LSI_text, which is not the best format for digitization and was difficult to parse at first. But since this seems to be fine now, despite the problems with tones, I think this is a good example of the flexibility of CLDF(bench), right?

xrotwang commented 3 years ago

Ah, ok. So if forms for Lao are empty (see https://github.com/lexibank/lsi/issues/25#issuecomment-803533236), that's just how things are in the source data? If so, should we keep the language - and placeholder forms?

LinguList commented 3 years ago

It points to an error in the digitization, I think. @PhyloStar should check the original source sheet. So we'd best leave the issue open until that has been done.

Later, it may be useful to think of more consistency checks, for example checks for word length, or comparisons with other datasets...
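A word-length check of this kind could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `flag_length_outliers`, the `(language, concept, form)` row layout, and the `factor` threshold are hypothetical and not part of the lsi repository or of cldfbench.

```python
from statistics import median


def flag_length_outliers(rows, factor=3.0):
    """Flag forms that are much longer or shorter than usual for their concept.

    `rows` is an iterable of (language, concept, form) tuples (assumed layout).
    A form is flagged when its length is more than `factor` times the median
    length of all forms for the same concept, or less than median / factor.
    """
    rows = list(rows)

    # Collect the form lengths per concept.
    lengths = {}
    for _language, concept, form in rows:
        lengths.setdefault(concept, []).append(len(form))

    flagged = []
    for language, concept, form in rows:
        med = median(lengths[concept])
        if med and (len(form) > factor * med or len(form) * factor < med):
            flagged.append((language, concept, form))
    return flagged
```

Flagged rows would still need manual inspection against the source PDF, since an unusually long form can be a legitimate compound rather than a digitization error.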

xrotwang commented 3 years ago

Yes, I think such checks would also be good content for a "Validation" section in a paper.

LinguList commented 3 years ago

Exactly!

PhyloStar commented 3 years ago

Lao is empty in all instances in the original PDF. An empty form is represented by "…".

xrotwang commented 3 years ago

Ok, but we'd want to load it into the web app nevertheless? I guess, yes, because the web app highlights that our digitization is true to the source, right?


PhyloStar commented 3 years ago

What can we do here? When showing on the webapp, should "..." be depicted as empty?

Best, Taraka


PhyloStar commented 3 years ago

Hmm. I think the downloadable data can have "..." but the webapp won't load the placeholder forms. How does that sound?

LinguList commented 3 years ago

If the PDF does not show anything, I'd opt for not including the language and mentioning this in the paper.
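A minimal sketch of how such an exclusion could work, assuming rows of `(language, concept, form)` and that the placeholder appears either as "…" or as "...". The names `PLACEHOLDERS`, `languages_without_data`, and `drop_empty_languages` are hypothetical and not taken from the lsi code.

```python
# Strings treated as "no form given in the source" (assumed set).
PLACEHOLDERS = {"", "…", "..."}


def is_placeholder(form):
    """Return True if the form is empty or only a placeholder."""
    return form.strip() in PLACEHOLDERS


def languages_without_data(rows):
    """Return the sorted names of languages whose forms are all placeholders."""
    has_data = {}
    for language, _concept, form in rows:
        has_data[language] = has_data.get(language, False) or not is_placeholder(form)
    return sorted(lang for lang, ok in has_data.items() if not ok)


def drop_empty_languages(rows):
    """Remove every row belonging to a language that has no real form at all."""
    rows = list(rows)
    empty = set(languages_without_data(rows))
    return [row for row in rows if row[0] not in empty]
```

The list returned by `languages_without_data` could also serve as the record of excluded languages to mention in the paper. Placeholder rows of languages that do have some data are kept unchanged here.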

PhyloStar commented 3 years ago

Ok, let's do that.
