lexibank / lexibank-analysed

Study on lexibank data (presenting the lexibank dataset).
Creative Commons Attribution 4.0 International

Open problems in some datasets for the release of the Lexibank application #39

Closed: LinguList closed this issue 1 year ago.

LinguList commented 2 years ago
| Dataset | Note |
|---|---|
| idssegmented | segmented version of IDS; needs to be released or fed into IDS |
| tuled | we just take one release that works (need to check the concepts, etc.) |
| gravinachadic | needs to be checked by one more person |
| vanuatuvoices | needs to be released in a first version |
| sidwellvietic | needs a concept list |
| wheelerutoaztecan | update dataset with new CLDF dataset ... title |
| chacolanguages | concept list needs to be added to Concepticon |
| oskolskayatungusic | concept list can be taken from savelyevturkic |
| kochtukanoan | concept list not in Concepticon |
| lionnetyotonahua | concept list not in Concepticon |
| chaconnorthwestarawakan | concept list not in Concepticon |
| baf2 | update with the 2022 reference and the DOI of the paper; concepts not in Concepticon |
| tolmiebritishcolumbia | question to @chrzyki whether we can or should add this |
LinguList commented 2 years ago

These issues should be addressed so that we can finalize all datasets. None of these datasets is strictly required, but it would be nice to include some of them, since they can also help with CLICS4.

SimonGreenhill commented 2 years ago

I'll take:

LinguList commented 2 years ago

For oskolskaya, the concept list is just Savelyev 2020 254 or similar.

LinguList commented 2 years ago

So this works even without the 3.1 release of Concepticon :)

SimonGreenhill commented 2 years ago

yes it's "leipzig-jakarta-jena" ...

SimonGreenhill commented 2 years ago

wheelerutoaztecan has been rebuilt: https://github.com/lexibank/wheelerutoaztecan/commit/a7e6f15c3b5c8259c97904bc68836550ccfc6435

SimonGreenhill commented 2 years ago

oskolskayatungusic is done.

LinguList commented 2 years ago

Nice! When did we add the list by Oskolskaya as such? This is what I like about Concepticon recently: lists get added without me noticing all the details :) I left one minor issue there.

LinguList commented 2 years ago

@chrzyki, can you tell me whether you are fine with adding tolmiebritishcolumbia to Lexibank?

chrzyki commented 2 years ago

Sorry, missed the question in the table above! Sure - is there anything you'd like me to do/prepare for the data set?

LinguList commented 2 years ago

Is the concept list in Concepticon? That would be the first question. The second is whether you agree that we use it, since you have put a lot of effort into it and were thinking of an independent publication at some point.

chrzyki commented 2 years ago

Thanks! Yes, the concept list is in Concepticon:

https://concepticon.clld.org/contributions/Tolmie-1884-211

And absolutely, please feel free to use this in any context you see fit! I'd be happy to see the data getting used somewhere. I'd also be happy to check the data set again for any errors/mistakes (the language mapping is rather tricky for a number of the languages) or extend it further if need be.

LinguList commented 2 years ago

So the only task would be for you to double-check the data and help release it via Zenodo later, so that we can add the DOI and include it in Lexibank 1.0 via the web app :)

LinguList commented 2 years ago

Ah, we don't have orthoprofiles. Those would be needed, of course. If we cannot get them now, I would not include the data for now, but we could perhaps help add them later, next year.

chrzyki commented 2 years ago

I can help with that. :)

Orthoprofiles: Yes, no orthography profiles as of now, unfortunately. I can also help with that!

LinguList commented 2 years ago

I added an initial orthoprofile; @chrzyki, you can check and refine it.

chrzyki commented 2 years ago

Thank you very much! I'll have a look and comment if necessary, but for now this seems perfect.