Closed chrzyki closed 4 years ago
I don't fully understand. What kind of support is needed in pylexibank? As far as I can tell, functions passed in to add_concepts have access to all data in the Concepticon concept list - so isn't this just a question of completing the data in Concepticon?
I agree - I don't think there is any particular need for code in here that handles this, but I was thinking that this might warrant a small discussion concerning 'best practices' for this purpose, i.e. "if need be, try calling the column xyz". The issue might be a better fit for lexibank/lexibank.
Or even an issue for concepticon/concepticon-data? But I'm not really sure this kind of special case would require some standardized alternative_label column. We are using all kinds of alternative labels already - numbers, local identifiers, etc.
So if anything, I'd say this is a documentation issue for the particular dataset. Wouldn't it be enough to deal with it by putting a comment above the add_concepts line in lexibank_*.py?
On second thought, maybe the documentation should go into NOTES.md. Something along the lines of
... while the published concept list uses the labels such-and-such, the actual wordlists use slightly different labels ...
I don't think we have to go all the way and formalize this into a ConceptSpec, which documents lookup in Concepticon.
Yes, that sounds good - thanks for the input. @LinguList do you agree with this? If so, we can close this and keep the issue for reference.
Yes, I fully agree. It is not a pylexibank issue, but a more general question of how to handle data that arrives in lexibank through a secondary source. This is happening in several cases. Both relations, concepticon -> source and concepticon -> digital source, are stable, but we prefer to highlight the relation concepticon -> source. It is therefore useful to document this as a best practice example, with the (potentially comma-separated) list of alias concepts in the ALIAS column and the intended target concept in GLOSS or ENGLISH, or the corresponding column for any other language.
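To make the ALIAS idea a bit more concrete, here is a minimal sketch of how such a column could be turned into a lookup from the labels used in the digitised wordlists to concept IDs. It does not use pylexibank at all. The function name build_alias_lookup and the sample rows are invented for illustration; only the column names ALIAS and ENGLISH come from the proposal above.

```python
import csv
import io

# Hypothetical concept list excerpt. The column names ALIAS and ENGLISH follow
# the proposal in this thread; the rows themselves are invented.
CONCEPT_LIST = (
    "ID\tENGLISH\tALIAS\n"
    "1\tbelly\tstomach, abdomen\n"
    "2\tsun\t\n"
    "3\twater\twater (fresh)\n"
)


def build_alias_lookup(tsv_text):
    """Map every label (target gloss or alias) to its concept ID."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # The intended target concept from the original source comes first.
        labels = [row["ENGLISH"]]
        # ALIAS may be empty or hold a comma-separated list of labels.
        labels += [a.strip() for a in (row["ALIAS"] or "").split(",") if a.strip()]
        for label in labels:
            lookup[label.lower()] = row["ID"]
    return lookup


lookup = build_alias_lookup(CONCEPT_LIST)
print(lookup["abdomen"])
```

With a lookup like this, a form glossed as "abdomen" in the digitised list would resolve to the same concept as "belly" in the published list. Note that the comma-splitting assumes no alias label contains a comma itself.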
Please see the discussion here:
https://github.com/lexibank/marrisonnaga/issues/21
Sometimes, there is a mismatch between the digitised version of a list (e.g. as available on STEDT) and the original source material. Using the digitised version makes things easier for the Lexibank workflow, but may result in mismatches as outlined in the marrisonnaga issue. @LinguList's proposal for an ALIAS seems good to me. Do you have any preferences as to how in particular this should be handled? I'm aware that this is also related to how we handle concept lists in concepticon-data, but since this mainly concerns mappings of Lexibank datasets, I'm opening the issue here.