lexibank / pylexibank

The python curation library for lexibank
Apache License 2.0
18 stars 7 forks source link

enhance add_concepts to collect extra fields? #200

Closed SimonGreenhill closed 4 years ago

SimonGreenhill commented 4 years ago

Currently add_concepts cuts down a lot of boilerplate but it can't handle cases like this:

https://github.com/lexibank/beidasinitic/blob/550910a83c185b12da06a1d1faa03bd0683ab2c0/lexibank_beidasinitic.py#L62-L74

.. i.e. as the field 'chinese' cannot be added as it requires access to concepts.attributes['chinese']. This is not the only dataset that does this e.g. castrosui.

This might be a case where dropping the convenience function of add_concepts makes sense, but it would be nice to lose ~10 lines of code if there's a nice way to handle this

SimonGreenhill commented 4 years ago

e.g.


        concepts = args.writer.add_concepts(
            id_factory=lambda c: c.id.split('-')[-1]+ '_' + slug(c.english),
            lookup_factory="Name",
            extra = [
                 'Chinese_Gloss': lambda c: c.attributes["chinese"],
            ]
        )
LinguList commented 4 years ago

We discussed this at some length a couple of months ago and decided to not address this problem now, also because individual concept lists are too different. Let me look for the issue in pylexibank.

LinguList commented 4 years ago

And you can see that I keep a mechanical style in these datasets to show the theoretical modularity of the code.

LinguList commented 4 years ago

https://github.com/lexibank/pylexibank/issues/136

LinguList commented 4 years ago

Just added the issue #136, where we discussed this.

SimonGreenhill commented 4 years ago

Ahh, looks like I missed that whole issue 🤷‍♂

LinguList commented 4 years ago

Like I never knew we had a pytest-cldf...

Well while we discussed at some length the modularity of this adding of code, we decided back then that it might require a lot of modification in the code to modularize this, and that one would generally encounter the problem again at some point. So I tried to keep all the pieces where I re-use the same line of codes as clean as possible to make it possible to later check again, in case we would decide to address the issue.

SimonGreenhill commented 4 years ago

Another alternative -- if possible? would be to add a handful of cleverer table field types? e.g. a LanguageGloss type that adds and sets the required field:

class CustomConcept(Concept):
    Chinese_Gloss = LanguageGloss(default=None, language="zh")
    Spanish_Gloss = LanguageGloss(default=None, language="sp")
xrotwang commented 4 years ago

I think it would be most transparent - and also highlight the importance/usefulness of concept lists in Concepticon - if we had a Concept class that derives its attributes by inspecting the concept list. I think that's doable, see http://www.attrs.org/en/stable/examples.html#other-goodies

xrotwang commented 4 years ago

So I'd propose

LinguList commented 4 years ago

The problem is only the name-spacing, right? We have ENGLISH instead of Name, etc. So do we hard-code a number of attributes, or what do we do here?

LinguList commented 4 years ago

I mean, this namespacing problem was the original reason why I could not use "add_concepts" but had to use "add_concept" in each case.

chrzyki commented 4 years ago

The problem is only the name-spacing, right? We have ENGLISH instead of Name, etc. So do we hard-code a number of attributes, or what do we do here?

@LinguList the 'mapping' is happening here if that is what you're looking for: https://github.com/lexibank/pylexibank/blob/50af1e6ad45c48cdfed33cb318431e2d7eb54823/src/pylexibank/util.py#L161-L191

LinguList commented 4 years ago

@chrzykie, Yes, I am aware of that, but the coding cannot be done in a flip from add_concepts, due to the current code, so I had to loop over self.conceptlist[0] etc. and apply add_concept. And it is not trivial to adjust add_concepts at once now, as far as I remember when I studied and discussed the code with @xrotwang.

chrzyki commented 4 years ago

Ah, gotcha, thanks for explaining.

SimonGreenhill commented 4 years ago

I like this -- but (some bikeshedding, sorry) can we make it use_concepticon=True?

I think it'll be good to include the concepticon metadata, and it makes it clear that the concept list is offloaded to concepticon and not lurking in etc/concepts.tsv

xrotwang commented 4 years ago

Maybe we can have a special constant, e.g.

concept_class = CONCEPTICON