enhance add_concepts to collect extra fields?

SimonGreenhill commented 4 years ago

Currently add_concepts cuts down a lot of boilerplate but it can't handle cases like this:

https://github.com/lexibank/beidasinitic/blob/550910a83c185b12da06a1d1faa03bd0683ab2c0/lexibank_beidasinitic.py#L62-L74

.. i.e. as the field 'chinese' cannot be added as it requires access to concepts.attributes['chinese']. This is not the only dataset that does this e.g. castrosui.

This might be a case where dropping the convenience function of add_concepts makes sense, but it would be nice to lose ~10 lines of code if there's a nice way to handle this

SimonGreenhill commented 4 years ago

e.g.


        concepts = args.writer.add_concepts(
            id_factory=lambda c: c.id.split('-')[-1]+ '_' + slug(c.english),
            lookup_factory="Name",
            extra = [
                 'Chinese_Gloss': lambda c: c.attributes["chinese"],
            ]
        )

LinguList commented 4 years ago

We discussed this at some length a couple of months ago and decided to not address this problem now, also because individual concept lists are too different. Let me look for the issue in pylexibank.

LinguList commented 4 years ago

And you can see that I keep a mechanical style in these datasets to show the theoretical modularity of the code.

LinguList commented 4 years ago

https://github.com/lexibank/pylexibank/issues/136

LinguList commented 4 years ago

Just added the issue #136, where we discussed this.

SimonGreenhill commented 4 years ago

Ahh, looks like I missed that whole issue 🤷‍♂

LinguList commented 4 years ago

Like I never knew we had a pytest-cldf...

Well while we discussed at some length the modularity of this adding of code, we decided back then that it might require a lot of modification in the code to modularize this, and that one would generally encounter the problem again at some point. So I tried to keep all the pieces where I re-use the same line of codes as clean as possible to make it possible to later check again, in case we would decide to address the issue.

SimonGreenhill commented 4 years ago

Another alternative -- if possible? would be to add a handful of cleverer table field types? e.g. a LanguageGloss type that adds and sets the required field:

class CustomConcept(Concept):
    Chinese_Gloss = LanguageGloss(default=None, language="zh")
    Spanish_Gloss = LanguageGloss(default=None, language="sp")

xrotwang commented 4 years ago

I think it would be most transparent - and also highlight the importance/usefulness of concept lists in Concepticon - if we had a Concept class that derives its attributes by inspecting the concept list. I think that's doable, see http://www.attrs.org/en/stable/examples.html#other-goodies

xrotwang commented 4 years ago

So I'd propose

a concept_class_from_conceptlist flag on Dataset
if True, the concept class will be assembled programmatically (when makecldf is run, because access to Concepticon is required) as described above, populated with all attributes found in the concept list.

LinguList commented 4 years ago

The problem is only the name-spacing, right? We have ENGLISH instead of Name, etc. So do we hard-code a number of attributes, or what do we do here?

LinguList commented 4 years ago

I mean, this namespacing problem was the original reason why I could not use "add_concepts" but had to use "add_concept" in each case.

chrzyki commented 4 years ago

The problem is only the name-spacing, right? We have ENGLISH instead of Name, etc. So do we hard-code a number of attributes, or what do we do here?

@LinguList the 'mapping' is happening here if that is what you're looking for: https://github.com/lexibank/pylexibank/blob/50af1e6ad45c48cdfed33cb318431e2d7eb54823/src/pylexibank/util.py#L161-L191

LinguList commented 4 years ago

@chrzykie, Yes, I am aware of that, but the coding cannot be done in a flip from add_concepts, due to the current code, so I had to loop over self.conceptlist[0] etc. and apply add_concept. And it is not trivial to adjust add_concepts at once now, as far as I remember when I studied and discussed the code with @xrotwang.

chrzyki commented 4 years ago

Ah, gotcha, thanks for explaining.

SimonGreenhill commented 4 years ago

I like this -- but (some bikeshedding, sorry) can we make it use_concepticon=True?

I think it'll be good to include the concepticon metadata, and it makes it clear that the concept list is offloaded to concepticon and not lurking in etc/concepts.tsv

xrotwang commented 4 years ago

Maybe we can have a special constant, e.g.

concept_class = CONCEPTICON

lexibank / pylexibank

enhance add_concepts to collect extra fields? #200