Closed: SimonGreenhill closed this issue 4 years ago
e.g.

concepts = args.writer.add_concepts(
    id_factory=lambda c: c.id.split('-')[-1] + '_' + slug(c.english),
    lookup_factory="Name",
    extra={
        'Chinese_Gloss': lambda c: c.attributes["chinese"],
    },
)
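A minimal sketch of how such an `extra` hook could work inside an `add_concepts`-style helper -- the function name, the row layout, and the concept object's shape are all assumptions here, not pylexibank's actual implementation:

```python
def add_concepts_with_extra(concepts, id_factory, extra=None):
    # Hypothetical add_concepts variant: `extra` maps output column names
    # to callables that receive the concept object and return a value.
    rows = []
    for c in concepts:
        row = {"ID": id_factory(c), "Name": c.english}
        for column, getter in (extra or {}).items():
            row[column] = getter(c)
        rows.append(row)
    return rows
```

With this shape, an extra column like the Chinese gloss reduces to a one-line entry in the `extra` dict instead of a hand-written loop.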
We discussed this at some length a couple of months ago and decided not to address this problem for now, also because individual concept lists are too different. Let me look for the issue in pylexibank.
And you can see that I keep a mechanical style in these datasets to show the theoretical modularity of the code.
Just added the issue #136, where we discussed this.
Ahh, looks like I missed that whole issue 🤷‍♂️
Like I never knew we had a pytest-cldf...
Well, while we discussed the modularity of this code at some length, we decided back then that modularizing it might require a lot of modification, and that one would generally encounter the problem again at some point. So I tried to keep all the pieces where I re-use the same lines of code as clean as possible, to make it possible to check again later, in case we decide to address the issue.
Another alternative -- if possible? -- would be to add a handful of cleverer table field types, e.g. a LanguageGloss type that adds and sets the required field:

class CustomConcept(Concept):
    Chinese_Gloss = LanguageGloss(default=None, language="zh")
    Spanish_Gloss = LanguageGloss(default=None, language="es")
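One way such a field type could be realized is as a plain descriptor that carries its language tag -- a sketch only; since pylexibank's Concept is an attrs class, a real version would more likely be an attrs field with the language code in its metadata:

```python
class LanguageGloss:
    """Hypothetical field type: a descriptor tagged with a language code."""

    def __init__(self, default=None, language=None):
        self.default = default
        self.language = language  # e.g. an ISO 639-1 code like "zh"

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access exposes the field itself
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class CustomConcept:
    Chinese_Gloss = LanguageGloss(default=None, language="zh")
    Spanish_Gloss = LanguageGloss(default=None, language="es")
```

Class-level access (`CustomConcept.Chinese_Gloss.language`) then lets downstream code discover which language each gloss column belongs to.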
I think it would be most transparent - and also highlight the importance/usefulness of concept lists in Concepticon - if we had a Concept
class that derives its attributes by inspecting the concept list. I think that's doable, see http://www.attrs.org/en/stable/examples.html#other-goodies
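As a self-contained sketch of that idea -- using stdlib dataclasses in place of attrs, and with made-up column names rather than a real Concepticon concept list:

```python
from dataclasses import make_dataclass, field

# Pretend these column names were read from a Concepticon concept list
columns = ["ENGLISH", "CHINESE", "NUMBER"]

# Assemble the concept class programmatically, one optional field per column
CustomConcept = make_dataclass(
    "CustomConcept",
    [(col.capitalize(), str, field(default=None)) for col in columns],
)

c = CustomConcept(English="hand", Chinese="手")
```

attrs offers the equivalent via `attr.make_class`, which is what the linked "other goodies" example shows.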
So I'd propose a concept_class_from_conceptlist flag on Dataset. If set to True, the concept class will be assembled programmatically (when makecldf is run, because access to Concepticon is required) as described above, populated with all attributes found in the concept list.

The problem is only the name-spacing, right? We have ENGLISH instead of Name, etc. So do we hard-code a number of attributes, or what do we do here?
I mean, this namespacing problem was the original reason why I could not use "add_concepts" but had to use "add_concept" in each case.
The problem is only the name-spacing, right? We have ENGLISH instead of Name, etc. So do we hard-code a number of attributes, or what do we do here?
@LinguList the 'mapping' is happening here if that is what you're looking for: https://github.com/lexibank/pylexibank/blob/50af1e6ad45c48cdfed33cb318431e2d7eb54823/src/pylexibank/util.py#L161-L191
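Schematically, such a column mapping boils down to a lookup table plus a fallback rule -- the entries below are illustrative assumptions, not a reproduction of the linked util.py code:

```python
# Hypothetical map from Concepticon concept-list columns to CLDF-style names
CONCEPTICON_TO_CLDF = {
    "ENGLISH": "Name",
    "CONCEPTICON_ID": "Concepticon_ID",
    "CONCEPTICON_GLOSS": "Concepticon_Gloss",
}


def map_column(col):
    # Columns without a fixed CLDF counterpart fall back to Title_Case
    return CONCEPTICON_TO_CLDF.get(col, col.capitalize())
```

The open question from above is exactly which entries belong in the hard-coded part of such a table.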
@chrzykie, yes, I am aware of that, but the coding cannot be done in a flip from add_concepts, due to the current code, so I had to loop over self.conceptlist[0] etc. and apply add_concept. And it is not trivial to adjust add_concepts at once now, as far as I remember from when I studied and discussed the code with @xrotwang.
Ah, gotcha, thanks for explaining.
I like this -- but (some bikeshedding, sorry) can we make it use_concepticon=True?
I think it'll be good to include the concepticon metadata, and it makes it clear that the concept list is offloaded to concepticon and not lurking in etc/concepts.tsv
Maybe we can have a special constant, e.g.
concept_class = CONCEPTICON
Currently add_concepts cuts down a lot of boilerplate, but it can't handle cases like this: https://github.com/lexibank/beidasinitic/blob/550910a83c185b12da06a1d1faa03bd0683ab2c0/lexibank_beidasinitic.py#L62-L74 -- i.e. the field 'chinese' cannot be added, as it requires access to concept.attributes['chinese']. This is not the only dataset that does this, e.g. castrosui.
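The hand-written alternative used in such datasets is roughly this shape -- a sketch only; the helper name and the writer/concept objects are stand-ins, not pylexibank's actual API:

```python
def add_concepts_by_hand(concepts, writer):
    # Workaround sketch: call add_concept once per entry so that extra
    # attributes (e.g. a Chinese gloss) can be attached to each row.
    for concept in concepts:
        writer.add_concept(
            ID=concept.id,
            Name=concept.english,
            Chinese_Gloss=concept.attributes["chinese"],
        )
```

This is the ~10 lines of per-dataset boilerplate that a richer add_concepts signature could absorb.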
This might be a case where dropping the convenience function add_concepts makes sense, but it would be nice to lose ~10 lines of code if there's a nice way to handle this.