lexibank / bowernpny

CLDF dataset derived from Bowern and Atkinson's "Internal Structure of Pama-Nyungan" from 2012
Creative Commons Attribution 4.0 International

Updated to new version of pylexibank. #4

Closed tresoldi closed 5 years ago

tresoldi commented 5 years ago

The code in master is currently failing for database loading with the most recent version of pylexibank. This is due to the new organization of pylexibank.dataset, in particular its Language class -- in short, given that we have a Glottolog_Name in etc/languages.csv, we end up with duplicated column names (makecldf is not failing because, unlike SQL, it uses case-sensitive dictionary keys -- maybe we should always lower them, catching potential db problems in advance?).
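As a rough illustration of the mismatch described above, the sketch below shows how a set of column names can be perfectly valid as case-sensitive dictionary keys and still be rejected by a SQL backend. The column list, the helper names (`case_insensitive_collisions`, `sqlite_accepts`), and the choice of SQLite are assumptions made for the example; this is not pylexibank code and the exact colliding names in this dataset may differ.

```python
import sqlite3
from collections import defaultdict


def case_insensitive_collisions(names):
    """Group column names that become identical once lowercased."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


def sqlite_accepts(names):
    """Return True if SQLite can create a table with these column names."""
    ddl = "CREATE TABLE languages ({})".format(
        ", ".join('"{}" TEXT'.format(name) for name in names)
    )
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        return True
    except sqlite3.OperationalError:
        # SQLite compares column names case-insensitively and reports a duplicate.
        return False
    finally:
        conn.close()


# Hypothetical column names, chosen only to show the effect.
columns = ["ID", "Name", "Glottocode", "Glottolog_Name", "glottolog_name"]

# As dictionary keys, all five names are distinct.
row = dict.fromkeys(columns, "")

collisions = case_insensitive_collisions(columns)
```

Running a check like `case_insensitive_collisions` before writing the CLDF output would be one way to catch such problems in advance, as suggested above, without changing how the keys are stored.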

This commit removes the Glottolog_Name from etc/languages.csv (which is superfluous and in at least one case outdated, as there have been changes in Glottolog) and removes the custom Language class (which was only used to carry over that name).
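For reference, dropping a single column from a CSV file such as etc/languages.csv can be sketched with the standard library alone. The helper name `drop_column` and the sample row are hypothetical and only illustrate the kind of change made here; the actual commit may well have edited the file by hand.

```python
import csv
import io


def drop_column(csv_text, column):
    """Return the CSV text with the named column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [field for field in reader.fieldnames if field != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data, not taken from the dataset.
sample = (
    "ID,Name,Glottocode,Glottolog_Name\n"
    "lang1,Example,abcd1234,Example Name\n"
)
cleaned = drop_column(sample, "Glottolog_Name")
```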

xrotwang commented 5 years ago

@LinguList @tresoldi @SimonGreenhill we should decide whether we make it a policy to only ever use released versions of clld/glottolog, or whether it is ok to have work-in-progress commits like this one, which are based on non-released Glottolog commits.

tresoldi commented 5 years ago

> @LinguList @tresoldi @SimonGreenhill we should decide whether we make it a policy to only ever use released versions of clld/glottolog, or whether it is ok to have work-in-progress commits like this one, which are based on non-released Glottolog commits.

I made a similar comment on the PR for lexibank/ids, you might have missed it... I am in favor of only using released versions, but please note that some of the datasets are already using work-in-progress commits of glottolog and concepticon (it was probably necessary at the time, as at least for the datasets I was developing, the lexibank generation was done in tandem with the concepticon mapping, sometimes also with PRs to glottolog).

I did not submit other PRs (like for allenbai, which is failing CLDF generation due to changes in Concepticon) precisely for this reason, but the case of bowernpny is more "urgent", as it is currently impossible to load it into a database (even with the CLDF output pre-generated in the repository).

xrotwang commented 5 years ago

@tresoldi yes, I'd say we should aim for using only released versions, too. But we don't have to be too strict. After all, we are in control of the relevant repos, and can even cut releases from older commits. But since that would be a bit arbitrary, I'd propose:

tresoldi commented 5 years ago

Ok, I'll first organize locally to always use the latest released versions of both Concepticon and Glottolog, reverting to some previous release if necessary. I'll close the PRs for the time being.

xrotwang commented 5 years ago

I added notes regarding this policy here https://github.com/lexibank/lexibank/blob/master/README.md#dataset-curation-versioning-and-releases

SimonGreenhill commented 5 years ago

Yes, agree -- only use released versions unless a non-released one is really necessary.