lexibank / pylexibank

The Python curation library for lexibank
Apache License 2.0

Add check that all cognates are not singletons #216

Closed: SimonGreenhill closed this issue 4 years ago

SimonGreenhill commented 4 years ago

see this.

xrotwang commented 4 years ago

Hm, in the example you point to, it was not exactly the case that "all cognates are singletons". One raw value seems to have been split into two forms, which then ended up in the same cognate set, so the check you propose would not have caught the error. We could still add some sort of test if there were a reasonable threshold for a "weird" ratio of singleton cognate sets, but I'm not sure this would be super useful.
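A minimal sketch of what such a test might compute, assuming forms and cognate judgements have already been flattened into plain tuples (form ID, language, concept, raw value, cognate set ID) that loosely mirror the CLDF FormTable and CognateTable columns. The names `singleton_ratio` and `split_value_clashes` are hypothetical and not part of pylexibank. The first covers the original proposal (a ratio of 1.0 means every cognate set is a singleton); the second flags the pattern described above, where forms split from one raw value land in the same cognate set.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical flattened record: (form_id, language_id, concept_id, raw_value, cognateset_id).
# Loosely mirrors CLDF FormTable/CognateTable columns; not a pylexibank or pycldf API.
Row = Tuple[str, str, str, str, str]


def singleton_ratio(rows: Sequence[Row]) -> float:
    """Share of cognate sets that contain exactly one form (1.0 = all singletons)."""
    sizes = Counter(row[4] for row in rows)
    if not sizes:
        return 0.0
    singletons = sum(1 for size in sizes.values() if size == 1)
    return singletons / len(sizes)


def split_value_clashes(rows: Sequence[Row]) -> List[Tuple[str, str, str, str]]:
    """(cognateset, language, concept, raw value) keys where several forms
    split from the same raw value ended up in the same cognate set."""
    groups: Dict[Tuple[str, str, str, str], List[str]] = defaultdict(list)
    for form_id, language, concept, value, cogset in rows:
        groups[(cogset, language, concept, value)].append(form_id)
    return [key for key, form_ids in groups.items() if len(form_ids) > 1]


rows = [
    ("f1", "lang1", "hand", "mano", "hand-1"),
    ("f2", "lang2", "hand", "main", "hand-1"),
    ("f3", "lang3", "hand", "ka, ke", "hand-2"),  # one raw value split into two forms
    ("f4", "lang3", "hand", "ka, ke", "hand-2"),
    ("f5", "lang1", "foot", "pie", "foot-1"),
]
print(singleton_ratio(rows))  # 1 of 3 cognate sets is a singleton
print(split_value_clashes(rows))
```

What counts as a "weird" singleton ratio is left open here, as in the discussion, and a hit from `split_value_clashes` may still be legitimate (e.g. genuine variant forms), so it would only point to entries for manual review.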

LinguList commented 4 years ago

In fact, cognate distributions can differ a lot, and one could even imagine a deliberately coded dataset in which every cognate set is a singleton (e.g. for the most unrelated languages of the world).

The "cognate diversity" score we introduce is probably already a good thing to check, and I think we always compute it, right? Additional statistics, such as cognate diversity per concept, could also be considered.
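The thread does not define the "cognate diversity" score, so the following is only an illustrative per-concept measure, assumed here to be (number of cognate sets - 1) / (number of forms - 1). It is 0 when all forms of a concept are cognate and 1 when every form is a singleton. The function name `diversity_per_concept` and the formula are hypothetical and should not be read as the score lexibank actually computes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def diversity_per_concept(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Hypothetical per-concept diversity: (cognate sets - 1) / (forms - 1).

    `pairs` holds one (concept_id, cognateset_id) pair per form.
    0.0 means all forms of a concept are cognate; 1.0 means all are singletons.
    Concepts with a single form are skipped, since the ratio is undefined there.
    """
    form_counts: Dict[str, int] = defaultdict(int)
    cognate_sets: Dict[str, Set[str]] = defaultdict(set)
    for concept, cogset in pairs:
        form_counts[concept] += 1
        cognate_sets[concept].add(cogset)
    return {
        concept: (len(cognate_sets[concept]) - 1) / (count - 1)
        for concept, count in form_counts.items()
        if count > 1
    }


pairs = [
    ("hand", "hand-1"), ("hand", "hand-1"), ("hand", "hand-2"),
    ("foot", "foot-1"), ("foot", "foot-2"), ("foot", "foot-3"),
    ("eye", "eye-1"),
]
print(diversity_per_concept(pairs))  # {'hand': 0.5, 'foot': 1.0}
```

Looking at the distribution of these per-concept values, rather than enforcing a fixed cutoff, would fit the point above that checking remains the responsibility of whoever curates the dataset.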

Checking is then the responsibility of the scholar working on a dataset.

SimonGreenhill commented 4 years ago

ok

xrotwang commented 4 years ago

@LinguList what is this "cognate diversity score" and where is it computed?