Closed by SimonGreenhill 4 years ago
Hm, in the example you point to, it was not exactly the case that "all cognates are singletons". One raw value seems to have been split into two forms, which ended up in the same cognate set. So the check you propose would not have caught the error. We could still add some sort of test if there were a reasonable threshold for a "weird" ratio of singleton cognate sets, but I'm not sure this would be super useful.
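For illustration, a threshold test on the singleton ratio could look roughly like the sketch below. This is not existing code from the project: the function names, the flat list of cognate set IDs as input, and the 0.9 threshold are all assumptions made up for the example.

```python
from collections import Counter


def singleton_ratio(cognate_ids):
    """Share of cognate sets that contain exactly one form.

    `cognate_ids` is a hypothetical input: one cognate set ID per form.
    """
    sizes = Counter(cognate_ids)
    if not sizes:
        return 0.0
    singletons = sum(1 for n in sizes.values() if n == 1)
    return singletons / len(sizes)


def looks_suspicious(cognate_ids, threshold=0.9):
    """Flag a dataset whose singleton ratio exceeds an arbitrary threshold.

    The default of 0.9 is an assumption, not a recommended value.
    """
    return singleton_ratio(cognate_ids) > threshold


# Four cognate sets (c1..c4), two of which (c2, c4) are singletons.
example = ["c1", "c1", "c2", "c3", "c3", "c3", "c4"]
print(singleton_ratio(example))
print(looks_suspicious(example))
```

As noted above, a check like this would not have caught the split-value case, since both forms landed in the same cognate set.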
In fact, cognate distributions can differ widely, and one can even think of a deliberately coded dataset in which all cognate sets are singletons (e.g. the most unrelated languages of the world).
The "cognate diversity" score we introduce is probably already something good to check. And I think we always compute it, right? Additional stats, like cognate diversity per concept, or similar, could also be thought of.
Checking is then the obligation of the scholar working on a dataset.
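To make the idea of a per-concept statistic concrete, here is a rough sketch. The thread does not spell out how the "cognate diversity" score is defined, so the formula used here is an assumption: (cognate sets - concepts) / (forms - concepts), which gives 0 when every concept has a single cognate set and 1 when every form is its own set. The function name and the `(concept, cognate_id)` input format are likewise hypothetical.

```python
from collections import defaultdict


def cognate_diversity(rows):
    """Return (overall, per_concept) diversity under the assumed formula.

    `rows` is an iterable of (concept, cognate_id) pairs, one per form.
    Cognate sets are counted per concept, so an ID reused across two
    concepts counts as two sets.
    """
    sets_per_concept = defaultdict(set)
    forms_per_concept = defaultdict(int)
    for concept, cogid in rows:
        sets_per_concept[concept].add(cogid)
        forms_per_concept[concept] += 1

    n_concepts = len(forms_per_concept)
    n_forms = sum(forms_per_concept.values())
    n_sets = sum(len(s) for s in sets_per_concept.values())

    # Avoid division by zero when each concept has exactly one form.
    if n_forms == n_concepts:
        overall = 0.0
    else:
        overall = (n_sets - n_concepts) / (n_forms - n_concepts)

    per_concept = {}
    for concept, n in forms_per_concept.items():
        k = len(sets_per_concept[concept])
        per_concept[concept] = (k - 1) / (n - 1) if n > 1 else 0.0
    return overall, per_concept


rows = [
    ("hand", "h1"), ("hand", "h1"), ("hand", "h1"),     # one shared set
    ("stone", "s1"), ("stone", "s2"), ("stone", "s3"),  # all singletons
]
overall, per_concept = cognate_diversity(rows)
print(overall, per_concept)
```

A per-concept breakdown like this makes it easier to spot individual concepts where every form is its own cognate set, which an aggregate score can hide.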
ok
@LinguList what is this "cognate diversity score" and where is it computed?
see this.