Closed HedvigS closed 6 months ago
Thinking about this, it seems error prone to have to binarise both the ValuesTable and the ParameterTable, can we consolidate this into one function that works directly on the CLDF object?
e.g. newcldf <- binarise(cldf)
Then the make_theo_score function can take a cldfobject, check that it's binarised or not (and run binarise if needed). Should be much cleaner and easier to use.
I would like to create as few new object types as possible. In my experience, new object classes often make functions less accessible to new users.
I prefer the current set-up where there is one function for binarising the ValueTable and the ParameterTable.
If you'd like, I can adapt make_theo_score() to check if it's binarised and if not throw an error or binarise it
I've adjusted the function, it now takes also unbinarised tables and binarises them.
@SimonGreenhill is this change acceptable?
1) in the grambank-analysed script workflow, we worked a lot with the binarised version of the ParameterTable, which contains crucial data for the word order theo score. When breaking it down into singular functions for the package, I missed this detail and have now remedied it
2) I've implemented an argument to make_theo_scores that defines a cut-off for data coverage that is defined for each set of features for the theo scores, instead of overall for the entire dataset. i.e. if for specifically the features that have to do with fusion there is 60% coverage and the cut-off is 75% that language is dropped.