grambank / rgrambank

R package to access and analyse Grambank's CLDF data
Apache License 2.0
4 stars 1 forks source link

Update make theo for missing cropping #41

Closed HedvigS closed 6 months ago

HedvigS commented 11 months ago

1) in the grambank-analysed script workflow, we worked a lot with the binarised version of the ParameterTable, which contains crucial data for the word order theo score. When breaking it down into singular functions for the package, I missed this detail and have now remedied it

2) I've implemented an argument to make_theo_scores that defines a cut-off for data coverage that is defined for each set of features for the theo scores, instead of overall for the entire dataset. i.e. if for specifically the features that have to do with fusion there is 60% coverage and the cut-off is 75% that language is dropped.

HedvigS commented 8 months ago

Thinking about this, it seems error prone to have to binarise both the ValuesTable and the ParameterTable, can we consolidate this into one function that works directly on the CLDF object?

e.g. newcldf <- binarise(cldf)

Then the make_theo_score function can take a cldfobject, check that it's binarised or not (and run binarise if needed). Should be much cleaner and easier to use.

I would like to create as few new object types as possible. In my experience, new object classes often make functions less accessible to new users.

I prefer the current set-up where there is one function for binarising the ValueTable and the ParameterTable.

HedvigS commented 8 months ago

If you'd like, I can adapt make_theo_score() to check if it's binarised and if not throw an error or binarise it

HedvigS commented 8 months ago

I've adjusted the function, it now takes also unbinarised tables and binarises them.

HedvigS commented 8 months ago

@SimonGreenhill is this change acceptable?