grambank / rgrambank

R package to access and analyse Grambank's CLDF data
Apache License 2.0

Update language level df #25

Closed HedvigS closed 6 months ago

HedvigS commented 1 year ago

Closes #19 and #23.

HedvigS commented 1 year ago

In addition, I changed the function name from language_level_df to reduce_ValueTable_to_unique_glottocodes to make it more transparent.

HedvigS commented 1 year ago

@xrotwang just reminded me of datasets like APiCS, where it's possible to have more than one Value per Language_ID and Parameter_ID in the ValueTable. I've not adjusted the function for this yet, but in the meantime I've added a note and an error for this case.
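
To make the situation concrete, here is a minimal base-R sketch of such a check. It is not the code in this PR: the helper names has_multiple_values and check_one_value_per_pair are hypothetical, the toy data are invented, and only the column names Language_ID, Parameter_ID and Value come from the discussion above.

```r
# Toy ValueTable with invented rows; lang1 has two Values for p1.
ValueTable <- data.frame(
  ID           = c("v1", "v2", "v3", "v4"),
  Language_ID  = c("lang1", "lang1", "lang1", "lang2"),
  Parameter_ID = c("p1", "p1", "p2", "p1"),
  Value        = c("1", "2", "1", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if any Language_ID/Parameter_ID pair
# occurs on more than one row.
has_multiple_values <- function(ValueTable) {
  any(duplicated(ValueTable[, c("Language_ID", "Parameter_ID")]))
}

# Hypothetical guard: stop with an error instead of silently
# picking one of several Values.
check_one_value_per_pair <- function(ValueTable) {
  if (has_multiple_values(ValueTable)) {
    stop("ValueTable has more than one Value for at least one ",
         "combination of Language_ID and Parameter_ID.")
  }
  invisible(ValueTable)
}
```

Calling check_one_value_per_pair(ValueTable) on the toy table stops with the error, while the same table without row v2 passes through unchanged.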

HedvigS commented 1 year ago

Sometimes, all you gotta do is have a little break and a think. I thought out a solution while biking home from work, committing in a sec :)!

HedvigS commented 1 year ago

Okay, I've now modified the function so that if it encounters a ValueTable like the one in APiCS, it handles it sensibly. It can't handle every situation of that kind, but it can handle APiCS and other datasets with the same columns in their ValueTable.
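
As an illustration only, one way a table with several Values per Language_ID and Parameter_ID could be reduced is sketched below in base R. This is an assumption, not a description of what the function in this PR does: it presumes a numeric Frequency column and keeps, for each pair, the row with the highest Frequency. The helper name keep_most_frequent and the data are hypothetical.

```r
# Toy ValueTable with invented rows and an assumed numeric Frequency column.
ValueTable <- data.frame(
  Language_ID  = c("lang1", "lang1", "lang1", "lang2"),
  Parameter_ID = c("p1", "p1", "p2", "p1"),
  Value        = c("1", "2", "1", "1"),
  Frequency    = c(30, 70, 100, 100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per Language_ID/Parameter_ID pair,
# namely the one with the highest Frequency.
keep_most_frequent <- function(ValueTable) {
  # Sort so that, within each pair, the highest Frequency comes first.
  ord <- order(ValueTable$Language_ID,
               ValueTable$Parameter_ID,
               -ValueTable$Frequency)
  sorted <- ValueTable[ord, ]
  # duplicated() flags every row after the first one in each pair.
  keep <- !duplicated(sorted[, c("Language_ID", "Parameter_ID")])
  reduced <- sorted[keep, ]
  rownames(reduced) <- NULL
  reduced
}

reduced <- keep_most_frequent(ValueTable)
```

With the toy data, reduced has one row per pair, and the lang1/p1 row is the one with Frequency 70. Rows that tie on Frequency are resolved by their original order, which a real implementation would probably want to handle explicitly.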

xrotwang commented 1 year ago

I don't think that piling up special-case handling in this function is a good idea. It now handles the APiCS case, expecting the non-CLDF-standard columns "Frequency" and "Confidence" with particular semantics. But APiCS is about pidgins and creoles, i.e. languoids where the Glottolog classification is somewhat different from that of other families and "collapsing to language level" is less well specified than in other cases.

So, as I said elsewhere: I think "rgrambank" should be targeted at handling access to and analysis of Grambank, not striving for ill-defined generality.

HedvigS commented 1 year ago

Okay, I hear you.

To me, this function is now very useful and I'd love for it to be in rgrambank. I realise the APiCS handling is a bit hacky, but I'm okay with it at this stage and it doesn't interfere with other things I want to do.

HedvigS commented 1 year ago

I don't know anyone else who is developing code for this purpose, so I'll take a stab at it myself for the cases I have encountered or can envisage in the short term, and then we'll see what @SimonGreenhill says.

HedvigS commented 7 months ago

I've merged in main, resolved the conflicts, and updated the namespace.