grambank / rgrambank

R package to access and analyse Grambank's CLDF data
Apache License 2.0
3 stars 1 forks source link

fn vs df #14

Closed HedvigS closed 1 year ago

HedvigS commented 1 year ago

@SimonGreenhill Generally, do you prefer functions to take the file path of a cldf-table as an input argument or for the user to read that in before and give the function the df.

Specifically for the ParameterTable I'm a bit wary, I'd prefer the file path as an argument because that table is a bit tricky to read in, and I've written into the function make_theo_scores the best way I know of reading it (there's linebreaks and stuff that can mess up with other approaches).

xrotwang commented 1 year ago

What about a Grambank object? Basically whatever is returned by rcldf as dataset object?

Hedvig Skirgård @.***> schrieb am Mi., 30. Aug. 2023, 14:35:

@SimonGreenhill https://github.com/SimonGreenhill Generally, do you prefer functions to take the file path of a cldf-table as an input argument or for the user to read that in before and give the function the df.

Specifically for the ParameterTable I'm a bit wary, I'd prefer the file path as an argument because that table is a bit tricky to read in, and I've written into the function make_theo_scores the best way I know of reading it (there's linebreaks and stuff that can mess up with other approaches).

— Reply to this email directly, view it on GitHub https://github.com/grambank/rgrambank/issues/14, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGUOKDY2GHPQEYYTXUHSUDXX4XP7ANCNFSM6AAAAAA4EOKAJE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

HedvigS commented 1 year ago

What about a Grambank object? Basically whatever is returned by rcldf as dataset object? Hedvig Skirgård > schrieb am Mi., 30. Aug. 2023, 14:35: @SimonGreenhill https://github.com/SimonGreenhill Generally, do you prefer functions to take the file path of a cldf-table as an input argument or for the user to read that in before and give the function the df. Specifically for the ParameterTable I'm a bit wary, I'd prefer the file path as an argument because that table is a bit tricky to read in, and I've written into the function make_theo_scores the best way I know of reading it (there's linebreaks and stuff that can mess up with other approaches). — Reply to this email directly, view it on GitHub <#14>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGUOKDY2GHPQEYYTXUHSUDXX4XP7ANCNFSM6AAAAAA4EOKAJE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Personally I don't like to use complex list objects that much, and I find it can scare other users. I'd prefer plain data frames or fns. But, people can point to a data frame inside an cldf-object, that should be fine as it'll just resolve to a data frame anyway

HedvigS commented 1 year ago

I've gone ahead and expected just data-frames now rather than rcldf objects or fns, I think I've change that everywhere now.

SimonGreenhill commented 1 year ago

I'm sorry but people need to learn how to use objects. The rcldf thing is just an object connecting a bunch of tables, because they belong together naturally. There's no impediment to people moving things out of the object, so if they need to they can:

gb <- cldf('..')
I_am_a_boring_dataframe <- db$tables$LanguageTable

We should also work off objects in memory (i.e. not filenames). And I'd suggest that if a function needs to access two tables then it should take the object. If it needs one table then it should probably work on a table level, unless it's uglier.

HedvigS commented 1 year ago

I'm sorry but people need to learn how to use objects. The rcldf thing is just an object connecting a bunch of tables, because they belong together naturally. There's no impediment to people moving things out of the object, so if they need to they can:

gb <- cldf('..')
I_am_a_boring_dataframe <- db$tables$LanguageTable

We should also work off objects in memory (i.e. not filenames). And I'd suggest that if a function needs to access two tables then it should take the object. If it needs one table then it should probably work on a table level, unless it's uglier.

I am happy to use objects, I just couldn't see the clear advantage here for many of the functions in rgrambank.

The question here in this thread was first if functions in rgrambank should take dfs or fns, the opinion seemed to be if that choice then dfs. Now this is another suggestion, making functions in rgrambank take cldf class objects instead of dfs of the tables. That'll require some re-writing.

Do you want me to change the rgrambank functions I've suggested so far to take cldf class objects instead of dfs of the tables straight?

I'm not convinced of the benefit if I'm honest. The only function I can think of that'll take two tables is language_level_df() and I believe that that function a) might be better in rcldf than rgrambank and b) should be able to take a LanguageTable from another cldf-dataset than the ValueTable because I do not believe that most cldf-datasets contains the necessary info, i.e. the Language_ID or Language_level_ID col.

HedvigS commented 1 year ago

12 was merged in with binarise() and language_level_df() not taking cldf class objects but just straight dfs of the ValueTable and LanguageTable. I take that as indication to not spend time re-writing them to take cldf class objects.