Closed fontikar closed 2 months ago
@ehwenk @dfalster
What do you think about this format for the helpfiles? I think it might be a big confusing to the user: https://lifecycle.r-lib.org/articles/stages.html But I like there is a visual badge to draw eye to details so users can learn that the function is no longer supported.
We've decided this format for the messaging that prompts when using package functions with an old version of AusTraits
@dfalster proposed that we needed to offer options to either update database or package @ehwenk suggested that it needed to explicitly mention AusTraits I offered suggested wording, icons via cli
> extract_dataset(at_three, "Falser_2003")
✖ "extract_dataset" no longer supports AusTraits version 3.0.2
ℹ You can either update to a newer version of the data using `load_austraits()` OR
ℹ Install an older version of the package
ℹ See https://github.com/traitecoevo/austraits for details.
I have hit an issue with deprecating the switch in trait_pivot_wider
.
trait_pivot_wider
uses the $traits
table, but check_compatibility
checks on the$metadata
table in the austraits
object.
OPTION 1: Should we update the trait_pivot_
functions to now work with austraits
? OR
OPTION 2: Should we add a new check_compatibility_traits
function for delineating versions using only the traits table?
I'm not too familiar with the differences in col names across database versions but below is a reprex showing the col names that vary between the versions.
library(austraits)
#> Loading required package: RefManageR
#> Thanks for showing interest in `austraits`! Please consider citing this package - citation('austraits')
at_three <- load_austraits(version = "3.0.2", path = "ignore/data/austraits/")
#> Downloading AusTraits to 'ignore/data/austraits/'
#> Loading data from 'ignore/data/austraits//austraits-3.0.2.rds'
at_four <- load_austraits(version = "4.2.0", path = "ignore/data/austraits/")
#> Downloading AusTraits to 'ignore/data/austraits/'
#> Loading data from 'ignore/data/austraits//austraits-4.2.0.rds'
at_six <- load_austraits(version = "6.0.0", path = "ignore/data/austraits/")
#> Downloading AusTraits to 'ignore/data/austraits/'
#> Loading data from 'ignore/data/austraits//austraits-6.0.0.rds'
setdiff(names(at_six),names(at_three))
#> [1] "locations" "schema" "metadata"
setdiff(names(at_six),names(at_four))
#> character(0)
Using col names I can delineate between version 6 and 3, but there are no differences on col names between version 6 and 4.
Any suggestions @ehwenk?
Path of least resistance is to replace the $traits
argument with austraits
object.
Thoughts @dfalster
Following on discussions with @ehwenk, option 1 tends to confuses users when to use austraits$traits or streamlining usage by replacing input argument to take austraits
object may be beneficial. This solution creates an output that is not the same as the input which may be confusing to user? (incoming list of 13 length, outgoing, 1 widened tibble). @fontikar suggest maybe tacking out the widened table to the austraits
object i.e. austraits$wide_traits
?
@ehwenk and I discussed whether we want to "close the loop" with pivotting wider? Do we want users to be able to revert back to longer after going wide? Should this format change be unidirectional?? How is this different to combined_tables
?
Below is the behaviour of current trait_pivot
functions
library(austraits)
#> Loading required package: RefManageR
#> Thanks for showing interest in `austraits`! Please consider citing this package - citation('austraits')
lite_three_wide <- austraits:::austraits_3.0.2_lite$traits |>
summarise_trait_means() |>
trait_pivot_wider()
lite_four_wide <- austraits:::austraits_4.2.0_lite$traits |> trait_pivot_wider()
lite_five_wide <- austraits:::austraits_5.0.0_lite$traits |> trait_pivot_wider()
# Go back to long format
back_to_long_three <- lite_three_wide |> trait_pivot_longer()
nrow(austraits:::austraits_3.0.2_lite$traits) == nrow(back_to_long_three)
#> [1] FALSE
back_to_long_four <- lite_four_wide |> trait_pivot_longer()
nrow(austraits:::austraits_4.2.0_lite$traits) == nrow(back_to_long_four)
#> [1] TRUE
back_to_long_five <- lite_five_wide |> trait_pivot_longer()
nrow(austraits:::austraits_5.0.0_lite$traits) == nrow(back_to_long_five)
#> [1] FALSE
nrow(austraits:::austraits_5.0.0_lite$traits)
#> [1] 116023
nrow(back_to_long_five)
#> [1] 149372
Created on 2024-09-10 with reprex v2.1.0
Thoughts team? @dfalster @ehwenk
OPTION 1: Should we update the traitpivot functions to now work with austraits?
Currently plot_locations
work with both austraits and traits table using:
if( is.null(dim(aus_traits)) ){
traits <- aus_traits$traits
} else{
traits <- aus_traits
}
But I think this is also technical debt that needs to be removed and we should work with austraits
list object?? Right?
Following on discussions with @ehwenk, option 1 tends to confuses users when to use austraits$traits or streamlining usage by replacing input argument to take
austraits
object may be beneficial. This solution creates an output that is not the same as the input which may be confusing to user? (incoming list of 13 length, outgoing, 1 widened tibble). @fontikar suggest maybe tacking out the widened table to theaustraits
object i.e.austraits$wide_traits
?
WIth regards to pivoting wider... I don't see any practical way to fully close the loop and re-pivot longer. trait_pivot_wider
explicitly drops 4 columns (Line 49: meta_data_cols <- c("unit", "replicates", "measurement_remarks", "basis_of_value")
, because these are often different for each trait measurement. One could come up with some complicated way to pack this information, but it would be a very busy column that would need to include the trait name before each set of information. I think that we're offering trait_pivot_wider as a way to see data with traits across different columns, but there shouldn't be the expectation that this is cyclical - and more than "extract". We also don't provide an "unextract" function. The question is then if there is any value in trait_pivot_longer
I've been thinking about plot locations, and I can see good reasons to retain the ability to use a single dataframe or a traits.build data object - there will be people who will use join.*
functions to create a wider table that they might assign to a different name - or they might use their own filter functions within the traits table. Many people might know about the relational structure, but then simply save their manipulated dataframe to be a tibble and it shouldn't be that they can't continue manipulating/using functions that don't actually depend on the relational structure.
But I do have to think about how to implement "hooks" once we don't have database$metadata
For plot_locations
there is nothing that inherently requires a traits.build formatted data object - in fact, it assumes that latitude/longitude have already been joined to a single table. So this function really doesn't need to use check_compatibility
. It just needs to confirm that the table has columns latitude (deg)
and longitude (deg)
- and whatever has been specified as containing the trait data. So it should be fine for people to use this with completely different dataframes and the message should not be about old/new versions about AusTraits so much as suggesting to people they merge on coordinates if they haven't yet.
@ehwenk
It just needs to confirm that the table has columns latitude (deg) and longitude (deg) - and whatever has been specified as containing the trait data. So it should be fine for people to use this with completely different dataframes and the message should not be about old/new versions about AusTraits so much as suggesting to people they merge on coordinates if they haven't yet.
I think the only difference between version <=3.0.2 and newer is this line
dplyr::select(location_name, `latitude (deg)`, `longitude (deg)`, !!feature)
In the old version location_name
is replaced with site_name
.
I think we can generalise the function using any_of
dplyr::select(any_of(location_name, `latitude (deg)`, `longitude (deg)`, !!feature))
That will remove the switch
, and I will add a check for latitude and longitude
This does feel like an entirely different issue though so probably another branch and pursued after the switch deprecation.
We also don't provide an "unextract" function. The question is then if there is any value in trait_pivot_longer
Just discussed with @dfalster - we all agree to deprecate. The value for users to revert their widened data back to long is not well known...if there is enough interest to revive it then we will revisit this later!
I can see good reasons to retain the ability to use a single dataframe or a traits.build data object - there will be people who will use join.* functions to create a wider table that they might assign to a different name
We will add the flexibility of $traits
or traits.build object
for traits_pivot_wider
as it has been done for plot_locations
.
The compatibility check for traits.build >= 1.0 for $traits will be based on the column names. At the moment the function checks for treatment_context_id
in the names of the $traits
table. Looks like there are a few I could use:
setdiff(names(at_six$traits), names(austraits_4.2.0_lite$traits))
[1] "repeat_measurements_id" "temporal_context_id" "plot_context_id" "treatment_context_id"
[5] "method_context_id"
@ehwenk does these sound reasonable for checking for new traits.build
objects?
I could take a look at the traits.build::db_trait_pivot_wider
and hybridise the two?
This may be a seperate issue
Currently, the R package serves many different versions of the database, each with their own idiosyncrasies ✨ but this means that a lot of our functions have to behave according to the version of the loaded version.
switch()
functionsjoin_*
plot_site/locations
plot_trait_beeswarm
as_wide_table
extract_*
trait_pivot_*
We need to include a deprecation message to notify the user that because of their version of the database is not supported by R package and to check out the documentation on backwards compatibility. Deprecation printing will be based off of
check_compatability()
from #93Some resources for deprecating functions:
Currently the tests are written to test subsets of each version of the database. These will also need to be torn down accordingly and only one test will be retained for each major function.
The
summarise_*
functions will be deprecated entirely, and won't be used with any database built by traits.build.1.0