mitchelloharawild / vitae

R Markdown Résumés and CVs
https://pkg.mitchelloharawild.com/vitae/

Sorting on dates with NULL values #227

Open dhicks opened 1 year ago

dhicks commented 1 year ago

Brief description of the problem: Sorting a csl_dates column that contains missing (NULL) values gives the wrong order. Replacing the NULL entries with NA resolves the problem, but doing the replacement is tricky.

Diagnosis: This came up when I was trying to use bibliography_entries() with a Zotero export of a group that included some submitted but unpublished papers (so, no publication date). bibliography_entries() calls jsonlite::fromJSON(), which has a longstanding issue with assuming JSON null should be translated to R NULL: https://github.com/jeroen/jsonlite/issues/70.

Reprex:

library(vitae)
#> 
#> Attaching package: 'vitae'
#> The following object is masked from 'package:stats':
#> 
#>     filter

dates = structure(list(
    structure(list(`date-parts` = list(list(2020L))), class = "csl_date"),
    NULL, 
    structure(list(`date-parts` = list(list(2019L, 3L, 14L))), class = "csl_date"), 
    structure(list(`date-parts` = list(list(2016L, 12L, 22L))), class = "csl_date"), 
    structure(list(`date-parts` = list(list(2020L, 1L))), class = "csl_date")
    ), 
    class = c("csl_dates", "vctrs_vctr", "list"))

dates
#> <csl_dates[5]>
#> [1] 2020       NULL       2019-3-14  2016-12-22 2020-1
## The order is all wrong and the last entry has disappeared
dates[order(dates)]
#> <csl_dates[4]>
#> [1] 2019-3-14  NULL       2020       2016-12-22

## From <https://stackoverflow.com/questions/22870198/is-there-a-more-efficient-way-to-replace-null-with-na-in-a-list/49539022#49539022>
replace_x <- function(x, replacement = NA_character_) {
    if (length(x) == 0 || length(x[[1]]) == 0) {
        replacement
    } else {
        x
    }
}

## Presumably you could use an lapply here, but I can't be bothered to figure that out right now
fixed_dates = purrr::modify_depth(dates, 1, replace_x)
## Sorted correctly, with the missing value at the end
fixed_dates[order(fixed_dates)]
#> <csl_dates[5]>
#> [1] 2016-12-22 2019-3-14  2020       2020-1     NA

Created on 2022-12-30 with reprex v2.0.2
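
The lapply version mentioned in the reprex comment could look something like the sketch below. replace_null is a hypothetical helper name, not part of vitae or purrr. It only tests for zero-length elements (a simpler check than replace_x), and the demo data is a made-up classed list, so this has not been verified against csl_dates sorting itself.

```r
# Hypothetical helper (not part of vitae or purrr): replace NULL or other
# zero-length elements of a list with a placeholder, using base R only.
replace_null <- function(x, replacement = NA_character_) {
    # unclass() so lapply() iterates over the underlying list directly
    out <- lapply(unclass(x), function(el) {
        if (length(el) == 0) replacement else el
    })
    # lapply() keeps only names, so copy the original attributes
    # (class and so on) back onto the result
    attributes(out) <- attributes(x)
    out
}

# Illustrative data only: a classed list with one NULL entry
entries <- structure(
    list("2020", NULL, "2019-03-14"),
    class = c("demo_list", "list")
)
fixed <- replace_null(entries)

length(fixed)       # still 3: the NULL slot is kept, now holding NA
is.na(fixed[[2]])   # TRUE
class(fixed)        # the class attribute is preserved
```

Copying the attributes back is the part that matters for a classed vector such as csl_dates: without it the result would be a bare list and any class-specific sorting would no longer apply.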