ropensci / ritis

Integrated Taxonomic Information Service (ITIS) R client
https://docs.ropensci.org/ritis
Other
16 stars 2 forks source link

Make output column capitalization consistent with ITIS API for functions using `pick_cols()` #25

Open sformel-usgs opened 1 year ago

sformel-usgs commented 1 year ago

This is small potatoes, but I thought it couldn't hurt to mention. And mostly I'm curious about why this is happening.

I had reason to mix search_scientific and scientific_name for a large list of taxa and wanted to join the results on combinedName. But search_scientific returns a column named combinedName, while scientific_name returns a field called combinedname (lowercase n). After closer inspection, the unitname fields are also inconsistently capitalized. So the request is that these functions return the same names so the tables can be easily joined.

But I couldn't find where the names with the lowercase n were being defined. The scientific_name function explicitly names combinedName. The capital N seems to be consistent with what the ITIS API returns and what the full record function returns (see below).

I reinstalled from github, in case I had overlooked some commit, but the mystery remains. Is this by design, or am I overlooking something simple?


#reprex

>ritis::search_scientific(x = "Spartina alterniflora")

# A tibble: 3 × 12
  author                     combinedName                  kingdom tsn   unitI…¹ unitI…² unitI…³ unitI…⁴ unitN…⁵ unitN…⁶ unitN…⁷ unitN…⁸
  <chr>                      <chr>                         <chr>   <chr> <lgl>   <lgl>   <chr>   <lgl>   <chr>   <chr>   <chr>   <lgl>  
1 Loisel.                    Spartina alterniflora         Plantae 41267 NA      NA      NA      NA      Sparti… altern… NA      NA     
2 (Muhl. ex Elliott) Fernald Spartina alterniflora var. g… Plantae 5408… NA      NA      var.    NA      Sparti… altern… glabra  NA     
3 (Merr.) Fernald            Spartina alterniflora var. p… Plantae 5408… NA      NA      var.    NA      Sparti… altern… pilosa  NA     
# … with abbreviated variable names ¹​unitInd1, ²​unitInd2, ³​unitInd3, ⁴​unitInd4, ⁵​unitName1, ⁶​unitName2, ⁷​unitName3, ⁸​unitName4

>ritis::scientific_name(tsn = 41267)

# A tibble: 1 × 4
  combinedname          tsn   unitname1 unitname2   
  <chr>                 <chr> <chr>     <chr>       
1 Spartina alterniflora 41267 Spartina  alterniflora

> ritis::full_record(tsn = 41267)[['scientificName']]
$author
[1] "Loisel."

$class
[1] "gov.usgs.itis.itis_service.data.SvcScientificName"

$combinedName
[1] "Spartina alterniflora"

$kingdom
NULL

$tsn
[1] "41267"

$unitInd1
NULL

$unitInd2
NULL

$unitInd3
NULL

$unitInd4
NULL

$unitName1
[1] "Spartina                           "

$unitName2
[1] "alterniflora"

$unitName3
NULL

$unitName4
NULL
Session Info ```r >sessionInfo() R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044) Matrix products: default locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 [4] LC_NUMERIC=C LC_TIME=English_United States.utf8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ritis_1.0.0 loaded via a namespace (and not attached): [1] Rcpp_1.0.10 fansi_1.0.4 utf8_1.2.3 dplyr_1.1.0 plyr_1.8.8 crul_1.3 R6_2.5.1 [8] jsonlite_1.8.4 lifecycle_1.0.3 magrittr_2.0.3 pillar_1.8.1 rlang_1.0.6 cli_3.6.0 curl_5.0.0 [15] rstudioapi_0.14 xml2_1.3.3 vctrs_0.5.2 generics_0.1.3 urltools_1.7.3 tools_4.2.1 glue_1.6.2 [22] triebeard_0.3.0 compiler_4.2.1 pkgconfig_2.0.3 tidyselect_1.2.0 tibble_3.1.8 solrium_1.2.0 httpcode_0.3.0 ```
maelle commented 1 year ago

@jcblum

jcblum commented 11 months ago

Hi @sformel-usgs, thanks for pointing this out!

Perhaps you figured this out on your own already 😅, but the reason that scientific_name() is returning lowercase column names is because it uses the internal pick_cols() generic function (specifically, its .data.frame method) to pull out a subset of columns from what ITIS returns. See: https://github.com/ropensci/ritis/blob/95f190cd328c80e75aa9619f04db7cd1016f7e9c/R/zzz.R#L75C1-L82

pick_cols() is used all over the codebase, so I'm guessing that the original motivation behind lowercasing was to make this utility as flexible as possible.

I agree that it would be nicer for the field names to have consistent capitalization no matter which path you took to getting the information. As you point out, that makes it easier to join outputs from different functions. I can also see the merit in making the capitalization consistent with the ITIS API.

However, since pick_cols() is such a widespread piece of infrastructure in this package, rethinking its behavior is a bit of an undertaking. I'm tagging this as a feature request, but I likely won't be able to get to it super soon (contributions welcome!).

sformel-usgs commented 11 months ago

Excellent detective work! I did not figure this out yet; I appreciate you following up on it. I don't have time to work on this in the next couple months, but I'll put it in the queue for the new year.