ropensci / software-review

rOpenSci Software Peer Review.

pkgmatch: Find R Packages Matching Either Descriptions or Other R Packages #671

Open mpadge opened 2 weeks ago

mpadge commented 2 weeks ago

Submitting Author Name: Mark Padgham
Submitting Author Github Handle: @mpadge
Repository: https://github.com/ropensci-review-tools/pkgmatch
Version submitted: 0.4.2
Submission type: Standard
Editor: @MargaretSiple-NOAA
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en


Package: pkgmatch
Title:  Find R Packages Matching Either Descriptions or Other R Packages
Version: 0.4.2
Authors@R: c(
    person("Mark", "Padgham", , "mark.padgham@email.com", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0003-2172-5265")),
    person("Davis", "Vaughan", , "davis@posit.co", role = c("ctb"))
    )
Description: Find R packages matching either descriptions or other R packages.
License: MIT + file LICENSE
URL: https://docs.ropensci.org/pkgmatch/,
    https://github.com/ropensci-review-tools/pkgmatch
BugReports: https://github.com/ropensci-review-tools/pkgmatch/issues
Imports:
    brio,
    checkmate,
    cli,
    curl,
    dplyr,
    fs,
    httr2,
    memoise,
    pbapply,
    Rcpp,
    rvest,
    tibble,
    tidyr,
    tokenizers,
    treesitter,
    treesitter.r,
    vctrs
Suggests:
    gert,
    hms,
    httptest2,
    jsonlite,
    piggyback,
    pkgbuild,
    rappdirs,
    roxygen2,
    testthat (>= 3.0.0),
    withr,
    knitr,
    rmarkdown
LinkingTo:
    Rcpp
Depends: R (>= 3.5.0)
NeedsCompilation: yes
Encoding: UTF-8
Language: en-GB
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

Data retrieval, because the package includes code to generate language model (LM) embeddings from all R packages retrieved from both the CRAN and rOpenSci package repositories. Wrapper, because the LM embeddings are generated by wrapping an interface to the ollama software. I've also inserted a new, one-off category of "rOpenSci tools" for internal, staff-curated packages.

Beyond internal rOpenSci use, the target audiences are (1) a general audience of anyone interested in searching R packages using either text or code input, and (2) package developers, who can use this package to identify packages or functions similar to code they might be working on.

No, not at all. To my knowledge there are two other R packages for interfacing with LMs: tidyllm and elmer. Both are general interfaces to LM API endpoints, whereas this package specifically uses LM outputs to identify best-matching packages.
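As a rough illustration of what using LM outputs to identify best-matching packages can look like, the sketch below ranks candidates by cosine similarity between embedding vectors. The toy embedding values, the package names `pkgA` to `pkgC`, and the helpers `cosine_sim()` and `rank_by_cosine()` are all hypothetical and are not taken from pkgmatch; the package's own matching may combine embeddings with other scores and could differ substantially.

```r
# Cosine similarity between a query vector and each column of `emb`.
# `emb` holds one embedding per column (hypothetical layout).
cosine_sim <- function(query, emb) {
  stopifnot(is.numeric(query), is.matrix(emb), length(query) == nrow(emb))
  dots <- as.numeric(crossprod(emb, query))            # t(emb) %*% query
  norms <- sqrt(colSums(emb^2)) * sqrt(sum(query^2))   # product of vector lengths
  stats::setNames(dots / norms, colnames(emb))
}

# Return the `n` columns most similar to the query, best match first.
rank_by_cosine <- function(query, emb, n = 3L) {
  sims <- sort(cosine_sim(query, emb), decreasing = TRUE)
  utils::head(sims, n)
}

# Toy 3-dimensional "embeddings" (made-up values), one column per package
emb <- cbind(
  pkgA = c(1, 0, 0),
  pkgB = c(0.9, 0.1, 0),
  pkgC = c(0, 1, 0)
)
query <- c(1, 0.05, 0)

rank_by_cosine(query, emb, n = 2L)
```

Real embeddings would have hundreds of dimensions and would come from a language model rather than being typed in by hand; only the ranking step is shown here.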

Not applicable.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ropensci-review-bot commented 2 weeks ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 2 weeks ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 weeks ago

Checks for pkgmatch (v0.4.2)

git hash: f12ad732

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type       |package      | ncalls|
|:----------|:------------|------:|
|internal   |base         |    545|
|internal   |pkgmatch     |    204|
|internal   |utils        |     25|
|internal   |stats        |     12|
|internal   |tools        |      5|
|imports    |fs           |     47|
|imports    |checkmate    |     20|
|imports    |dplyr        |     17|
|imports    |memoise      |     13|
|imports    |treesitter   |      8|
|imports    |httr2        |      7|
|imports    |pbapply      |      5|
|imports    |brio         |      2|
|imports    |rvest        |      2|
|imports    |tibble       |      2|
|imports    |tokenizers   |      2|
|imports    |treesitter.r |      1|
|imports    |cli          |     NA|
|imports    |curl         |     NA|
|imports    |Rcpp         |     NA|
|imports    |tidyr        |     NA|
|imports    |vctrs        |     NA|
|suggests   |gert         |      2|
|suggests   |jsonlite     |      2|
|suggests   |hms          |      1|
|suggests   |piggyback    |      1|
|suggests   |httptest2    |     NA|
|suggests   |pkgbuild     |     NA|
|suggests   |rappdirs     |     NA|
|suggests   |roxygen2     |     NA|
|suggests   |testthat     |     NA|
|suggests   |withr        |     NA|
|suggests   |knitr        |     NA|
|suggests   |rmarkdown    |     NA|
|linking_to |Rcpp         |     NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.
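A minimal sketch of how the 'external_calls' table mentioned above might be inspected locally is given here. It assumes that the pkgstats package and its system requirements are installed, that `pkg_path` (a hypothetical placeholder) points to a local clone of the repository, and that the table carries a `package` column; the helper `tally_calls_by_package()` is illustrative only and is not part of pkgstats.

```r
# Count calls per package in a data frame with a `package` column
# (the column name is an assumption about the 'external_calls' table).
tally_calls_by_package <- function(external_calls) {
  stopifnot(
    is.data.frame(external_calls),
    "package" %in% names(external_calls)
  )
  sort(table(external_calls$package), decreasing = TRUE)
}

pkg_path <- "path/to/pkgmatch" # hypothetical path to a local clone

# Only run the analysis when pkgstats and the local clone are available.
if (requireNamespace("pkgstats", quietly = TRUE) && dir.exists(pkg_path)) {
  s <- pkgstats::pkgstats(pkg_path)
  print(tally_calls_by_package(s$external_calls)) # calls per package
  print(utils::head(s$external_calls))            # first few call locations
}
```

Tallies produced this way need not match the table above exactly, since the bot notes that its code-tagging results may not be entirely accurate.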

base

lapply (46), data.frame (31), which (27), names (24), vapply (23), length (19), nrow (17), grep (16), c (15), list (15), paste0 (14), seq_len (12), character (11), gsub (11), as.integer (10), by (10), ncol (10), tryCatch (10), unname (10), url (10), order (9), grepl (7), colnames (6), for (6), integer (6), readRDS (6), unlist (6), version (6), basename (5), colSums (5), format (5), ifelse (5), raw (5), seq (5), seq_along (5), tempdir (5), all (4), as.Date (4), asNamespace (4), do.call (4), is.null (4), strsplit (4), system (4), attr (3), difftime (3), getOption (3), log (3), matrix (3), proc.time (3), read.dcf (3), sqrt (3), table (3), unique (3), as.character (2), cbind (2), floor (2), is.na (2), ls (2), match (2), min (2), nzchar (2), options (2), regmatches (2), rowSums (2), sum (2), units (2), any (1), apply (1), as.matrix (1), class (1), cut (1), drop (1), file (1), gregexpr (1), list.files (1), logical (1), mean (1), new.env (1), parseNamespaceFile (1), paste (1), rank (1), rbind (1), readline (1), regexpr (1), rep (1), sort (1), switch (1), Sys.Date (1), Sys.getenv (1), Sys.time (1), system.file (1), tolower (1), unclass (1), vector (1)

pkgmatch

bm25_tokens_list (8), get_embeddings (7), not_null_index (7), bm25_idf (6), get_pkg_text (6), get_pkg_code (5), pkgmatch_bm25 (5), cosine_similarity (4), dl_prev_data (4), pkgmatch_embeddings_from_pkgs (4), rm_fns_from_pkg_txt (4), bm25_tokens (3), get_all_fn_descs (3), get_cache_file_name (3), get_embeddings_from_ollama (3), jina_model (3), pkgmatch_bm25_from_idf (3), pkgmatch_load_data (3), pkgmatch_treesitter_fn_tags (3), append_cols (2), attach_ns (2), bm25_idf_internal (2), bm25_tokens_internal (2), bm25_tokens_list_internal (2), days_in_this_month (2), dl_one_tarball (2), extract_tarball (2), get_calls (2), get_calls_in_functions (2), get_embeddings_intern (2), get_fn_defs_namespace (2), get_fn_descs_from_ns (2), get_local_pkg_dep_fns (2), get_local_pkg_deps (2), get_pkg_readme (2), get_pkg_text_internal (2), get_pkg_text_namespace (2), input_is_pkg (2), is_docker_sudo (2), is_windows (2), list_new_cran_updates (2), load_data_internal (2), m_list_remote_files (2), ollama_dl_jina_model (2), opt_is_quiet (2), pkg_fns_from_r_search (2), pkg_fns_from_r_search_internal (2), pkg_is_installed (2), pkg_name_from_path (2), pkgmatch_bm25_fn_calls (2), pkgmatch_bm25_fn_calls_internal (2), pkgmatch_bm25_from_idf_internal (2), pkgmatch_bm25_internal (2), pkgmatch_cache_path (2), pkgmatch_update_cran (2), append_data_to_bm25 (1), append_data_to_embeddings (1), append_data_to_fn_calls (1), apply_col_names (1), attach_base_rcmd_ns (1), attach_local_dep_namespaces (1), attach_this_pkg_namespace (1), convert_paths_to_pkgs (1), desc_template (1), extract_data_from_local_dir (1), fn_names_base (1), fn_names_rcmd (1), get_fn_defs_local (1), get_pkg_exported_fns (1), get_pkg_text_local (1), has_ollama (1), has_ollama_docker (1), has_ollama_local (1), head.pkgmatch (1), input_is_path (1), input_mentions_functions (1), make_cran_version_column (1), modify_by_lm_prop (1), ollama_check (1), ollama_has_jina_model (1), ollama_is_running (1), ollama_models (1), order_output (1), pkg_install_path (1), pkgmatch_browse (1), pkgmatch_cache_update_interval (1), pkgmatch_dl_data (1), pkgmatch_embeddings_from_text (1), pkgmatch_rerank (1), pkgmatch_similar_fns (1), pkgmatch_similar_pkgs (1), pkgmatch_update_data (1), pkgmatch_update_ropensci (1), rcmd_pkgs (1), rcpp_bm25 (1), registry_daily_chunk (1), rename_files_in_r (1), ros_registry (1), similar_pkgs_from_pkg (1), similar_pkgs_from_pkg_internal (1), similarity_embeddings (1), tok_lists_to_idfs (1), tressitter_calls_in_package (1)

fs

path (19), dir_ls (9), path_temp (7), dir_create (5), path_ext (3), file_exists (1), file_info (1), path_ext_set (1), path_real (1)

utils

installed.packages (4), lsf.str (4), data (3), packageDescription (3), prompt (3), tar (2), untar (2), browseURL (1), getFromNamespace (1), tail (1), timestamp (1)

checkmate

assert_character (7), assert_integerish (3), assert_matrix (2), assert_names (2), assert_numeric (2), check_file_exists (2), assert_list (1), assert_logical (1)

dplyr

left_join (8), rename (3), mutate (2), last_col (1), n (1), relocate (1), summarise (1)

memoise

memoise (13)

stats

dt (5), start (3), end (2), line (2)

treesitter

query_captures (3), node_text (2), parser (1), parser_parse (1), tree_root_node (1)

httr2

req_headers (2), request (2), resp_body_json (2), req_perform (1)

pbapply

pblapply (5)

tools

parse_Rd (2), Rd_db (2), CRAN_package_db (1)

brio

read_lines (2)

gert

git_clone (2)

jsonlite

read_json (2)

rvest

html_table (1), read_html (1)

tibble

new_tibble (2)

tokenizers

count_words (1), tokenize_words (1)

hms

hms (1)

piggyback

pb_download (1)

treesitter.r

language (1)

**NOTE:** Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties

The package has:

- code in C++ (5% in 2 files) and R (95% in 20 files)
- 1 authors
- 6 vignettes
- no internal data file
- 17 imported packages
- 14 exported functions (median 14 lines of code)
- 218 non-exported functions in R (median 12 lines of code)
- 4 C++ functions (median 12 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |    20|       79.8|           |
|files_src                |     2|       79.5|           |
|files_vignettes          |     6|       96.8|           |
|files_tests              |    10|       87.4|           |
|loc_R                    |  1905|       81.1|           |
|loc_src                  |    91|       13.6|           |
|loc_vignettes            |   463|       74.2|           |
|loc_tests                |   694|       77.2|           |
|num_vignettes            |     6|       97.6|TRUE       |
|n_fns_r                  |   232|       90.5|           |
|n_fns_r_exported         |    14|       56.0|           |
|n_fns_r_not_exported     |   218|       93.0|           |
|n_fns_src                |     4|       21.1|           |
|n_fns_per_file_r         |     7|       79.5|           |
|n_fns_per_file_src       |     2|       27.8|           |
|num_params_per_fn        |     2|        8.2|           |
|loc_per_fn_r             |    12|       36.8|           |
|loc_per_fn_r_exp         |    14|       33.6|           |
|loc_per_fn_r_not_exp     |    12|       39.8|           |
|loc_per_fn_src           |    12|       38.9|           |
|rel_whitespace_R         |    24|       85.6|           |
|rel_whitespace_src       |    26|       21.8|           |
|rel_whitespace_vignettes |    20|       57.6|           |
|rel_whitespace_tests     |    21|       77.1|           |
|doclines_per_fn_exp      |    28|       29.2|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |   187|       87.0|           |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/ropensci-review-tools/pkgmatch/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci-review-tools/pkgmatch/actions)

**GitHub Workflow Results**

|          id|name                 |conclusion |sha    | run_number|date       |
|-----------:|:--------------------|:----------|:------|----------:|:----------|
| 11727438106|docker               |skipped    |f12ad7 |         23|2024-11-07 |
| 11727438100|pkgcheck             |NA         |f12ad7 |         96|2024-11-07 |
| 11727438103|R-CMD-check          |success    |f12ad7 |        292|2024-11-07 |
| 11727438110|test-coverage        |success    |f12ad7 |        292|2024-11-07 |
| 11727438101|Update pkgmatch data |NA         |f12ad7 |         66|2024-11-07 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 79.93

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following function have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
get_pkg_readme | 17

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found no issues with this package!


Package Versions

|package  |version  |
|:--------|:--------|
|pkgstats |0.2.0.47 |
|pkgcheck |0.1.2.63 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

emilyriederer commented 2 days ago

@ropensci-review-bot assign @MargaretSiple-NOAA as editor

ropensci-review-bot commented 2 days ago

Assigned! @MargaretSiple-NOAA is now the editor