Closed mcguinlu closed 4 years ago
@mcguinlu and @L-ENA, thanks for your submission. I'll be the editor. As we move through the process I'll keep you posted. I welcome your questions any time.
@mcguinlu, thanks again for your submission. The editor checks flagged a few issues that need your attention; see them below.
Let's discuss the first two items (ml1 and ml2) before I search for reviewers; these two items refer to a potential overlap with existing packages.
[x] (ml1) @mcguinlu, I see you discussed this issue with @sckott (https://github.com/ropensci/fulltext/issues/213#issue-574066182); but this was a while ago and the issue remains unresolved and open. As of today, what do you think is the best way to move forward?
a. Extend the package fulltext (https://github.com/ropensci/fulltext).
b. Continue with this submission (please argue for the lack of overlap).
c. Something else (please explain).
[x] (ml2) The file DESCRIPTION links to https://api.biorxiv.org/. @mcguinlu, is there an overlap with the package biorxivr (https://cran.r-project.org/web/packages/biorxivr/index.html)? Why not?
The remaining items are important but not as urgent as the first two.
(ml3) Please check the spelling with spelling::spell_check_package(); then fix or update the list of valid words with spelling::update_wordlist().

> spelling::spell_check_package()
  WORD             FOUND IN
  api              mx_api_content.Rd:39,42
                   mx_api_doi.Rd:28,31
                   description:4
  AppVeyor         README.md:14
                   README.Rmd:24
  ation            building-complex-search-strategies.Rmd:110
  biorxiv          description:4
  capitalisation   building-complex-search-strategies.Rmd:125
  ... more lines
(ml4) Please run goodpractice::gp() to identify lines of code that tests don't touch.

> goodpractice::gp()
... more lines
── GP medrxivr ─────────────────────────────────────────────────────────────────
It is good practice to
✖ write unit tests for all functions, and all package code in
general. 77% of code lines are covered by test cases.
R/mx_crosscheck.R:50:NA
R/mx_download.R:25:NA
R/mx_download.R:27:NA
R/mx_download.R:28:NA
R/mx_download.R:30:NA
... and 51 more lines
(ml5) Please run covr::package_coverage() and try to test code in files with low % coverage.

> covr::package_coverage()
medrxivr Coverage: 77.60%
R/mx_download.R: 1.92%
R/mx_crosscheck.R: 96.15%
R/mx_search.R: 96.33%
R/mx_api.R: 100.00%
R/mx_info.R: 100.00%
(ml6) Please consider styling the package with styler::style_pkg().

> styler::style_pkg()
Styling 12 files:
R/medrxivr.R ✓
R/mx_api.R ℹ
R/mx_crosscheck.R ℹ
R/mx_download.R ℹ
R/mx_info.R ℹ
R/mx_search.R ℹ
tests/testthat.R ✓
tests/testthat/test-api.R ℹ
tests/testthat/test-crosscheck.R ✓
tests/testthat/test-download.R ℹ
tests/testthat/test-info.R ✓
tests/testthat/test-search.R ℹ
────────────────────────────────────────
Status Count Legend
✓ 4 File unchanged.
ℹ 8 File changed.
✖ 0 File Styling threw an error.
────────────────────────────────────────
Please review the changes carefully!
Reviewers: @tts and @njahn82
Due date: 2020-07-01
Hi @maurolepore
Thanks for your initial review of our package. I've gone through it and tried to address each point below:
ml1: Overlap with fulltext
Personally, I think that fulltext and medrxivr should continue to be two separate packages, but that there is potential for integration between the two (as occurred with the aRxiv package). In all honesty, I completely forgot to reply to the issue on fulltext 🤦♂️ - sorry @sckott! The reason I think they should be separate is twofold:
In the first instance, the two packages take fundamentally different approaches to searching. At present, the medrxivr workflow is to create a local copy of the whole medRxiv repository via the API (or a maintained static snapshot), and then search it locally using the mx_search() function to find relevant articles [i.e. all data -> local search -> results]. In comparison, fulltext takes the same approach to searching bioRxiv as the biorxivr package does (note: medRxiv and bioRxiv are very similar, so I am using fulltext's approach to bioRxiv for comparison here). They both paste the search (see this code line) on after the base URL (https://www.biorxiv.org/search) and scrape the resulting page(s), essentially mimicking what would happen if you performed a search on the site itself [i.e. remote search -> results]. The search functionality offered by this approach is dependent on what the site itself offers, and so is not as comprehensive as that offered by medrxivr (e.g. you can't use regexes to define capitalisation/alternative spellings, or use the NEAR operator). In addition, using the search functionality of the site itself (i.e. by pasting the query onto the search/ path) has been shown to not be very reproducible/transparent, which is what originally motivated the development of medrxivr.
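The "all data -> local search -> results" workflow can be sketched in a few lines of base R (a toy illustration only; the records data frame and its titles are invented for the example, and this is not medrxivr's actual code):

```r
# Toy stand-in for a locally downloaded copy of the repository;
# the titles below are invented for illustration.
records <- data.frame(
  title = c(
    "Dementia risk factors in older adults",
    "Mild cognitive impairment and diet",
    "A study of airborne transmission"
  ),
  stringsAsFactors = FALSE
)

# Because the search runs locally, full regular expressions are
# available, e.g. covering capitalisation variants in one query:
hits <- records[grepl("[dD]ementia|[mM]ild cognitive impairment", records$title), ]
nrow(hits)  # 2 of the 3 toy records match
```

A remote, site-mediated search can only offer whatever operators the website exposes; searching a local copy lifts that restriction.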
Secondly, as far as I can tell (correct me if I am wrong @sckott), search strings for fulltext do not vary based on the database searched - for example, if you use ft_search() without specifying the source, it uses the same string to search every database. This would cause issues for anything beyond a simple search, as medrxivr allows for advanced search strings that would not be compatible with other data sources. Based on this, my argument is that medrxivr should be a standalone package, and a simple restricted version of the medrxivr search could be implemented in fulltext, provided @sckott is happy to implement the medrxivr workflow [i.e. download data, then search] within fulltext.
ml2: Overlap with biorxivr
Thanks for highlighting this. I did come across this package while developing medrxivr - however, the last work on it took place 5 years ago, before the introduction of the API which you refer to in your query. In addition, while the base URL of the API contains "bioRxiv" (e.g. https://api.biorxiv.org/), this is only because the same organisation (Cold Spring Harbor) is responsible for both repositories. The actual endpoint for the medRxiv API is https://api.biorxiv.org/details/medrxiv/[interval]/[cursor]/[format]. Finally, as mentioned above, biorxivr relies on the search functionality offered by the site (e.g. by pasting the query on after the search/ path) rather than performing the searches itself.
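As a quick illustration of that endpoint structure (a sketch only; the helper name and the example interval are mine, not part of the package):

```r
# Build the medRxiv "details" endpoint from its components:
# https://api.biorxiv.org/details/medrxiv/[interval]/[cursor]/[format]
build_medrxiv_url <- function(interval, cursor = 0, format = "json") {
  paste("https://api.biorxiv.org/details/medrxiv",
        interval, cursor, format, sep = "/")
}

build_medrxiv_url("2020-01-01/2020-06-30")
#> "https://api.biorxiv.org/details/medrxiv/2020-01-01/2020-06-30/0/json"
```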
ml3: Spelling
I've added this as an issue (https://github.com/mcguinlu/medrxivr/issues/4) and plan to address it soon.
ml4/ml5: Test coverage
In hindsight, I should have highlighted this as a potential sticking point in my initial submission. The single file which is dragging down the average coverage contains mx_download(), which takes the dataframe of records identified by the user as relevant and downloads a PDF for each one. In fact, this function has a test suite (see https://github.com/mcguinlu/medrxivr/blob/master/tests/testthat/test-download.R), but because it manipulates files and folders on a user's machine, I have these tests set to skip_on_cran(), which in turn means that covr doesn't pick them up. This is something I was hoping to get feedback on during the review process, as I am not sure if this is best practice or what a viable alternative would be.
ml6: styler
I've added this as an issue (https://github.com/mcguinlu/medrxivr/issues/5) and plan to address it soon.
Hopefully this addresses your inital concerns, but please do let me know if anything is unclear, if my responses are insufficient, or if you need further details!
Thanks @mcguinlu! I think {medrxivr} merits moving to the next stage. I'll now start searching for reviewers.
Here is my conclusion. I base it on your answers above, and on this quote from rOpenSci's guidelines on overlap:
"An R package that replicates the functionality of an existing R package may be considered for inclusion in the rOpenSci suite if it significantly improves on alternatives in any repository (RO, CRAN, BioC) by being ... better in usability and performance".
I considered the packages {medrxivr}, {fulltext} and {biorxivr}. I see an overlap in what the packages aim to do but not in how they do it. This would not justify the overlap in general; but in this case I think it does.
Compared to the other packages, {medrxivr} searches locally. This ensures the results can be reproduced; and enables searching with regular expressions. The different approach to searching also means that to integrate {medrxivr} into the other packages seems challenging. This might be eventually possible, but first {medrxivr} may need to mature independently.
@mcguinlu, please let me know or check the boxes as you address these issues.
@mcguinlu, I see the positive aspects of the "local" approach to searching that {medrxivr} implements; but I understand that {medrxivr} downloads the entire database. I worry this may not scale up. Here are some questions I have; you may discuss them directly with the reviewers:
Maybe you can avoid downloading the database and still provide flexible queries. For example, see how pkgsearch::advanced_search() does it -- apparently it uses Elasticsearch, which supports regular expressions, wildcards, fuzziness, and more. If you implement something similar you might get some guidance from @maelle; she is one of the two developers of pkgsearch::advanced_search().
@mcguinlu, please do this (from these guidelines):
Add an rOpenSci review badge to your README, via rodev::use_review_badge(), i.e. rodev::use_review_badge(<issue_number>). The badge URL is https://badges.ropensci.org/<issue_id>_status.svg. The full link should be:
[![](https://badges.ropensci.org/<issue_id>_status.svg)](https://github.com/ropensci/software-review/issues/<issue_id>)
(ml8) @mcguinlu have you suggested any reviewer?
(The editor guidelines suggest you might have, but I fail to find any suggestions here or at https://github.com/ropensci/software-review/issues/369.)
Thanks @tts and @njahn82 for accepting to review this package. Your reviews are due on 2020-07-01 -- that is three weeks from today.
Let me know any questions you have.
Hi @maurolepore and @mcguinlu - here is my review. Thanks for this opportunity, and all the best for the package!
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 10
medRxiv has been accepting preprints for a year now. Their API does not offer any search capabilities, so medrxivr clearly has a function to fill. Regular expressions and Boolean logic are state-of-the-art ways to fine-tune search queries, so users of this package should be happy. In addition, you can download all searched and found preprints as PDF files, which is handy and helpful.
Although the target group and the goal of the package are clearly defined, it took me some time to understand the core functionality. I suppose the main reason for this is the varying terminology of data sources used in vignettes and help pages. The way I understand the logic looks like this:
In short, for a search target there are two options: the dataset I download myself from medRxiv, or the dataset provided by the GitHub repo. The former can be either all items or just a subset limited by date. The latter is all items. Technically speaking, my download uses the medRxiv API, but the dataset in the repo is built by scraping the medRxiv web site on a daily basis. My understanding is that the main reasons for the scraped dataset are to provide a reliable data source for those occasions when the API does not serve well or not at all, and to lighten the burden on the API.
How long does it take to download all metadata from the API? I tested it from two physical locations with a differing bandwidth:
start_time <- Sys.time()
medrxiv_data <- mx_api_content()
end_time <- Sys.time()
end_time - start_time
Time difference of 1.434701 mins (1Gb/s line, work)
Time difference of 3.045589 mins (28Mb/s line, home)
So far this is not bad, especially if you run the function once a day. One minor thing: is there any way to gracefully stop the process if started by accident? When RStudio's red Stop button is hit, the following error is thrown:
Error in curl::curl_fetch_memory(url, handle = handle): Operation was aborted by an application callback
Request failed [ERROR]. Retrying in 1 seconds...
httr::RETRY is a new function to me. Thanks for this, I will definitely try to use it myself at some point. I wonder though if it allows a clean, user-friendly, forced exit and, if yes, how it should be defined.
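One possible pattern for a clean forced exit (purely a sketch of the idea, not something the package currently does; fetch_all() is a hypothetical stand-in for the long-running mx_api_content() call) would be to catch the interrupt condition around the long-running call:

```r
# Sketch: convert a user interrupt (Esc / RStudio's Stop button) into
# a clean exit instead of a low-level curl error. fetch_all() is a
# hypothetical stand-in for the long-running download.
fetch_safely <- function(fetch_all) {
  tryCatch(
    fetch_all(),
    interrupt = function(cnd) {
      message("Download cancelled by user; returning NULL.")
      NULL
    }
  )
}
```

Whether this interacts cleanly with retries scheduled inside httr::RETRY is exactly the open question raised above.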
How rapidly can we expect medRxiv to grow? Looking back, the amount of submissions accelerated when the still very much prevailing COVID-19 pandemic began.
library(tidyverse)
library(medrxivr)
mx_data <- mx_api_content()
stats <- mx_data %>%
mutate(date = as.Date(date)) %>%
group_by(date) %>%
summarise(count = n())
png(filename="medrxivstats.png",
units="cm",
width=20,
height=20,
pointsize=12,
res=72)
qplot(x=date, y=count,
data=stats, na.rm=TRUE,
main="medRxiv item growth",
xlab="Date of submission", ylab="Number of submissions")
dev.off()
Search is a key component of this package, and the vignettes help in building search queries. The medrxivr one shows how to use the mx_search function: either within a two-step process, or with a one-step or piped process. The examples are a little confusing though because the functions shown are not the same; the first example uses mx_api_content, the second one mx_api, which does not exist. I suppose mx_api is a typo, maybe the name of a former version? The vignette building-complex-search-strategies shows several strategies to filter data, and also how to use regular expressions. Very helpful. One minor thing about this example:
mx_results <- mx_search(query = "dementia",
NOT = "mild cognitive impairment")
The NOT argument does not match "Mild cognitive impairment", which is found in one abstract, so perhaps better to use the form [mM]ild cognitive impairment instead.
In mx_search, the data argument is important because it defines the target. Again, the example in the help file is slightly misleading because there is no mx_raw function. A former version, this one too, I presume?
When I ran mx_search with zero arguments, my first thought was that there are some issues with error handling. The query starts, but clearly you need to include the search string too! However, after some time the error handling kicks in and correctly reminds me of the missing query argument. If I am not mistaken, the delay was caused by the latency of the default data source in the GitHub repository.
As of writing this, how long does it take to query the repo?
start_time <- Sys.time()
mx_results <- mx_search(query = "molecular")
end_time <- Sys.time()
(end_time - start_time)
Using medRxiv snapshot - 2020-06-27 06:01
Found 226 record(s) matching your search.
Time difference of 20.75107 secs
To me this is acceptable, but people today tend to be impatient. Still, when the same search against my local copy of the medRxiv database takes only 0.5 secs, you begin to wonder which one to use. I noticed that the question of how to efficiently host and serve a dataset is something you and the editor have already discussed. Unfortunately, I cannot give any advice, but I am very much interested to learn about this topic too. I hope you will find a good solution.
Downloading PDFs works smoothly and as promised. Note: the mx_download help file example of mx_search uses a limit argument which is not defined.
The Shiny application that comes with the package is a beautiful piece of work, and the idea of delivering reproducible code is a nice one indeed. However, there are some issues with the code. Both the basic and advanced search codes throw an error when run in R.
Basic:
query <- "coronavirus"
mx_results <- mx_search(query)
Error: $ operator is invalid for atomic vectors
Advanced:
topic1 <- c("coronavirus")
topic2 <- c("airborne")
query <- list(topic1, topic2)
mx_results <- mx_search(query, from.date = 20190101, to.date = 20200628, NOT = c(""), deduplicate = TRUE)
Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "list"
@maurolepore noted to me that the package also includes a short manuscript to be submitted to the Journal of Open Source Software. I found the manuscript in the inst directory, read it, and found it to be both clear and concise. Good luck!
Hi @tts,
Just a short note to say thanks so much for your review. I've given it a quick skim, and it seems that everything you propose will be straightforward to implement. I'll go through your comments systematically soon, and post a response/list of changes. (@maurolepore, a process question - is it better for me to wait until the second reviewer has filed their review before beginning to make changes?)
Thanks in particular for spotting the discrepancies across the package (old function names in the examples, missing definitions for arguments, problems with the code from the app). You are correct that there is some hangover from an earlier version of the package/early versions of the package functions - I thought I had caught them all, but obviously not! When I started developing medrxivr, the medRxiv API didn't exist, meaning the data argument of mx_search() was not required. To note, this is also what's causing the reproducible code from the Shiny app to fail, as under the new version of the function, mx_search(query) is read as mx_search(data = query).

One specific thing I wanted to follow up on was that the "Automated testing" item in the reviewer checklist is not marked as complete - did you have any specific issues with/recommendations for this area of the package?
@tts, thanks for your wonderful review!
@mcguinlu, RE
"Is it better for me to wait until the second reviewer has filed their review before beginning to make changes?"
Both reviewers should work on the exact same package. You may change the package in a separate branch, but please only merge it after both reviewers have submitted their reviews.
"One specific thing I wanted to follow up on was that the 'Automated testing' item in the reviewer checklist is not marked as complete - did you have any specific issues with/recommendations for this area of the package?"
@mcguinlu Sorry, my bad. Both devtools::check() and devtools::test() ran without errors. Checked that item in the list.
@njahn82, I hope you are well. Could you please update us about your review?
Sorry, I didn't meet my review deadline. Will submit it by Wednesday. Thanks for your patience!
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS
- [X] The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 5
This is a very timely package that not only reflects the increasing popularity of open access preprints in the Health Sciences, but also the issues around finding and searching them. Although a growing suite of scholarly search engines make medRxiv preprints available, there seems to be no standard way to retrieve data from medRxiv thoroughly and systematically. Finding full-texts is also challenging, because medRxiv preprints are not made available via PubMed Central. Similarly, metadata from Crossref, medRxiv's DOI registration agency, lacks links to pdf full-texts.
Before I share my code review, I want to disclose that I neither have an academic background in Health Sciences nor have I been involved in systematic reviews as a librarian. I will therefore focus on more formal aspects of the package and its design.
The package contains functions to retrieve metadata from medRxiv, apply complex search strategies to a metadata snapshot, and download pdf full-texts. However, the source code repository contains a considerable amount of other functionality as well, which is outside of the R directory and excluded from the package build in .Rbuildignore:

- app comprises a nice-looking and useful Shiny app helping users to build queries and visually explore the results.
- data-extraction has scripts and functions for fetching and validating data.
- There's also a link to (daily updated) data in an external GitHub repo, https://github.com/mcguinlu/medrxivr-data/, which is used in an exported R function.

My main concern with this approach is that dependencies which are not part of the package are loaded, and in one case installed. The code outside of the R folder also lacks documentation using roxygen tags and tests, and there's some redundancy. I feel that R code not part of the {medrxivr} package build either needs to be factored out or should be moved into the R/ directory.
In the following, I will focus on the functionality, which is part of the package build.
- @import and @importFrom do not cover all external functions used.
- mx_search() refers to a function called mx_raw(), which is not part of the package.
- Not all code chunks are rendered; some are introduced with a blank between the ticks and {r}. Is this intentional?
- URL paths are constructed using paste. httr::modify_url() and the query argument of httr::GET() allow passing arguments to an API. Furthermore, {httr} provides helpful functionality to capture API errors more systematically than in the current implementation.
- mx_crosscheck() does web scraping, which is fine according to the robots.txt. However, the requested crawl delay of 7 sec has not been implemented yet. Here's the check using {polite}:
polite::bow("https://www.medrxiv.org/archive", force = TRUE)
#> <polite session> https://www.medrxiv.org/archive
#> User-agent: polite R package - https://github.com/dmi3kno/polite
#> robots.txt: 68 rules are defined for 1 bots
#> Crawl delay: 7 sec
#> The path is scrapable for this user-agent
Created on 2020-07-08 by the reprex package (v0.3.0)
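For what it's worth, honouring the crawl delay could be as simple as sleeping between consecutive requests (a sketch only; scrape_page() is a hypothetical placeholder for the actual scraping step):

```r
# Respect the 7-second crawl delay from robots.txt when requesting
# several pages in a row. scrape_page() is a placeholder function
# supplied by the caller.
scrape_politely <- function(urls, scrape_page, delay = 7) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    if (i > 1) Sys.sleep(delay)  # pause between consecutive requests
    results[[i]] <- scrape_page(urls[[i]])
  }
  results
}
```

Alternatively, {polite}'s bow()/scrape() pair manages the delay automatically.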
mx_search() returns a grouped tibble. Personally, I prefer to have an ungrouped tibble. The column date is of type double, not date. Also, in mx_search() the rOpenSci style guide recommends snake case for params (from.date and to.date).

Here's a reprex using the vignette example, which took less than 2 seconds.
library(tidyverse)
library(europepmc)
ep_q <-
c('PUBLISHER:"medRxiv" AND (mendelian* AND (randomisation OR randomization))')
epmc_l <- europepmc::epmc_search(ep_q, "raw", limit = 10000)
#> 91 records found, returning 91
my_df <-
purrr::map_dfr(epmc_l, `[`, c("doi", "title", "abstractText"))
my_df %>%
filter_at(vars(abstractText, title), any_vars(
grepl(
"[Mm]endelian(\\s)([[:graph:]]+\\s){0,4}randomi([[:alpha:]])ation",
.
)))
#> # A tibble: 81 x 3
#> doi title abstractText
#> <chr> <chr> <chr>
#> 1 10.1101/2020… Cardiometabolic traits, seps… Objectives: To investigate wheth…
#> 2 10.1101/2020… The relationship between gly… Aims: To investigate the relatio…
#> 3 10.1101/2020… Modifiable lifestyle factors… Aims: Assessing whether modifiab…
#> 4 10.1101/2020… Influence of blood pressure … Objectives: To determine whether…
#> 5 10.1101/2020… Increased adiposity is prote… Background Breast and prostate c…
#> 6 10.1101/2020… Examining the association be… Background: We examined associat…
#> 7 10.1101/2020… Investigating the potential … Aim: Use Mendelian randomisation…
#> 8 10.1101/2020… Unhealthy Behaviours and Par… Objective: Tobacco smoking, alco…
#> 9 10.1101/2020… Exploring the causal effect … BACKGROUND: Hearing loss has bee…
#> 10 10.1101/2020… Genetically informed precisi… Impaired lung function is associ…
#> # … with 71 more rows
Created on 2020-07-08 by the reprex package (v0.3.0)
(Disclaimer: I maintain the {europepmc} package and I am curious to learn more about potential shortcomings of using Europe PMC instead of a primary literature source. Because I also find it sometimes not very helpful when reviewers point to their own work, I do not expect you to consider this :-))
I think that's it from me! Thank you for making Health Science preprints more accessible and better discoverable! Happy to help further with the process!
Thanks @njahn82 for your review!
@mcguinlu, please aim to address the comments of both @tts and @njahn82 within the next 2 weeks.
Hi @mcguinlu. Sorry, while still playing with your app, I just realised that I was wrong and nothing is installed from the Shiny app. Please ignore this bit from the review.
@njahn82 Thanks a million for your detailed review! At a quick skim, everything you flag/recommend is fixable/implementable, and will definitely help to improve the functionality. I'm also looking forward to examining europepmc further - to be honest, I was not aware that medRxiv preprints were indexed in Europe PMC.
@maurolepore Just confirming that I have seen this, and so am aiming to address the comments by 23rd July (at the latest).
Hi all (esp @maurolepore)
A brief message to let you know that I have made most of the requested changes but, due to external circumstances, I haven't yet finished off the small number of outstanding items. I'm now aiming to have it ready for re-review by Thursday week (6th August) at the very latest.
Very sorry for the delay, and hope this is okay!
That's okay. Thanks for letting me know.
To start, thanks once again to everyone for your constructive comments! I've gone through all the posts above and have tried to extract all feedback here so that I can address it systematically - please let me know if I have missed anything. Comments are divided by commenter and topic for ease of reference, and are presented in bold text, with the response immediately below.
I have also acknowledged both @tts and @njahn82 as reviewers in the DESCRIPTION, as I feel your comments have added a great deal to the functionality/user-friendliness of the package. Please check your details just to make sure I have them correct!
[x] I've added a new function (mx_export()) to save search results as a .bib file for import into reference management software (this functionality already existed for the app but hadn't been copied across to the package).

[x] I have separated out importing the snapshot into a distinct function (mx_snapshot()), rather than just using mx_search(data = NULL) to indicate that you want to use the snapshot. I think this will help users be very clear about which data source they are using, and was influenced by feedback from both reviewers about being confused re: my description of the data sources.

[x] Because the API endpoints are identical for both medRxiv and bioRxiv, and because all interactions with the API have been centralised into a single function (thanks to the reviewers' feedback), coupled with demand from others in my group/people emailing me about it, the mx_api_*() functions now contain a server argument, allowing users to specify which server (medRxiv or bioRxiv) they want to interact with. Downloading the whole bioRxiv database does take a while, but users seem happy to do so rather than having to perform the searches and download the results manually via the website (see here for an example). One thing I am worried about is whether to keep this as a happy bonus of the package's restructure following the reviews above, or whether to publicise it further/make it a key element of the package. Any advice/comments from reviewers/editors would be appreciated here!
These have all been completed:
[x] How big is the database? How fast does it grow? And how long does it take to download it in a range of reasonable conditions? What happens in a range of extreme conditions?
As mentioned by @tts, there has been a substantial surge in the size of the database thanks to the COVID-19 pandemic. @tts reported times of 1.4 and 3.1 mins to download the database. I am not sure what is meant by extreme conditions, but I'm happy to do more testing if needed!
[x] Is the process transparent and "polite" to the user?
I'm not 100% sure what was meant by this comment. Re: interaction with medRxiv, all information is now taken from the API (previously mx_crosscheck() used web-scraping, but this is no longer the case) - is this what was meant?
[x] Re: a graceful, forced exit, I don't believe there is currently a clean way for mx_api_content() to stop. Is this right? And maybe @njahn82 might have some clever ideas about this?

[x] By using vroom::vroom() rather than read.csv(), this step is now substantially faster. Trying mx_search(query = "molecular"), I got an average search time of ~1 second (vs ~20 seconds previously, as per your review).

start_time <- Sys.time()
mx_results <- mx_search(data = mx_snapshot(), query = "molecular")
end_time <- Sys.time()
(end_time - start_time)
Using medRxiv snapshot - 2020-08-05 06:02
Found 289 record(s) matching your search.
Time difference of 1.341 secs
[x] The examples are a little confusing though because the functions shown are not the same; the first example uses mx_api_content, the second one mx_api which does not exists. I suppose mx_api is a typo, maybe the name of a former version? // In mx_search , the data argument is important because it defines the target. Again, the example in the help file is slightly misleading because there is no mx_raw function. A former version this one too I presume?
This is completely my bad. I thought I had caught all references to this old function name, but obviously didn't. All references to mx_api() and mx_raw() have been removed/replaced as necessary.
[x] One minor thing about this example . . . The NOT argument does not match to Mild cognitive impairment which is found in one abstract, so perhaps better to use the form of [mM]ild cognitive impairment instead.
Thanks for catching this - I've changed the example to reflect this.
[x] However, there are some issues with the code [produced by the app]. Both the basic and advanced search codes throw an error when run in R.
This was due to the fact that the data argument now comes first in order to make it compatible with piping, meaning that the example code from the app was trying to pass the search terms to the data argument. This has been corrected, and the reproducible code should now work.
[x] When I ran mx_search with zero arguments, my first thought was that there are some issues with error handling. The query starts but clearly you need to include the search string too! However, after some time the error handling kicks in and correctly reminds me of the missing query argument. If I am not mistaken, the delay was caused by the latency of the default data source in the GitHub repository.
I've added a check to make sure that the data/query arguments are not empty very early on (prior to the rate-limiting step of reading data from the GitHub repo), meaning that it fails fast and gracefully if no data source/search terms are provided.

> mx_search()
Error in mx_search() :
Please provide medRxiv data to search, accessed from either the mx_api_content() or mx_snapshot() function.
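The fail-fast pattern might be sketched like this (a base-R illustration of the idea, not the package's actual code; the function name is mine):

```r
# Validate arguments before any slow data access happens, so that a
# missing data source or query fails immediately. Illustrative only.
check_search_args <- function(data = NULL, query = NULL) {
  if (is.null(data)) {
    stop("Please provide medRxiv data to search, accessed from either ",
         "the mx_api_content() or mx_snapshot() function.", call. = FALSE)
  }
  if (is.null(query)) {
    stop("Please provide a query to search for.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Running the check first means the error surfaces in well under a second, instead of after the snapshot has been fetched.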
[x] The README is very helpful to get started with the package. A brief description of what medRxiv is and a link to the preprint server would make the README more informative.
I've added a sentence to this effect, and a link to the repository, to the opening paragraph of both the README and the main vignette. Please let me know if this doesn't cover it.
[x] Maybe the distinction between downloading a snapshot and searching the remote snapshot could be made a bit clearer. I first started to download the whole corpus, and then realised that there's already a snapshot that I can use instead. // I love @tts's sketch of the overall design. Maybe it can be adapted and re-used?
Hopefully by including the graphic @tts suggested and by cleaning the language used in the README/vignettes, this is addressed.
[x] There are three vignettes, which is great. Again, the general overview misses a sentence about what the preprint server medRxiv is about.
Addressed in the README comment above.
[x] Not all code chunks are rendered. Some are introduced with a blank between the ticks and {r}. Is this intentional?
To quote Ollivander1:
No, no, definitely not.
Not exactly sure what happened here, but I think it is fixed now. Please let me know if this is not the case!
[x] There is a considerable duplication of code regarding the API call, which can make it hard to update the package in case of API changes. It would be good to have a single function for the API call. // URL paths are constructed using paste. httr::modify_url() and the query of httr::GET() allow passing arguments to an API. Furthermore, {httr} provides helpful functionality to capture API errors more systematically than in the current implementation.
All interactions with the API have now been centralised in a collection of helper functions in R/helpers.R, which make use of httr::modify_url() and httr::GET(). Additionally, better/more informative handling of API errors, using httr::stop_for_status(), has also been implemented. Finally, due to the API being extremely unreliable over the past two weeks, an additional helper function, skip_on_api_message(), has been added to skip tests when the API is unavailable.
[x] mx_crosscheck() does web scraping, which is fine according to the robots.txt. However, the requested crawl delay of 7 sec has not been implemented yet.
The mx_crosscheck() function has been updated to use the API interface rather than web scraping. This change was made because there is often a discrepancy between the API-provided total number of records and the total number on the website (which is what was originally used to compare against the snapshot). It also means that cross-checking between the snapshot and the live database is much faster, and can be used as an indicator of whether or not it is worth downloading your own copy via the API.
[x] mx_search() returns a grouped tibble. Personally, I prefer to have an ungrouped tibble. The column date is of type double, not date.
mx_search() now returns an ungrouped tibble, and the date column is now of type Date.
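For readers curious what that change amounts to, a minimal sketch using made-up data (not the package's internals):

```r
library(tibble)
library(dplyr)

# A grouped tibble with dates stored as numeric,
# as mx_search() used to return
res <- tibble(doi  = c("10.1101/a", "10.1101/b"),
              date = c(20200601, 20200602)) %>%
  group_by(doi)

# Ungroup, and parse the numeric dates into proper Date objects
res <- res %>%
  ungroup() %>%
  mutate(date = as.Date(as.character(date), format = "%Y%m%d"))

class(res$date)
#> [1] "Date"
```

Returning an ungrouped tibble avoids surprising behaviour in downstream dplyr operations, and a true Date column lets users filter by date ranges directly.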
[x] Because of the downloading time, it is good to have feedback about the progress. Maybe re-using a progress bar functionality like from {progress} can lead to less code, while expanding the current feedback mechanism.
This has been implemented for mx_api_content(), both to give users better feedback in terms of progress and to better estimate the remaining time needed to download a local copy of the database.
[x] mx_search(): rOpenSci style guide recommends snake case for params (from.date and to.date)
The argument names in both mx_search() and mx_api_content() have been updated to reflect this. In addition, I have made the format for specifying a date consistent between the two functions: "2020-06-01" (previously, mx_search() used a numeric format: 20200601).
[x] Finally, I wonder if Europe PMC could be of use for searching medRxiv. Europe PMC search syntax is quite extensive and supports Boolean operators, wildcards and controlled vocabularies. What are the reasons for not using it for searching medRxiv? Is it an indexing lag, or lacking metadata?
Full disclosure: I was not aware that medRxiv preprints were captured by Europe PMC. However, on investigation, there seem to be two differences between searching medRxiv directly and via europepmc, using your example search.
First, europepmc gives 8994 records in total, while medrxivr gives 9146.
Second, the example search returns more records via medrxivr (110 via medrxivr, 107 via europepmc). Of the records found only by medrxivr, one was published on the 01/06/2020, so it is maybe not surprising that it is not indexed yet (though preprints published on medRxiv after this date are), but one of the other records was registered on medRxiv in February (04/02/2020). I'm not sure why this has not been captured by Europe PMC, but if you have any ideas, it would be great to hear them! One last reason (and one of the initial motivating factors for developing the package) was that medrxivr allows you to search for/download multiple versions of the same preprint (mx_search(data, query, deduplicate = FALSE)), allowing for comparison between them. As far as I can see, this functionality is not implemented in europepmc (but please correct me if I am wrong!).
```{r}
# Load packages -----------------------------------------------------------
library(tidyverse)
library(europepmc)
library(medrxivr)

# Compare total records returned ------------------------------------------
# Using europepmc gives 8994
ep_q <- c('PUBLISHER:"medRxiv"')
epmc_l <- europepmc::epmc_search(ep_q, "raw", limit = 10000)
pmc_all <- purrr::map_dfr(epmc_l, `[`, c("doi", "title", "abstractText"))

# Using medrxivr gives 9146
mx_all <- mx_snapshot() %>%
  mx_search(query = "*")

# Compare searches ---------------------------------------------------------
pattern <- "[Mm]endelian(\\s)([[:graph:]]+\\s){0,4}[Rr]andomi([[:alpha:]])ation"

# Using europepmc gives 107 records
pmc_results <- pmc_all %>%
  filter_at(vars(abstractText, title), any_vars(grepl(pattern, .)))

# Using medrxivr gives 110 records
mx_results <- mx_snapshot() %>%
  mx_search(query = pattern, fields = c("title", "abstract", "doi"))

# Find records found by medrxivr but not europepmc -------------------------
`%notin%` <- Negate(`%in%`)

# Gives 3 records
discrepancy_df <- mx_results %>%
  filter(doi %notin% pmc_results$doi)
```
[x] All tests passed, but it took a while. My duration was 1221.5 sec. However, I was connected to the internet via a cell phone connection during the review of the package.
This is likely due to the old way of reading in the data (i.e. via read.csv()). Now that the package is using vroom, testing should be a lot faster. I ran the testing a few times, and the average was ~140 s. From now on, the rate-limiting step will be how fast it can download the copy from the database.
[x] I realised that a lot of skipping for CI platforms happens and I wonder why? Is it the run-time?
The skips were initially due to the fact that I assumed Travis/AppVeyor wouldn't like it if I saved files to their systems, which is what the tests for mx_download() did. However, I've tried it, and all tests work as expected, so I have removed these skips from the updated version.
1 Harry Potter and the Sorcerer's Stone (movie), 00:28:35.
@mcguinlu, thanks for your hard work. See my comments below.
@tts and @njahn82 please consider @mcguinlu 's changes and respond with either your approval or a suggestion for improvement.
--
ml3: spelling::spell_check_package() still shows some unknown words. Please update your words list and
consider automating the process with usethis::use_spell_check().
> spelling::spell_check_package()
#> WORD FOUND IN
#> bioRxiv mx_api_content.Rd:5,35
#> Harbour mx_api_content.Rd:5,35
ml4: goodpractice::gp() still suggests some improvements:
> suppressWarnings(goodpractice::gp())
...
─────────────────────────────────────────────────────────────────
#>
#> It is good practice to
#>
#> ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R
#> users and developers are used to it and it is easier to read your code
#> for them if you use '<-'.
#>
#> tests/testthat/test-helpers.R:4:7
#> tests/testthat/test-helpers.R:20:7
#>
#> ✖ avoid long code lines, it is bad for readability. Also, many people
#> prefer editor windows that are about 80 characters wide. Try make
#> your lines shorter than 80 characters
#>
#> R/mx_crosscheck.R:24:1
#> R/mx_download.R:7:1
#> R/mx_search.R:37:1
#> tests/testthat/test-export.R:9:1
#>
#> ────────────────────────────────────────────────────────────────────────────────
ml5: covr::package_coverage() shows greater coverage than before; thanks. The only file that's still
a little low is R/mx_crosscheck.R. Please consider adding more tests or excluding code as
necessary (https://github.com/r-lib/covr#exclusions).
# > covr::package_coverage()
# medrxivr Coverage: 91.04%
# R/mx_crosscheck.R: 69.23%
# R/helpers.R: 83.33%
# R/mx_search.R: 90.27%
# R/mx_download.R: 93.85%
# R/mx_api.R: 100.00%
# R/mx_export.R: 100.00%
# R/mx_info.R: 100.00%
# R/mx_snapshot.R: 100.00%
ml6: usethis::use_tidy_style() suggests some files could improve. Please run
usethis::use_tidy_style() and consider committing the changes.
> usethis::use_tidy_style()
#> ✔ Setting active project to '/home/mauro/git/medrxivr'
#> Styling 17 files:
#> R/helpers.R ℹ
#> R/medrxivr.R ✔
#> R/mx_api.R ✔
#> R/mx_crosscheck.R ✔
#> R/mx_download.R ✔
#> R/mx_export.R ✔
#> R/mx_info.R ✔
#> R/mx_search.R ✔
#> R/mx_snapshot.R ✔
#> tests/testthat.R ✔
#> tests/testthat/test-api.R ✔
#> tests/testthat/test-crosscheck.R ✔
#> tests/testthat/test-download.R ✔
#> tests/testthat/test-export.R ✔
#> tests/testthat/test-helpers.R ℹ
#> tests/testthat/test-info.R ✔
#> tests/testthat/test-search.R ✔
#> ────────────────────────────────────────
#> Status Count Legend
#> ✔ 15 File unchanged.
#> ℹ 2 File changed.
#> ✖ 0 Styling threw an error.
#> ────────────────────────────────────────
#> Please review the changes carefully!
#>
#> ✔ Styled project according to the tidyverse style guide
ml7: On the website, the Reference tab shows "All functions". Maybe you can help users navigate this reference by grouping functions in some meaningful way? (see https://pkgdown.r-lib.org/reference/build_reference.html).
ml8: You may want to consider setting up CI services for a wider range of environments. Here are two workflows you may use -- standard, and full.
ml9: I see three .Rmd files inside vignettes/ but only two in the Articles section
of the website. Is this intentional? Also, vignettes are great, but they can
make the installation heavier. Consider the difference between use_vignette() and use_article().
ml10: I recommend walking through the steps listed by use_release_issue() or devtools::release().
Even if you don't submit to CRAN, walking through the process can help you find
details to improve.
ml11: The vignettes show code but not output. Reproducible examples are most useful
when they include the output because readers can understand what the code does
even if they choose not to run the code themselves. This is why reprex::reprex()
prints output (https://reprex.tidyverse.org/).
Hi @mcguinlu and thanks for your efforts! Below, I'll use your headings, and give my remarks to each of them.
General comments. In addition to adding your suggested diagram, I have tried to make the language used across the documentation more consistent, but please do point out anything that could be clearer!
Great, much better now. I cannot find anything more to complain :)
API. having to hit "Esc"/click "Stop" multiple times in order to actually get mx_api_content() to stop. Is this right?
Yes, that's what I mean, and I understand your explanation. Several clicks do terminate the download, so I find this sufficient.
Snapshot. Trying mx_search(query = "molecular"): Time difference of 6.319079 secs for me, which is not bad.
Vignette/README
Ok now.
Download
Ok.
Shiny app mx_search()
Reproducible code now works fine, and a missing data/query argument is caught right away. Good!
One new comment
mx_info(commit = "master")
Error in mx_info(commit = "master") : could not find function "mx_info"
Apart from this comment, I give my approval.
Thanks for the further feedback both (and Happy Friday)! Please find my responses to your comments below:
ml3: spelling::spell_check_package() still shows some unknown words. Please update your words list and consider automating the process with usethis::use_spell_check().
I have automated the spellcheck now as recommended.
ml4: goodpractice::gp() still suggests some improvements.
I have addressed all issues raised, and goodpractice::gp() now does not recommend any further improvements.
ml5: covr::package_coverage() shows greater coverage than before; thanks. The only file that's still a little low is R/mx_crosscheck.R. Please consider adding more tests or excluding code as necessary (https://github.com/r-lib/covr#exclusions).
I have added more tests to increase the coverage, and where it is not possible to test the error-handling behaviour (e.g. because it's not possible to simulate the user not having an internet connection, or the API returning a specific message), I have excluded lines as needed. The skipped lines are all marked with a #nocov comment, so they can be readily found for inspection. I've included the output of my local run of covr::package_coverage() below:
medrxivr Coverage: 100.00%
R/helpers.R: 100.00%
R/mx_api.R: 100.00%
R/mx_crosscheck.R: 100.00%
R/mx_download.R: 100.00%
R/mx_export.R: 100.00%
R/mx_info.R: 100.00%
R/mx_search.R: 100.00%
R/mx_snapshot.R: 100.00%
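For context, covr supports excluding hard-to-test lines with nocov markers. A sketch of the pattern (the function below is hypothetical, not medrxivr's code):

```r
# Hypothetical helper illustrating covr's exclusion markers. The branch
# requiring a missing internet connection cannot be triggered in tests,
# so it is fenced with "# nocov start" / "# nocov end" and excluded
# from coverage reports.
get_snapshot_sketch <- function(has_connection = TRUE) {
  if (!has_connection) {
    # nocov start
    stop("No internet connection - unable to fetch the medRxiv snapshot.")
    # nocov end
  }
  "snapshot data"  # placeholder for the real download step
}
```

Single lines can also be excluded by appending a `# nocov` comment; see the covr exclusions documentation linked above.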
ml6: usethis::use_tidy_style() suggests some files could improve. Please run usethis::use_tidy_style() and consider committing the changes.
I have run this and committed the changes.
ml7: On the website, the Reference tab shows "All functions". Maybe you can help users navigate this reference by grouping functions in some meaningful way? (see https://pkgdown.r-lib.org/reference/build_reference.html).
I had added keywords to the functions already, but hadn't realised that you needed to alter the _pkgdown.yml
file in order to group the functions. This has now been implemented, and functions are grouped into three categories: "Accessing medRxiv/bioRxiv data", "Performing the search", and "Helper functions".
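For reference, this grouping is driven by the `reference` field of `_pkgdown.yml`. A sketch along these lines (the exact contents of medrxivr's file may differ, and the per-group function lists here are illustrative):

```yaml
reference:
- title: "Accessing medRxiv/bioRxiv data"
  contents:
  - mx_api_content
  - mx_snapshot
- title: "Performing the search"
  contents:
  - mx_search
- title: "Helper functions"
  contents:
  - mx_download
  - mx_export
```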
ml8: You may want to consider setting up CI services for a wider range of environments. Here are two workflows you may use -- standard, and full.
Thanks for the recommendation - I have gone with the standard workflow, and R CMD check passes in all environments.
ml9: I see three .Rmd files inside vignettes/ but only two in the Articles section of the website. Is this intentional? Also, vignettes are great, but they can make the installation heavier. Consider the difference between use_vignette() and use_article().
Yes, this is intentional. When you include a .Rmd file with the same name as the package in the vignettes/ folder, pkgdown treats this as a special type of vignette ("Get started"). From the pkgdown website:
A vignette with the same name as the package (e.g., vignettes/pkgdown.Rmd or vignettes/articles/pkgdown.Rmd) automatically becomes a top-level "Get started" link, and will not appear in the articles drop-down. (If your package name include a ., e.g. pack.down, use a - in the vignette name, e.g. pack.down.Rmd.)
I have also taken your advice and converted the two vignettes covering advanced topics to articles, and signposted to them in the final introductory vignette.
ml10: I recommend walking through the steps listed by use_release_issue() or devtools::release(). Even if you don't submit to CRAN, walking through the process can help you find details to improve.
As a result of this process, the following changes were made:
- xml2 was removed from the DESCRIPTION, as it is no longer needed now that the package does not perform any web scraping.
- The pkgdown function list is now more useful.

ml11: The vignettes show code but not output. Reproducible examples are most useful when they include the output because readers can understand what the code does even if they choose not to run the code themselves. This is why reprex::reprex() prints output (https://reprex.tidyverse.org/).
Thanks for this feedback. I have decided not to produce output for the one remaining vignette, as the example code in this vignette calls the API via mx_api_content(). I am worried that enabling evaluation of the code in this vignette would mean that it would take a long time to render and make installing the package slow. However, for the two new articles (converted from vignettes as per ml9, and included only in the pkgdown website), the output is now shown.
Glad to hear things are a bit clearer now!
The reason mx_info() is not found is that it is an internal function (medrxivr:::mx_info()) and should not have been available in the function list on the pkgdown website. I had marked several internal functions with the "Internal" keyword, which should have hidden them, but it seems that pkgdown is case-sensitive and the correct keyword is "internal". This has been corrected, and the internal functions no longer appear in the website's function list.
Finally, I just wanted to confirm: are your details in the DESCRIPTION correct?
@mcguinlu Yes, my details in DESCRIPTION are correct.
@mcguinlu, just to let you know that I believe @njahn82 will respond to your changes next week.
Great - thanks for letting me know!
Great job @mcguinlu, and thank you for the careful and thorough consideration of my review. I feel it is clearer now what the package does and how it relates to the Shiny app and the backup/dump mechanism.
Thank you also for cross-checking with Europe PMC and demonstrating the added value of the medrxivr package.
Although all my suggestions have been addressed, I have some final suggestions
I wonder if the returned data frames from the mx_api_* family could also be represented as tibbles?
The package does a good job in parsing and cleaning preprint metadata. Unfortunately, I cannot find documentation or an example showcasing what is actually returned. Can you provide one reproducible example in the README and/or extend the documentation in the function docs?
In the function docs of mx_export(), it says Dataframe returned by mx_search(), but I realised that data obtained from the mx_api_* family can also be exported as a .bib file using mx_export().
Thanks @njahn82. Just to note as well that I recently moved the snapshot functionality from relying on my local Task Scheduler to working from GitHub Actions, so it should now be a lot more robust (in the past, if my local PC experienced network issues, the snapshot would not be taken).
In response to your comments:
**I wonder if the returned data frames from the mx_api_*() family could also be represented as tibbles?**
The mx_api_*() functions now return tibble objects.
**The package does a good job in parsing and cleaning preprint metadata. Unfortunately, I cannot find documentation or an example showcasing what is actually returned. Can you provide one reproducible example in the README and/or extend the documentation in the function docs?**
I have extended the documentation of the mx_api_*() functions, which now points to a section in the API article on the pkgdown website that provides more detail and an example of the uncleaned output. In addition, a clearer description of what the cleaning process entails has been included in the documentation of the mx_api_*() functions (e.g. here).
**In the function docs of mx_export(), it says Dataframe returned by mx_search(), but I realised that also data obtained from the mx_api_* family can be exported as a bib file using mx_export().**
I have updated the documentation of the mx_export() function to read: @param data Dataframe returned by mx_search() or mx_api_*() functions.
@maurolepore, I have checked that these changes don't throw any new errors and that goodpractice doesn't recommend any changes. I've also re-run the styler/spelling functions and committed any modifications.
Hoping we are nearly there!
Thanks @njahn82 and @mcguinlu,
@njahn82, once again, please consider @mcguinlu 's changes and respond with either your approval or further suggestions for improvement.
Thank you again @mcguinlu for your careful consideration of my review! All my suggestions have been addressed.
Approved! Thanks @mcguinlu for submitting and @tts and @njahn82 for your reviews! :smile:
To-dos:
- [ ] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. Soon you will be invited to a team that should allow you to do so. You'll be made admin once you do.
- [ ] Fix all links to the GitHub repo to point to the repo under the ropensci organization.
- [ ] If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding:
  - deactivate the automatic deployment you might have set up
  - remove styling tweaks from your pkgdown config but keep that config file
  - replace the whole current pkgdown website with a redirecting page
  - replace your package docs URL with https://docs.ropensci.org/package_name
  - in addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
- [ ] Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer AppVeyor projects to the ropensci AppVeyor account, so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
- [ ] We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative; see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package - should be as easy as running codemetar::write_codemeta() in the root of your package.

From https://github.com/ropensci/software-review/issues/380#issue-625738088 I see you wish to automatically submit to the Journal of Open Source Software? If so:
- [ ] Activate Zenodo watching the repo
- [ ] Tag and create a release so as to create a Zenodo version and DOI
- [ ] Submit to JOSS at https://joss.theoj.org/papers/new, using the rOpenSci GitHub repo URL. When a JOSS "PRE REVIEW" issue is generated for your paper, add the comment:
This package has been reviewed by rOpenSci: https://LINK.TO/THE/REVIEW/ISSUE, @ropensci/editors
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.
Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.
We've put together an online book with our best practice and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.
Hello @mcguinlu! I've just invited you to the @ropensci/medrxivr team! You should now be allowed to transfer the repo. Once you do, just ping me here and I'll transfer full admin rights back to you 🙂👍
Hi @annakrystalli, I have transferred it across now. @maurolepore, thanks for the checklist - I will work through it over the coming days. And finally, just flagging to @stefaniebutland that I would be interested in producing a blog post for this package!
Thanks again to @tts and @njahn82 for reviewing, and @maurolepore for herding us all through the process!
Thanks @mcguinlu ! Full admin rights now returned 👍
Has this review been completed? (I'm asking as the editor of the corresponding JOSS submission)
To-dos:
- [x] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. Soon you will be invited to a team that should allow you to do so. You'll be made admin once you do.
- [x] Fix all links to the GitHub repo to point to the repo under the ropensci organization.
- [x] If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding:
- deactivate the automatic deployment you might have set up
- remove styling tweaks from your pkgdown config but keep that config file
- replace the whole current pkgdown website with a redirecting page
- replace your package docs URL with https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
- [x] Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname)
- [x] We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
From #380 (comment) I see you wish to automatically submit to the Journal of Open Source Software? If so:
- [x] Activate Zenodo watching the repo
- [x] Tag and create a release so as to create a Zenodo version and DOI
- [x] Submit to JOSS at https://joss.theoj.org/papers/new, using the rOpenSci GitHub repo URL. When a JOSS "PRE REVIEW" issue is generated for your paper, add the comment:
This package has been reviewed by rOpenSci: https://LINK.TO/THE/REVIEW/ISSUE, @ropensci/editors
Okay, I've completed all the steps now @maurolepore! Re: the JOSS review, please see @danielskatz's comment above.
The one thing I wasn't clear on was how to replace the old pkgdown website with a redirecting page - seeing as the repo has been transferred across, the old pkgdown website on GitHub Pages (https://mcguinlu.github.io/medrxivr/index.html) no longer exists (I think?!), so I wasn't clear on how to set the redirect.
@danielskatz, thanks for checking. Yes, as the guest editor of this submission, I confirm this review has been completed.
@mcguinlu,
RE:
The one thing I wasn't clear on was how to replace the old pkgdown website with a redirecting page - seeing as the repo has been transferred across, the old pkgdown website on GitHub Pages (https://mcguinlu.github.io/medrxivr/index.html) no longer exists (I think?!), so I wasn't clear on how to set the redirect.
I'm sorry this isn't clear for you or me. But as you say, the working website seems correct. I see no reason to worry.
Here are a few more comments from section 8.1.4 of https://devguide.ropensci.org/:
Please check these boxes to confirm you've done the following last steps:
[ ] Add a CodeMeta file by running codemetar::write_codemeta() (codemetar GitHub repo)
[ ] Change any needed links, such as those for CI badges
[ ] Re-activate CI services
[ ] If authors maintain a gitbook that is at least partly about their package, contact an rOpenSci staff member so they might contact the authors about transfer to the ropensci-books GitHub organisation.
[ ] Add a “peer-reviewed” topic to the repo (it seems I'm the one supposed to do this but I apparently lack the privileges to access the "topics" settings -- see if you can or let me know).
--
Ping me when this is done and I'll then close this issue.
Thanks!
@mcguinlu, I see you already mentioned Stefanie Butland above. To comply with https://devguide.ropensci.org/editorguide.html#after-review, I also mention @ropensci/blog-editors for follow-up about your willingness to write a blog post or tech note.
Finally, please see https://devguide.ropensci.org/editorguide.html#package-promotion
So in response to the last few bits:
[x] Add a CodeMeta file by running codemetar::write_codemeta() (codemetar GitHub repo)
CodeMeta file added (see here)
[x] Change any needed links, such as those for CI badges
All CI badges updated to point to the ropensci endpoints (e.g. see here)
[x] Re-activate CI services
Done, and have triggered a build under the new set-up to ensure everything works, which was successful.
[ ] If authors maintain a gitbook that is at least partly about their package, contact an rOpenSci staff member so they might contact the authors about transfer to the ropensci-books GitHub organisation.
Not applicable to me.
[x] Add a “peer-reviewed” topic to the repo (it seems I'm the one supposed to do this but I apparently lack the privileges to access the "topics" settings -- see if you can or let me know).
Done!
Thanks also for the additional materials re: CRAN submission (I do intend to submit to CRAN in the near future) and promotion, and for looping in @ropensci/blog-editors.
And I think that's us!
@mcguinlu , thanks and congratulations! To the best of my knowledge, this completes the review process so I'll close now.
--
You may already know this. To prepare packages for CRAN, usethis::use_release_issue() is useful. And here are some aspects of the workflow that might help (including a link to some tweaks). Feel free to reach out with questions.
Are you in rOpenSci's Slack workspace? If not, I recommend you find someone who can add you. I have found friendly advice there that I wouldn't find anywhere else.
Hello @mcguinlu. We'd love to have a post about medrxivr.
Our Blog Guide has most of the information you should need, with both content and technical advice. For readers, it would be helpful to highlight how this package relates to similar ones and the specific niche that medrxivr fills. Once that's clear early in the post, your readers will give their attention.
Let me know when you'd like to submit a draft and I can suggest a publication date.
@mcguinlu Also let me know if you'd like a new invitation to rOpenSci Slack. We could move this discussion there for example.
@stefaniebutland a new invite would be great! I thought I had activated the first one correctly but apparently not (I am still getting to grips with Slack) 🤦♂️ and happy to continue chatting about this there.
Submitting Author: Luke McGuinness (@mcguinlu)
Repository: https://github.com/mcguinlu/medrxivr
Version submitted: 0.0.2
Editor: @maurolepore
Reviewer 1: @tts
Reviewer 2: @njahn82
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences): medrxivr allows users to programmatically access data from medRxiv, a preprint repository for papers in medical, clinical, and related health sciences. The package also allows users to readily perform and document reproducible literature searches of the medRxiv database.
Who is the target audience and what are scientific applications of this package?
The primary target of this package is systematic reviewers (i.e. me!), who frequently wish both to use more complicated queries (e.g. regular expressions/Boolean combinations) when searching medRxiv than the official site currently allows for, and who also wish to be able to easily download the full-text PDFs of records matching their search. medrxivr helps with both of these challenges. However, anyone who wishes to find and retrieve relevant medRxiv records in R, for example to explore the distribution of preprints by subject area, will find the package useful.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category? As far as I am aware, no other package allows users to access medRxiv data in R.
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted. Issue: https://github.com/ropensci/software-review/issues/369 Editor: @annakrystalli
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
JOSS Options
- [X] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [X] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [X] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.3860024
- (*Do not submit your package separately to JOSS*)

MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct
Tagging my co-author @L-ENA for reference.