ropensci / software-review

rOpenSci Software Peer Review.

Submission: jstor #189

Closed tklebel closed 6 years ago

tklebel commented 6 years ago

Summary

The tool Data for Research (DfR) by JSTOR is a valuable source for citation analysis and text mining. jstor provides functions and suggests workflows for importing datasets from DfR.

Package: jstor
Title: Read Data from JSTOR/DfR
Version: 0.2.6
Authors@R: person("Thomas", "Klebel", email = "thomas.klebel@uni-graz.at", 
                  role = c("aut", "cre"))
Description: Functions and helpers to import metadata and full-texts delivered
    by Data for Research (DfR) by JSTOR. 
Depends: R (>= 3.1)
License: GPL-3 | file LICENSE
Encoding: UTF-8
LazyData: true
Imports: 
    dplyr,
    purrr,
    xml2,
    magrittr,
    stringr,
    readr,
    tibble,
    rlang,
    foreach,
    doSNOW,
    snow
Suggests: testthat,
    covr,
    knitr,
    rmarkdown,
    tidyr
BugReports: https://github.com/tklebel/jstor/issues
URL: https://github.com/tklebel/jstor, https://tklebel.github.io/jstor/
RoxygenNote: 6.0.1
Roxygen: list(markdown = TRUE)
VignetteBuilder: knitr

The package should fit very well into the data-extraction category, because it extracts data from *.xml files for further use.

The target audience would be scientists leveraging JSTOR/DfR for textual research. Currently there is no implementation in R to parse the metadata they deliver. Although the data quality on references is not very consistent, one could try to conduct citation analysis with a lot of control over the sample and the data compared to Google Scholar or Web of Science.

No.

Pre-submission was approved by @karthik: https://github.com/ropensci/onboarding/issues/186

Requirements

Confirm each of the following by checking the box. This package:

Publication options


I would very much like to submit a paper to JOSS, however at the moment I don't have a proper case-study yet. If possible, I would like to submit a paper at a later stage.


Detail

The only "problems" from goodpractice::gp() are a test coverage of 98% and lengthy lines in the test-files.

If he could find the time, a technical review by @jimhester could probably help in resolving some performance issues, although he might urge me to re-write the whole thing (again).

karthik commented 6 years ago

πŸ‘‹ @tklebel

Thank you for this submission. During your pre-submission review I completely missed that we had a similar submission a year ago (https://github.com/ropensci/onboarding/issues/86) by @benmarwick.

Since that submission was stalled, we are good to proceed with yours. I checked in with Ben on this, and his suggestion was for you (and reviewers) to look through his work to see if there are ideas that might help your effort. So please take a look at https://github.com/benmarwick/JSTORr

Editor checks:


Editor comments

You are already aware of the goodpractice results, but here they are anyway.

── GP jstor ────────────────────────────────────────────────────────────────────

It is good practice to

  βœ– write unit tests for all functions, and all package code
    in general. 97% of code lines are covered by test cases.

    R/batch_import_fun.R:21:NA
    R/batch_import_fun.R:22:NA
    R/batch_import_fun.R:23:NA
    R/example.R:15:NA
    R/example.R:16:NA
    ... and 3 more lines

  βœ– avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    tests/testthat/test-article.R:75:1
    tests/testthat/test-author-import.R:35:1
    tests/testthat/test-author-import.R:36:1
    tests/testthat/test-author-import.R:40:1
    tests/testthat/test-author-import.R:41:1
    ... and 19 more lines

  βœ– fix this R CMD check WARNING: LaTeX errors when creating
    PDF version. This typically indicates Rd problems.
  βœ– fix this R CMD check ERROR: Re-running with no
    redirection of stdout/stderr. Hmm ... looks like a package You may
    want to clean up by 'rm -rf /tmp/Rtmpq9Jt37/Rd2pdf1b207a3eca4b'
────────────────────────────────────────────────────────────────────────────────

I would very much like to submit a paper to JOSS, however at the moment I don't have a proper case-study yet. If possible, I would like to submit a paper at a later stage.

Not a problem. There is no requirement that both be timed together. JOSS submission can be made independently at anytime.


Reviewer 1: @jsonbecker Due date: Feb 23, 2018

Reviewer 2: @rileymsmith19 Due date: March 1

tklebel commented 6 years ago

Thanks @karthik!

I cannot reproduce your errors and warnings from R CMD check, neither on Travis, nor on my machine, nor on AppVeyor or win-builder. Could you help me debug them, or should we ignore them?

You put my submission into topic:data-publication, whereas I initially thought that topic:data-extraction would be more suitable. Did you change your opinion?

Before my pre-submission, I looked into @benmarwick's package, and read the review by @kbenoit thoroughly. I am reluctant to add features similar to those currently implemented in JSTORr, because the same critique would apply. In general, I think it would be good to keep jstor rather simple and, in a way, agnostic to what the user does with data from JSTOR (analysing n-grams, conducting topic modeling, etc.). Nevertheless, it would probably be beneficial to show a bit more clearly what can be done with jstor. I am therefore working on a simple case study to illustrate a typical use case, which includes filtering metadata in combination with the analysis of n-grams.

karthik commented 6 years ago

Thanks for catching my mistake with the wrong tag. I have fixed it.

Re. the issues from R CMD CHECK, I'll look again when I have a bit of time, but let's proceed since the reviewers will also be able to see if it's a replicable issue.

For long lines, you can add # nolint at the end of those lines (https://github.com/jimhester/lintr#project-configuration); this way lintr will not flag them.
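
A minimal illustration (the variable name and string below are made up, not from your actual test files):

# the trailing marker tells lintr to skip the long-line check for this line only
long_title <- "An intentionally long string used only to illustrate how a trailing marker suppresses lintr's long-line warning" # nolint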

And lastly, I appreciate that you have looked into Ben's work and the relevant feedback on the package. I agree that it would be very helpful to include some use cases in your package.

tklebel commented 6 years ago

I added # nolint to the relevant lines, so lintr now ignores them properly.

In the meantime I wrote an initial version of a case study, which is live here: https://tklebel.github.io/jstor/articles/analysing-n-grams.html. In order to host this case study, which cannot be included as a vignette, I went ahead and added pkgdown.

Finally, I somehow mistook the JOSS for the JSS. Since JOSS only requires a very short introductory paper, I included one in jstor and would like to submit to JOSS via the fast-track option, in case the review goes well.

karthik commented 6 years ago

Both reviewers are now assigned. Thanks @rileymsmith19 and @jsonbecker πŸ™!

tklebel commented 6 years ago

Since I made a few improvements over the last few days, I created a new version (0.2.6) and made a snapshot at Zenodo. I updated my initial post accordingly and will try to avoid any further changes until the reviews are in.

maelle commented 6 years ago

I think the correct Github username for @rileymsmith19 is @EccRiley 😺

EccRiley commented 6 years ago

You are correct, @maelle, Thanks! Sorry for any confusion, @karthik !

jsonbecker commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

When reading the descriptions of find_article, find_authors, etc., I don't have confidence about which should return more than one result. Based on the pluralization, I think the expectation is that the XML always contains one article, which may have multiple authors, multiple references, and multiple footnotes. This becomes clearer when reading all of the examples, but if so, I think it would be worth specifying up front whether the base XML will always have one article or more than one. Additionally, I find the kable output in the vignette appealing, and the swap from find_article and find_authors, which use that output, to just printing tables is a little jarring for me. I'd probably just use kable in all the examples.
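
For instance, a sketch of what a consistent presentation could look like in a vignette chunk (the file path is hypothetical, and this assumes find_article() takes the path to a single DfR metadata .xml file):

library(jstor)
library(knitr)

# hypothetical metadata file from a DfR download
meta <- find_article("dfr-data/metadata/journal-article-10.2307_12345678.xml")
kable(meta)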

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing:

2–2.5 hrs


Review Comments

tklebel commented 6 years ago

Thanks @jsonbecker for the thorough review! I'm eager to respond to the more substantial questions (csv vs sqlite and file_name vs other_identifier), but I think I will wait for the review by @EccRiley, so I can address all comments in one go.

jsonbecker commented 6 years ago

Absolutely. I plan to add a few more overall thoughts, but with the deadline looming I really wanted to get it in! I realize my review focused heavily on a suggestion that may have limited value and reflects a bit of my own preferences, but that's largely because everything else is in great shape!


tklebel commented 6 years ago

I'm still eager to work on/discuss some changes, but wanted to wait for the second review. Do you happen to know until when you will be able to finish your review, @EccRiley?

sckott commented 6 years ago

@eccriley are you still able to do the review?

karthik commented 6 years ago

@tklebel Sorry for the delays. Looks like @EccRiley hasn't responded in a while. I am seeking another reviewer to replace her.

tklebel commented 6 years ago

@karthik no worries. @elinw seems to have been trying out the package, maybe she would be willing to do a review.

elinw commented 6 years ago

I have been, yes. I could do a review. I also have some data if @jsonbecker would like to try it.

karthik commented 6 years ago

@elinw Fantastic! Thank you for agreeing to review. Assigning @elinw as second reviewer

karthik commented 6 years ago

@elinw Gentle ping checking in on your review. πŸ™

elinw commented 6 years ago

@karthik On it, should finish it up this weekend.

elinw commented 6 years ago

This is still a draft

Package Review

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

JSTOR is a really important data source for the study of scholarly publication on issues. JSTOR as a data source has wide coverage over many disciplines and also includes sources going back in time. As a data source JSTOR has some challenges and features that are a bit different than others. First, there is not an API in the form of a REST interface. Instead, researchers fill out a web form and request the results of particular queries. Then they are emailed a zip file containing potentially thousands (or more) of individual files including the xml data for the item and the n-gram data for that item in separate files for unigrams, bigrams and trigrams (assuming all of these were requested). These are organized into a deeply nested file structure that ends with a folder for each item (article or book). Obtaining full text data is a separate process involving a user agreement. While not fundamentally different than working with a web API that returns xml, it definitely feels different to the end user.

All of this is to say that it is extremely helpful to have a package to help with managing this. Specifically, the jstor package reads the data in from all of the individual files and separates it by type of content (articles, authors, references, footnotes). It then writes the results to CSV files. I think this package will encourage people to use this data source more.

The package itself is solid, test coverage is good, documentation is complete, and good practices are followed. It does exactly what it says and works as described. The code seemed fine, and in particular the way it made batch size and number of cores easy to specify was helpful to me. I appreciated the single interface to all the data types.

The README is helpful in defining the structure and consistency of the data.

Comments

These are some additional comments for consideration.

I have a JSTOR basic file (not full text) from my research that I used in my testing. It contains 1678 items, 42051 references, and 12376335 bigrams. I did my testing mainly on RStudio Server (open source) installed on CentOS with 4 cores, although I also did some on my MacBook Air. I will say that I was happy to have a package that installed easily.

At first I was surprised that the data were written to csv files. However, after some thought I feel it is very useful to have this as the main outcome, as it is then up to the user to decide what kind of storage to use for further processing. I found it useful to have the stored files to return to. For me, simply having a way to approach the mass of individual item files and get them into rectangles was a huge win, especially having the processing issues handled (meaning otherwise I would probably have just created a much less smart script for reading the files recursively).

One thing I will mention is that for my data I could not use read_csv() without generating warnings, but read.csv() worked fine. It might be helpful to include a discussion of storage options for the rectangular data. I also don't think this package requires documentation of all aspects of a text analysis. What is more important is to focus on preparing the data for clean importing into other text-analysis packages and the storage options they support. I say that, but I followed the instructions in the bigrams/tidytext vignette pretty exactly and found them incredibly helpful. The JSTOR data (basic files) are not really like the examples in the tidytext book or most other documentation for text analysis, since the full texts are not there (and the n-grams are already created). So I won't discourage doing more of this as you go forward with maintaining the package, especially for known quirks of JSTOR (see below). I also appreciated the focus on getting a simple set of descriptive graphs.

The vignettes are helpful and the new one in particular is a nice overview of the process of cleaning and analyzing the ngram data. However, I think it would be stronger if it discussed some of the issues specific to the JSTOR data. For example, in working with the package I ended up creating a list of what I think of as my JSTOR stop words because academic writing is a specialized language.

For example for the references data I have:

stop_ref1 <- c("", "References", "Notes", "Note", "Ibid.", "Bibliography", 
               "Loc. cit.", "Id.", "Foonotes", "Ibid.", "Endnotes", 
               "[Bibliography]", "Notes and References", "Works Cited",
               "Footnotes","Ibid", "[References]", "Statutes Cited",
               "Literatur", "Case Cited", "Footnote", "Ibid", 
               "Cases Cited", "[Endnotes]", "References", "Reference"
               )

And for the bigrams I had this.

custom_stop_words <- c("ibid", "supra", "pp", "eg", "alia", "hoc", 
                       "e.g.")

(It is very easy to add custom stop words; I could figure out how just from following the vignette and realizing I could add my vectors to those in tidytext.) It could be useful to provide a more complete version of those in some way.
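
A rough sketch of that approach (the tibble of tokens, here unnested_ngrams with a word column, is hypothetical; stop_words comes from tidytext):

library(dplyr)
library(tidytext)

# combine the custom JSTOR stop words with tidytext's built-in list
jstor_stop_words <- tibble(word = custom_stop_words, lexicon = "jstor")
all_stop_words <- bind_rows(stop_words, jstor_stop_words)

# drop them from a (hypothetical) tibble of tokens with a `word` column
cleaned <- anti_join(unnested_ngrams, all_stop_words, by = "word")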

I also did not necessarily want to get rid of all numbers, but one kind of troublesome number was e.g. 1998b. It would probably be easy to write a function for that, though I ended up just listing out the ones I found.
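
Such a function could be as small as a single regular expression; a sketch, continuing with the hypothetical cleaned tibble from above:

library(dplyr)
library(stringr)

# drop tokens that look like a four-digit year, optionally followed by a letter
# (e.g. "1998" or "1998b")
cleaned <- filter(cleaned, !str_detect(word, "^[0-9]{4}[a-z]?$"))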

Another example: with JSTOR, unless you specify otherwise, you get articles in multiple languages. The language names are not consistently coded, so I used dplyr::filter(language == "en" | language == "eng"). There are various things like that which you might want to mention.

I do not have a problem with using the baseid as the main ID. This potentially lets me work with other interfaces to JSTOR, is consistently produced over time and is unique. If I have data from separate JSTOR queries it lets me find overlap easily.

Final approval (post-review)

Estimated hours spent reviewing: 5

Review Comments

elinw commented 6 years ago

Oh, I meant to mention that the vignette might want to note that installing igraph can be more complex than installing a typical package.

elinw commented 6 years ago

Okay one more thing about jstor idiosyncrasies. This is something I realized today (and relates to our prior discussion about end notes turning up in references). Reference lists/bibliographies have each source once, while endnotes may repeat the same source many times. Therefore simply running a frequency on the sources will give you distorted results (I had a surprising article turn up as most cited, but it turned out that it was one article with end notes that cited it multiple times). So I think that if you were willing, you could start a vignette that addresses some of these things and let users contribute strategies and code snippets for handling. Or you could write it yourself since you have a lot of experience, but I could see that getting a bit overwhelming unless you decide to write an article about this.

tklebel commented 6 years ago

Response to reviews

Thank you @jsonbecker and @elinw for your reviews and the helpful feedback! Although both of you mention that your reviews are not finished, I will start working on your comments, so we can hopefully finish the onboarding before useR! (where I'll give a presentation about the package).

There are a couple of requests for changes, and many thoughts for further consideration from you. I will outline my plan to address all issues in this post, and then work through them over the next two weeks.

From your feedback, I can distill three main areas of concern:

  1. How data are exported after they have been brought into rectangular form, and how these data can later be re-imported.
  2. The issue of using "basename_id" as the unique identifier.
  3. Known quirks which are inherent to data from JSTOR/DfR and how to deal with them.

I will discuss each of the topics, and then proceed to the remaining comments.

Data export and re-import

Depending on the size of the dataset under consideration, it might be worthwhile to put the converted data into some sort of SQL database. The package was actually designed to help with this. I separated the extraction functions in a way that saves space on disk, because metadata does not have to be repeated for each reference an article has, etc. So the initial design already planned for this structure of separate tables, which can easily be joined with dplyr. As you both agree, exporting to .csv might be a good lowest common denominator. However, I would probably prefer not to implement the capability to write directly to SQLite in jstor_import, since I find it difficult to get the balance right on how much control users should have: if there is the option to write to SQLite, should there be options for creating new tables, appending, overwriting, etc.?

That being said, I spent the last week implementing a new function (jst_import_zip), which lets you import files directly from the zip archive. Users can specify which parts they want to be converted via which functions (like article = c(find_article, find_references), ngram1 = jst_read_ngram). I hope this function will make it a lot easier to deal with the multiple different sources which are available from JSTOR, and it removes the need to extract the zip (which saves quite some disk space). The current output of this function is multiple csv files, with different file names referring to the functions used. I could very well imagine adding the option to write the output to a single SQLite db. This would have the elegance of going from one zip file directly to one .sqlite file, without the hassle of any intermediate files (I don't know how I would solve the case where you have multiple zip archives because the data were too large for one archive (they split up special requests into files of max 1.5 GB)). I would be happy about feedback on this idea! Since the reviews were initially for a version without jst_import_zip, I would prefer to keep the scope as is and consider jst_import_zip and related issues as an area for future improvements.
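
For what it's worth, a user could already get from the current .csv output to a single database with a few lines of DBI/RSQLite; a sketch (file and table names are hypothetical):

library(DBI)
library(RSQLite)
library(readr)

con <- dbConnect(RSQLite::SQLite(), "dfr.sqlite")

# hypothetical batch files, as written by jstor_import()
articles   <- read_csv("out/articles-1.csv", guess_max = 3000)
references <- read_csv("out/references-1.csv", guess_max = 3000)

dbWriteTable(con, "articles", articles, overwrite = TRUE)
dbWriteTable(con, "references", references, overwrite = TRUE)

dbDisconnect(con)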


Regarding the further use of the exported .csv files, I could imagine two helper functions:

  1. One to combine multiple output files into one file. When importing metadata or references for large datasets, it makes sense to use batches, since you might not know how big the resulting file will be. After conversion via jstor_import, you will most likely want to combine all batches into a single file, and this could be done via a helper function. As a name, I could imagine jst_combine_outputs. I would probably opt for a "dumb" function which accepts paths as input, and simply imports, combines, and exports the content.
  2. A simple helper function to import files which were exported from jstor_import. I could try to determine the type of content from the number or, when available, the names of the columns: find_article has 17 cols, find_book has 13, find_chapters has 9, find_authors has 6, and so forth. Other than the names of the columns (if they are present in the .csv) or the number of columns, I don't know how I would determine what content the file has and how to read it. All of this is only to avoid using read_csv(., guess_max = 3000) or similar, since the warnings mainly come from read_csv not guessing the column types correctly (see the sketch below).
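
Until such a helper exists, the workaround is simply to be explicit when re-importing; a sketch (file name and column selection are hypothetical):

library(readr)

# either guess column types from more rows ...
articles <- read_csv("out/articles-1.csv", guess_max = 3000)

# ... or spell out the types for columns that are otherwise guessed wrongly
articles <- read_csv(
  "out/articles-1.csv",
  col_types = cols(
    pub_year   = col_integer(),
    first_page = col_character(),
    .default   = col_guess()
  )
)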

basename_id as unique identifier

I would agree with @elinw that using the name of the files as unique identifier is ok ;) From my understanding, it is the only unique attribute of those files which is the same for all articles. Articles may have up to three different types of ids besides the filename, and simply combining them might lead to unexpected issues. The filename is most likely consistent and allows matching metadata with ngrams/full-text data. However, I like the suggestion to rename it to file_name. This is more straightforward.

Known quirks about the data

I agree with @elinw that there are many quirks in data from DfR, some of which we know about and some of which we do not. I like the idea of having a vignette that collects known quirks and how to deal with them. There are currently a few helpers in the source which are not exported, like find_total_pages, which simply calculates the total number of pages from the first and last page. This is helpful, but may lead to errors when the page range for an article is something like 50-70; 310, which simply means the body of the article runs from 50 to 70, and there is a correction/erratum on page 310. Currently, calculating the total number of pages would result in 261 pages, since 50 is the first and 310 the last page. There are many more quirks, and I cannot claim to have a good grasp of the majority of them. Especially since I only worked with one very specific dataset, I might have found a few quirks in my data, but there might be other problems in other journals or for other applications. Therefore, starting a vignette and letting users contribute their knowledge (if they are willing to) sounds like a great idea to me.
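
To illustrate the quirk numerically (this is just the naive calculation described above, not the package's actual code):

# naive page count from first and last page
naive_total_pages <- function(first_page, last_page) {
  last_page - first_page + 1
}

naive_total_pages(50, 70)   # 21 pages, as expected
# for a page range like "50-70; 310" (body plus an erratum on page 310),
# taking 50 as first page and 310 as last page gives a misleading count:
naive_total_pages(50, 310)  # 261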

Issues with a simple answer

Tasks to undertake

Other issues

Multiple endnotes

@elinw could you point me to the article with multiple endnotes so I can take a look?

Function naming

I thought about changing function names across the package. After I posted the submission, I took a break from the package for about a month. Coming back, I could not remember the names of the functions for a quick analysis :/ I'd like to at least add a prefix to all functions or change the current prefix. The tidyverse style guide encourages the use of verbs, which is why I had find_*. Unfortunately the verb "find" might not be obvious in this context. A simple suggestion could be like the following:

I opted for "jst", since "js" might be similar to javascript, and the default for RStudio autocomplete is to display after three characters anyways.

What do you think about the proposed naming change?


Please let me know if you agree with my roadmap or in case I overlooked an issue you raised.

jsonbecker commented 6 years ago

I would be happy with the renaming to jst_*, but I might still prefer "verbing" to suggest the functionality, e.g. jst_get_article. I wouldn't hold up a πŸ‘ for that. Given the added context, I am satisfied with the move toward file_name and using that as an identifier. I guess I can't wish JSTOR toward having a better data model ;).

I'm also good with database being out of scope.

tklebel commented 6 years ago

It took some time, but I think I am finished with the requested changes. Due to the long duration of the review process, there are many changes altogether, and only some of them are responses to the reviewers' comments.

In my post above I outlined five areas where I would make changes:

All of them are done as of now. The news file, containing all changes, is as follows (or in a prettier version online here: https://tklebel.github.io/jstor/news/index.html):

jstor 0.3.0

Breaking changes

jst_import and jst_import_zip now use the future package as a backend for parallel processing. This makes the internals more compact and reduces dependencies. Furthermore, it reduces the number of arguments, since the argument cores has been removed. By default, the functions run sequentially. If you want them to execute in parallel, use futures:

library(future)
plan(multiprocess)

jst_import_zip("zip-archive.zip",
               import_spec = jst_define_import(article = jst_get_article),
               out_file = "outfile")

If you want to terminate the processes, at least on *nix systems you need to kill them manually (once again).

Importing data directly from zip-files

There is a new set of functions which lets you directly import files from .zip-archives: jst_import_zip() and jst_define_import().

In the following example, we have a zip-archive from DfR and want to import metadata on books and articles. For all articles we want to apply jst_get_article() and jst_get_authors(), for books only jst_get_book(), and we want to read unigrams (ngram1).

First we specify what we want, and then we apply it to our zip-archive:

# specify definition
import_spec <- jst_define_import(article = c(jst_get_article, jst_get_authors),
                                 book = jst_get_book,
                                 ngram1 = jst_get_ngram)

# apply definition to archive
jst_import_zip("zip_archive.zip",
               import_spec = import_spec,
               out_file = "out_path")

If the archive also contains research reports, pamphlets or other ngrams, they will not be imported. We could, however, change our specification if we wanted to import all kinds of ngrams (given that we originally requested them from DfR):

# import multiple forms of ngrams
import_spec <- jst_define_import(article = c(jst_get_article, jst_get_authors),
                                 book = jst_get_book,
                                 ngram1 = jst_get_ngram,
                                 ngram2 = jst_get_ngram,
                                 ngram3 = jst_get_ngram)

Note, however, that for larger archives, importing all ngrams takes a very long time. It is thus advisable to only import ngrams for articles which you want to analyse, i.e. most likely a subset of the initial request. The new function jst_subset_ngrams() helps you with this (see also the section on importing bigrams in the case study).

Before importing all files from a zip-archive, you can get a quick overview with jst_preview_zip().

New vignette

The new vignette("known-quirks") lists common problems with data from JSTOR/DfR. Contributions with further cases are welcome!

New functions

Minor changes


I don't expect you to dive into all the changes, though. I hope that all your comments are addressed with the changes and that we can finish the onboarding soonish πŸ˜„

elinw commented 6 years ago

I'm sorry I'm slow on replying but this seems good. I love the import from zip. I have some things to focus on right now but by the end of the week I'm planning to have updated to the current development branch and try things out. I will also get you an example article.

jsonbecker commented 6 years ago

I am very happy with these changes. I especially like the set and structure of the vignettes and the separation of the "case study". This now serves not only as a well documented package but as solid documentation of the data it works with. This has my full πŸ‘

elinw commented 6 years ago

The article in question (basename_id 25074331):

basename_id:   25074331
journal_doi:   NA
journal_jcode: jbusiethi
article_jcode: 25074331
article_type:  research-article
article_title: Ethics Education in the Workplace: An Effective Tool to Combat Employee Theft
volume:        26
issue:         2
language:      eng
pub_day:       1
pub_month:     7
pub_year:      2000
first_page:    89
last_page:     100

And the matching references (rows 21646–21650), all identical:

Greengard, Samuel: April 1993, 'Theft Control\nStarts with HR Strategies', Personnel Journal, p. 85.

I actually pulled the article and it's true, they cited the same page multiple times. It was striking because it made Greengard the most cited author but I had never heard of him.

tklebel commented 6 years ago

Thanks @elinw for the example. I added a few sentences about issues with references in the vignette and linked the article as an example.

elinw commented 6 years ago

I actually don't think that the article is "wrong"; it's just following the citation style of the journal it is published in. Other journals probably would have used Ibid and op. cit. or similar.

tklebel commented 6 years ago

@elinw I changed my language a bit to remove the certainty of the judgement, though I would still consider it an artifact, since they use Ibid too, but only for some references. However, there might be another reason for this which I am not aware of. I therefore rephrased it like this: "[the article cites ...] which is possibly an artifact."

tklebel commented 6 years ago

@karthik just pinging to ask if you could take a quick look at this issue. The current status seems to be:

How will this onboarding proceed?

karthik commented 6 years ago

Hi @tklebel Apologies for the delay. I'm just waiting on @elinw to sign off, and with that we can finish up. Elin: do you have any further unresolved issues with the master release of this effort? If not, could you indicate sign-off so we can proceed with acceptance? πŸ™

elinw commented 6 years ago

Yes, I am good with these changes, the last of which are mainly just polishing documentation etc. This is going to be really useful and will encourage the use of DfR. I noticed the change from stop() to abort() and read about the differences; my suggestion is that you call it explicitly as rlang::abort(), which you'll need to do for CRAN anyway.

Great job!

tklebel commented 6 years ago

@elinw I'm actually not sure if I really prefer abort over stop in general. I used it to remove the calls to call. = FALSE. But since I import it via importFrom(rlang, abort) in NAMESPACE, this should be fine for CRAN, right? If not, I will of course change it before submitting.
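
For reference, a minimal sketch of the two styles in question (the check and the message are made up):

# base R, suppressing the call in the error message
check_file <- function(file_path) {
  if (!file.exists(file_path)) {
    stop("`file_path` does not exist.", call. = FALSE)
  }
  invisible(file_path)
}

# rlang equivalent, which omits the call from the message by default
check_file_rlang <- function(file_path) {
  if (!file.exists(file_path)) {
    rlang::abort("`file_path` does not exist.")
  }
  invisible(file_path)
}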

Apart from that, I am very happy for the thumbs up! This comes just in time before useR! πŸ˜ƒ

karthik commented 6 years ago

Congrats @tklebel, your submission has been approved! πŸŽ‰ Thank you for submitting, and thanks to @elinw and @jsonbecker for thorough and timely reviews. To-dos:

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/technotes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

tklebel commented 6 years ago

Great! Thank you @karthik for managing the application, and @jsonbecker and @elinw for your helpful comments!

I transferred the repo, added the rOpenSci footer and changed all relevant links in the README.

I'll see to getting the package ready for CRAN, and I will address the suggestion by @elinw while doing so.

I'd very much like to write a blog post with some narrative about the development of the package. @stefaniebutland, I should have time to write this post some time after July.

tklebel commented 6 years ago

@karthik I changed more or less all the links I found, and the build systems (Travis and AppVeyor) are working. The only missing part, as far as I can see, is admin rights for the repo, so I can update a few remaining things (mainly the link to the pkgdown site). Then we could close the issue, I think.

stefaniebutland commented 6 years ago

I'd very much like to write a blog post with some narrative about the development of the package. @stefaniebutland, I should have time to write this post some time after July.

Very glad to hear that @tklebel. This link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/review/.

Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post. I'll mark my calendar to follow up with you in mid-August so we can agree on a publication date. We ask that you submit your draft post via pull request a week before the planned publication date so we can provide feedback.

Happy to answer any questions as they come up.

stefaniebutland commented 6 years ago

@tklebel I have blog post publication dates available Sept 18, Oct 2, 9, 16. If you're still interested, let me know your preferred date and mark your calendar to submit a draft a week in advance.

Please see my comment above for more details.

tklebel commented 6 years ago

@stefaniebutland I'm still interested in writing a blog post. Regarding publication dates, I would prefer October 9th, if that is ok for you.

stefaniebutland commented 6 years ago

Glad to hear that @tklebel. Tuesday 2018-10-09 is yours. Please submit a draft via pull request by 2018-10-02 and don't hesitate to ask any questions here.

tklebel commented 6 years ago

@karthik is it possible, that I am still missing appropriate rights for the repo? I wanted to fix the link in the title of the repo, but I still couldn't do it (someone else fixed it for me in the meantime).

karthik commented 6 years ago

@tklebel Can you please check now?

tklebel commented 6 years ago

@karthik perfect, thanks!

stefaniebutland commented 5 years ago

Hi @tklebel. Checking to see if you're still on track to submit a draft blog post by 2018-10-02 for publication the following week.

tklebel commented 5 years ago

Hi @stefaniebutland. Is it ok if the post is more about why and how I wrote the package, including what I learned along the way, and less about how to use it? I have a lot of documentation, including a case study on the pkgdown site, so I don't think it would be helpful or interesting to write something similar again...

If you are fine with that, then I would answer that I am on track 😺

stefaniebutland commented 5 years ago

Sounds great @tklebel. More people will likely relate to this kind of post. Please include a short section noting & linking to the docs you have.