thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/
Other
572 stars 64 forks source link

Make gggenomes ready for CRAN submission #168

Closed iimog closed 4 months ago

iimog commented 9 months ago

I'm trying to fix all the Errors, Warnings and Notes of R CMD check. I already fixed a couple. Still getting errors in the examples. Status is:

Status: 1 ERROR, 7 WARNINGs, 4 NOTEs

The 1 Error is misleading, because only fixing one turns up the next one :sweat_smile:

I'll continue fixing stuff and add it to this pr. But all changes I made so far should be safe to merge anytime, to avoid conflicts in the future.

thackl commented 9 months ago

Yeah, about what I expected. I already did some namespace stuff yesterday, but didn't follow through with all imported packages, in particular those that I use a lot. But if you want to find all functions from other packages, the code snippet from here works quite well: https://stackoverflow.com/questions/67492020/finding-objects-from-other-packages-namespaces-in-package-code

iimog commented 9 months ago

That looks really useful, indeed. Going 1-by-1 would have been really tedious.

iimog commented 9 months ago

Update: all ERRORs are fixed. Now we have:

Status: 7 WARNINGs, 8 NOTEs

Again, I think the counts are underestimations :-D I will work on them, next.

thackl commented 9 months ago

I'm guessing CRAN is anal about this and requires "WARNINGs/NOTEs" to be fixed?

iimog commented 9 months ago

Yes, I think so, too.

iimog commented 9 months ago

6 Notes are fixed. The warnings need a bit more attention. Just to be sure, to fix one of the notes I removed ggplot2 from Imports as we already have it in Depends. And I assume, that we want to keep ggplot2 in Depends, right?

thackl commented 9 months ago

Yes, I think we should have ggplot2 in depends so people can always directly use any ggplot2 functions in the plot without having to load the library or use explicit ggplot2:: calls. Not even sure + stuff would work properly if we'd not depend on/library ggplot2 ;)

iimog commented 7 months ago

I'll continue to fix warnings and notes, but there is one thing, that can not easily be fixed. The CRAN repository policy states, that:

Packages should be of the minimum necessary size. Reasonable compression should be used for data (not just .rda files) and PDF documentation: CRAN will if necessary pass the latter through qpdf. As a general rule, neither data nor documentation should exceed 5MB (which covers several books). A CRAN package is not an appropriate way to distribute course notes, and authors will be asked to trim their documentation to a maximum of 5MB. Where a large amount of data is required (even after compression), consideration should be given to a separate data-only package which can be updated only rarely (since older versions of packages are archived in perpetuity). [...]

Our check indicates, that:

* checking installed package size ... NOTE
  installed size is 100.4Mb
  sub-directories of 1Mb or more:
    doc       4.4Mb
    extdata  95.7Mb

So documentation is fine, but already close to the limit and extdata exceeds the limit. In order to fix this, we can consider a data-only package as suggested by the CRAN policy.

iimog commented 7 months ago

New status is:

Status: 4 WARNINGs, 3 NOTEs

Another NOTE that might be hard to get rid of is:

* checking dependencies in R code ... NOTE
Unexported objects imported by ':::' calls:
  ‘ellipsis:::dots’ ‘ggplot2:::make_labels’ ‘ggplot2:::scales_list’
  ‘scales:::force_all’
  See the note in ?`:::` about the use of this operator.

Accessing internal functions is not recommended, the referenced note states:

It is typically a design mistake to use ‘:::’ in your code since the corresponding object has probably been kept internal for a good reason. Consider contacting the package ‘maintainer’ if you feel the need to access the object for anything but mere inspection.

We have to check, whether our usage of these four internal functions is really necessary, or if we can use some exported functions instead.

iimog commented 7 months ago

New status:

Status: 2 WARNINGs, 3 NOTEs
iimog commented 6 months ago

New status:

Status: 1 WARNING, 3 NOTEs
iimog commented 6 months ago

Finally, we are at

Status: 3 NOTEs

So no more WARNINGs or ERRORs. However, these 3 notes might still be relevant. And one of the notes are actually 325 separate instances of the message no visible binding for global variable.

I'll tackle that next.

iimog commented 6 months ago

I fixed 70 of the 325 "no visible binding for global variable" notes. The fix is not hard, just laborious. It basically consists of adding the .data pronoun as suggested in this article. This is not hard work but it takes a lot of time, because there are so many instances. Before I continue to do this, I wanted to make sure, that we really want this. I think it is required to get the package into CRAN. But I don't think it improves the readability of the code (see the diff of my last commit). So @thackl are you happy with having the .data pronoun added everywhere?

iimog commented 5 months ago

I finally fixed all but one of the "no visible binding for global variable" notes. It was painful, I repeatedly broke stuff and readability of the code is not improved in my opinion. Anyway, it seems to be necessary for CRAN :grimacing: The remaining note about global variables is probably legit. It is about the function file_suffix here: https://github.com/thackl/gggenomes/blob/0f43a2fce25d6debeb7ae00a7e43b943859488f0/R/read_gbk.R#L15 and again in line 23 of the same file. I was unable to find this function. @thackl can you tell me from which package you are using this function? We probably need to add that as a dependency.

iimog commented 5 months ago

When running R CMD check --as-cran I get additional NOTEs. I'll try to fix these as well. As cran our current status is:

Status: 8 NOTEs
iimog commented 5 months ago

I'm finally at a point where almost everything is fixed :sweat_smile:

For the remaining issues, I need some input from @thackl

Full log
* using log directory ‘/home/markus/projects/gggenomes/gggenomes.Rcheck’
* using R version 4.3.2 (2023-10-31)
* using platform: x86_64-pc-linux-gnu (64-bit)
* R was compiled by
    gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
    GNU Fortran (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
* running under: Ubuntu 22.04.3 LTS
* using session charset: UTF-8
* using option ‘--as-cran’
* checking for file ‘gggenomes/DESCRIPTION’ ... OK
* this is package ‘gggenomes’ version ‘0.9.12.9000’
* package encoding: UTF-8
* checking CRAN incoming feasibility ... [6s/17s] NOTE
Maintainer: ‘Thomas Hackl ’

New submission

Version contains large components (0.9.12.9000)

Size of tarball: 48345619 bytes
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking serialization versions ... OK
* checking whether package ‘gggenomes’ can be installed ... OK
* checking installed package size ... NOTE
  installed size is 100.3Mb
  sub-directories of 1Mb or more:
    doc       4.4Mb
    extdata  95.7Mb
* checking package directory ... OK
* checking for future file timestamps ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... NOTE
Unexported objects imported by ':::' calls:
  ‘ellipsis:::dots’ ‘ggplot2:::make_labels’ ‘ggplot2:::scales_list’
  ‘scales:::force_all’
  See the note in ?`:::` about the use of this operator.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... NOTE
read_gbk: no visible global function definition for ‘file_suffix’
Undefined global functions or variables:
  file_suffix
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking LazyData ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking sizes of PDF files under ‘inst/doc’ ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... [63s/61s] NOTE
Examples with CPU (user + system) or elapsed time > 5s
      user system elapsed
flip 5.653  0.009   5.662
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ... OK
  Running ‘testthat.R’
* checking for unstated dependencies in vignettes ...
thackl commented 4 months ago

Wow, awesome job!!! Sorry, it took me soo long to pick it up again.

Re file_suffix https://github.com/thackl/gggenomes/pull/168#issuecomment-1900172259 - I have now idea. I think this was actually at some point a function I added myself, but later deleted without noticing. I think we don't have a test that actually reads from gzipped genbank...

thackl commented 4 months ago

I think I got most of it. Do you know which examples are too slow. Both splitting or skipping are fine with me.

iimog commented 4 months ago

This is awesome :rocket: After a couple of package updates, the check went through with a single remaining NOTE (so the slow examples seem to be fast enough now :smile:)

* checking installed package size ... NOTE
  installed size is  6.1Mb
  sub-directories of 1Mb or more:
    doc   4.4Mb

So I shrank the images in the emales vignette from 5400x2400 to 1620x720 which is still much larger than the display size in the html.

Now all the checks are passing locally.

``` * using log directory ‘/home/markus/projects/gggenomes/gggenomes.Rcheck’ * using R version 4.3.2 (2023-10-31) * using platform: x86_64-pc-linux-gnu (64-bit) * R was compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 GNU Fortran (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 * running under: Ubuntu 22.04.4 LTS * using session charset: UTF-8 * checking for file ‘gggenomes/DESCRIPTION’ ... OK * this is package ‘gggenomes’ version ‘1.0.0’ * package encoding: UTF-8 * checking package namespace information ... OK * checking package dependencies ... OK * checking if this is a source package ... OK * checking if there is a namespace ... OK * checking for executable files ... OK * checking for hidden files and directories ... OK * checking for portable file names ... OK * checking for sufficient/correct file permissions ... OK * checking whether package ‘gggenomes’ can be installed ... OK * checking installed package size ... OK * checking package directory ... OK * checking ‘build’ directory ... OK * checking DESCRIPTION meta-information ... OK * checking top-level files ... OK * checking for left-over files ... OK * checking index information ... OK * checking package subdirectories ... OK * checking R files for non-ASCII characters ... OK * checking R files for syntax errors ... OK * checking whether the package can be loaded ... OK * checking whether the package can be loaded with stated dependencies ... OK * checking whether the package can be unloaded cleanly ... OK * checking whether the namespace can be loaded with stated dependencies ... OK * checking whether the namespace can be unloaded cleanly ... OK * checking loading without being on the library search path ... OK * checking dependencies in R code ... OK * checking S3 generic/method consistency ... OK * checking replacement functions ... OK * checking foreign function calls ... OK * checking R code for possible problems ... OK * checking Rd files ... OK * checking Rd metadata ... OK * checking Rd cross-references ... OK * checking for missing documentation entries ... OK * checking for code/documentation mismatches ... OK * checking Rd \usage sections ... OK * checking Rd contents ... OK * checking for unstated dependencies in examples ... OK * checking contents of ‘data’ directory ... OK * checking data for non-ASCII characters ... OK * checking LazyData ... OK * checking data for ASCII and uncompressed saves ... OK * checking sizes of PDF files under ‘inst/doc’ ... OK * checking installed files from ‘inst/doc’ ... OK * checking files in ‘vignettes’ ... OK * checking examples ... OK * checking for unstated dependencies in ‘tests’ ... OK * checking tests ... OK Running ‘testthat.R’ * checking for unstated dependencies in vignettes ... OK * checking package vignettes in ‘inst/doc’ ... OK * checking running R code from vignettes ... NONE ‘emales.Rmd’ using ‘UTF-8’... OK ‘gggenomes.Rmd’ using ‘UTF-8’... OK * checking re-building of vignette outputs ... OK * checking PDF version of manual ... OK * DONE Status: OK ```

So I think we are ready for CRAN submission :sunglasses: