ramiromagno / hgnc

Download and import HGNC gene data into R
https://rmagno.eu/hgnc
Other
3 stars 2 forks source link

Add memory and disk based caching #3

Closed jemunro closed 1 year ago

ramiromagno commented 1 year ago

Hi Jacob,

Everything looks awesome! Thank you so much.

Now that you're at it, could you also:

I really liked some tricks you introduced, the caching, the setting of the attribute in latest_archive_url() and these other functions for conversion that offer more flexibility. Thank you!

I think all these changes warrant a version bump after your commits and submission to CRAN.

EDIT: And perhaps you could also get rid of %>% across the package and use simply |>. We don't use any of the more fancy functionality of Hadley's pipe.

jemunro commented 1 year ago

Sounds like a plan!

I think it might be better to keep the name 'ensembl_gene_id' for compatibility with ensembl resources, and to avoid ambiguity with other ensembl identifiers (e.g. ensembl_transcript_id, ensembl_protein_id etc). In particular the biomaRt package amongst others uses 'ensembl_gene_id' (see https://support.bioconductor.org/p/120860/)

Also happy to change the pipe, but the advantage of %>% is that it works for R versions prior to 4.1 when |> was introduced. I'm not sure in practice how many people with older versions of R would end up installing this package, but also don't see the harm in sticking with %>% .

ramiromagno commented 1 year ago

I agree with you. It's better to keep the pipe and leave the "ensembl_gene_id" untouched. Many thanks!

jemunro commented 1 year ago

Great, I've added new commits to address your suggestions:

In addition I have:

jemunro commented 1 year ago

Hi Ramiro, thanks for merging! Do you plan to publish this version to CRAN?

ramiromagno commented 1 year ago

Hi @jemunro

Yes, of course!

I just haven't yet because I wanted to make some fixes before submission.

ramiromagno commented 1 year ago

If you run a quick check you can see these problems:

W  checking dependencies in R code ...
   '::' or ':::' import not declared from: ‘tidyr’
✔  checking S3 generic/method consistency ...
✔  checking replacement functions ...
✔  checking foreign function calls ...
N  checking R code for possible problems (1.7s)
   File ‘hgnc/R/zzz.R’:
     .onLoad has wrong argument list ‘pkgname, libname’

   Package startup functions should have two arguments with names starting
     with ‘lib’ and ‘pkg’, respectively.
   See section ‘Good practice’ in '?.onAttach'.

   get_hgnc_key_table: no visible binding for global variable ‘n’
   hgnc_convert: no visible global function definition for ‘all_of’
   Undefined global functions or variables:
     all_of n
✔  checking Rd files ...
✔  checking Rd metadata ...
N  checking Rd line widths ...
   Rd file 'use_hgnc_file.Rd':
     \examples lines wider than 100 characters:
        use_hgnc_file(file = "https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/archive/monthly/tsv/hgnc_complete_set_2023-08-01.txt")

   These lines will be truncated in the PDF manual.
✔  checking Rd cross-references ...
✔  checking for missing documentation entries ...
✔  checking for code/documentation mismatches (546ms)
✔  checking Rd \usage sections ...
✔  checking Rd contents ...
✔  checking for unstated dependencies in examples ...
✔  checking examples (4.1s)
✔  checking for unstated dependencies in ‘tests’ ...
─  checking tests ...
✔  Running ‘spelling.R’
X  Comparing ‘spelling.Rout’ to ‘spelling.Rout.save’ ...
   6,10d5
   < Potential spelling errors:
   <   WORD      FOUND IN
   < eu        NEWS.md:7
   < maialab   NEWS.md:7
   < rmagno    NEWS.md:7
✔  checking for non-standard things in the check directory
✔  checking for detritus in the temp directory ...

   See
     ‘/home/rmagno/sci/code/R/pkg/hgnc.Rcheck/00check.log’
   for details.

── R CMD check results ─────────────────────────────────────────── hgnc 0.1.5 ────
Duration: 14.1s

❯ checking dependencies in R code ... WARNING
  '::' or ':::' import not declared from: ‘tidyr’

❯ checking R code for possible problems ... NOTE
  File ‘hgnc/R/zzz.R’:
    .onLoad has wrong argument list ‘pkgname, libname’

  Package startup functions should have two arguments with names starting
    with ‘lib’ and ‘pkg’, respectively.
  See section ‘Good practice’ in '?.onAttach'.

  get_hgnc_key_table: no visible binding for global variable ‘n’
  hgnc_convert: no visible global function definition for ‘all_of’
  Undefined global functions or variables:
    all_of n

❯ checking Rd line widths ... NOTE
  Rd file 'use_hgnc_file.Rd':
    \examples lines wider than 100 characters:
       use_hgnc_file(file = "https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/archive/monthly/tsv/hgnc_complete_set_2023-08-01.txt")

  These lines will be truncated in the PDF manual.

0 errors ✔ | 1 warning ✖ | 2 notes ✖
Error: R CMD check found WARNINGs
Execution halted
ramiromagno commented 1 year ago

Morever I would like to add a few unit tests to hgnc_convert(), covering all mapping cases:

jemunro commented 1 year ago

Sounds good.

I've tried to address some the issues in https://github.com/ramiromagno/hgnc/pull/4

I also think the one-to-one/one-to-many paradigm is a better way to think about what's going on in hgnc_convert(), so I've renamed one of the parameters to 'one_to_many' and made the behavior more consistent. (i.e. if one_to_many == TRUE, then returned value/column will be a list).