Closed jemunro closed 1 year ago
Sounds like a plan!
I think it might be better to keep the name 'ensembl_gene_id' for compatibility with ensembl resources, and to avoid ambiguity with other ensembl identifiers (e.g. ensembl_transcript_id, ensembl_protein_id etc). In particular the biomaRt
package amongst others uses 'ensembl_gene_id' (see https://support.bioconductor.org/p/120860/)
Also happy to change the pipe, but the advantage of %>%
is that it works for R versions prior to 4.1 when |>
was introduced. I'm not sure in practice how many people with older versions of R would end up installing this package, but also don't see the harm in sticking with %>%
.
I agree with you. It's better to keep the pipe and leave the "ensembl_gene_id" untouched. Many thanks!
Great, I've added new commits to address your suggestions:
stopifnot()
In addition I have:
use_hgnc_file()
and get_hgnc_file()
, which allow a specific hgnc file/version to be selected for the entire r-sessionimport_hgnc_dataset_()
function and added warnings for column mismatchHi Ramiro, thanks for merging! Do you plan to publish this version to CRAN?
Hi @jemunro
Yes, of course!
I just haven't yet because I wanted to make some fixes before submission.
If you run a quick check you can see these problems:
W checking dependencies in R code ...
'::' or ':::' import not declared from: ‘tidyr’
✔ checking S3 generic/method consistency ...
✔ checking replacement functions ...
✔ checking foreign function calls ...
N checking R code for possible problems (1.7s)
File ‘hgnc/R/zzz.R’:
.onLoad has wrong argument list ‘pkgname, libname’
Package startup functions should have two arguments with names starting
with ‘lib’ and ‘pkg’, respectively.
See section ‘Good practice’ in '?.onAttach'.
get_hgnc_key_table: no visible binding for global variable ‘n’
hgnc_convert: no visible global function definition for ‘all_of’
Undefined global functions or variables:
all_of n
✔ checking Rd files ...
✔ checking Rd metadata ...
N checking Rd line widths ...
Rd file 'use_hgnc_file.Rd':
\examples lines wider than 100 characters:
use_hgnc_file(file = "https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/archive/monthly/tsv/hgnc_complete_set_2023-08-01.txt")
These lines will be truncated in the PDF manual.
✔ checking Rd cross-references ...
✔ checking for missing documentation entries ...
✔ checking for code/documentation mismatches (546ms)
✔ checking Rd \usage sections ...
✔ checking Rd contents ...
✔ checking for unstated dependencies in examples ...
✔ checking examples (4.1s)
✔ checking for unstated dependencies in ‘tests’ ...
─ checking tests ...
✔ Running ‘spelling.R’
X Comparing ‘spelling.Rout’ to ‘spelling.Rout.save’ ...
6,10d5
< Potential spelling errors:
< WORD FOUND IN
< eu NEWS.md:7
< maialab NEWS.md:7
< rmagno NEWS.md:7
✔ checking for non-standard things in the check directory
✔ checking for detritus in the temp directory ...
See
‘/home/rmagno/sci/code/R/pkg/hgnc.Rcheck/00check.log’
for details.
── R CMD check results ─────────────────────────────────────────── hgnc 0.1.5 ────
Duration: 14.1s
❯ checking dependencies in R code ... WARNING
'::' or ':::' import not declared from: ‘tidyr’
❯ checking R code for possible problems ... NOTE
File ‘hgnc/R/zzz.R’:
.onLoad has wrong argument list ‘pkgname, libname’
Package startup functions should have two arguments with names starting
with ‘lib’ and ‘pkg’, respectively.
See section ‘Good practice’ in '?.onAttach'.
get_hgnc_key_table: no visible binding for global variable ‘n’
hgnc_convert: no visible global function definition for ‘all_of’
Undefined global functions or variables:
all_of n
❯ checking Rd line widths ... NOTE
Rd file 'use_hgnc_file.Rd':
\examples lines wider than 100 characters:
use_hgnc_file(file = "https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/archive/monthly/tsv/hgnc_complete_set_2023-08-01.txt")
These lines will be truncated in the PDF manual.
0 errors ✔ | 1 warning ✖ | 2 notes ✖
Error: R CMD check found WARNINGs
Execution halted
Morever I would like to add a few unit tests to hgnc_convert()
, covering all mapping cases:
Sounds good.
I've tried to address some the issues in https://github.com/ramiromagno/hgnc/pull/4
I also think the one-to-one/one-to-many paradigm is a better way to think about what's going on in hgnc_convert(), so I've renamed one of the parameters to 'one_to_many' and made the behavior more consistent. (i.e. if one_to_many == TRUE, then returned value/column will be a list).
Hi Jacob,
Everything looks awesome! Thank you so much.
Now that you're at it, could you also:
ensembl_gene_id
toensembl_id
across the package?@md
tag from docs and addRoxygen: list(markdown = TRUE)
to DESCRIPTION?type
parameter inlist_archives()
doc?get_hgnc_key_table()
,hgnc_join()
andhgnc_convert
?assertthat::assert_that()
inuse_cache_dir()
andget_cache_dir()
? Then we could save on the assertthat dependency.cache_dir
parameter inuse_cache_dir()
doc.I really liked some tricks you introduced, the caching, the setting of the attribute in
latest_archive_url()
and these other functions for conversion that offer more flexibility. Thank you!I think all these changes warrant a version bump after your commits and submission to CRAN.
EDIT: And perhaps you could also get rid of
%>%
across the package and use simply|>
. We don't use any of the more fancy functionality of Hadley's pipe.