Closed salim-b closed 1 year ago
Thanks!
Going to bed now and the rest of the week is super busy, but I'll take a look as soon as I can.
Initial reaction: I use checkmate
for a different package and I like it (very thin), but I also really liked the fact that countrycode
has ZERO dependencies beyond base R. I'm not totally closed to the idea, but I'm also not 100% sure this is a merge right now.
Will resume discussion as soon as possible.
Ok, looking forward to your review.
Regarding the dependencies:
checkmate
check I've added is not really necessary for the core of this PR, it's just an additional help for the user in case an invalid custom_dict
arg is provided. But instead of removing it, I rather think it would make sense to use checkmate in way more places in this package... đrlang::arg_match()
calls could be replaced with base::match.arg()
â then we'd simply lose the informative correction suggestions in the error messages.origin
or destination
.1. I would definitely prefer to **not** go from 0 dependencies to 4. 2. I think this might conflict with custom dictionaries where we won't know the names of the columns, and hence the possible values for `origin` or `destination`.
To be fair, utils
and stats
are Base R packages. Still, I agree with the general sentiment and I woke up even more firmly in the "No Dependencies" camp. So if @salim-b doesn't mind, I'd like to revert that aspect of the PR.
And I also agree that if we are going to use some kind of match.call
, then this Issue needs to be resolved with a test as part of this PR: https://github.com/vincentarelbundock/countrycode/issues/271
In the README, there is an example with US States codes: the data is hosted on our github repo, and the example would be pretty much plug-and-play for test.
Also, if we're not going to onboard the rlang
dependency, it's not clear to me at all that there's a benefit of match.call
over our reasonably informative stop
message.
I think this might conflict with custom dictionaries where we won't know the names of the columns, and hence the possible values for origin or destination.
Nope, this also works with custom_dict
thanks to this code:
This is also the reason why I've added the checkmate::assert_data_frame()
check: If custom_dict
is set to something else than a data frame (or to one without colnames), colnames(dictionary)
evaluates to NULL
resulting in rlang::arg_match()
deriving the possible values from the caller frame â which would obviously be wrong when custom_dict
is set.
Still, I agree with the general sentiment and I woke up even more firmly in the "No Dependencies" camp. So if @salim-b doesn't mind, I'd like to revert that aspect of the PR.
Ok, so you want me to remove both checkmate
and rlang
, right?
Yeah, but just to be clear, I still want an informative error like we had before, so a match.call wonât do. We need something at least as informative as our old stop calls.
And if we remove checkmate and rlang, im not sure what the purpose of the PR really is. I havenât reviewed it properly, but I wouldnât want you to do a bunch more work if it ends up being about trailing white space.
But again, I havenât reviewed, so Iâm probably missing some other benefits...
On Wed, Mar 24, 2021, at 16:39, Salim B wrote:
I think this might conflict with custom dictionaries where we won't know the names of the columns, and hence the possible values for origin or destination.
Nope, this also works with
custom_dict
thanks to this code:This is also the reason why I've added the
checkmate::assert_data_frame()
check: Ifcustom_dict
is set to something else than a data frame (or to one without colnames),colnames(dictionary)
evaluates toNULL
resulting inrlang::arg_match()
deriving the possible values from the caller frame â which would obviously be wrong whencustom_dict
is set.Still, I agree with the general sentiment and I woke up even more firmly in the "No Dependencies" camp. So if @salim-b https://github.com/salim-b doesn't mind, I'd like to revert that aspect of the PR.
Ok, so you want me to remove both
checkmate
andrlang
, right?â You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vincentarelbundock/countrycode/pull/270#issuecomment-806171248, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHQ7MKQYQH2CLKAQ73DFSLTFJEWLANCNFSM4ZWN3CWQ.
-- Vincent Arel-Bundock
Livre en libre accĂšs: Analyse Causale et MĂ©thodes Quantitatives http://arelbundock.com/acmq.html
Professeur agrégé / Associate professor Université de Montréal, Science politique 3150 rue Jean-Brillant, Pav. Lionel-Groulx, C-4020 Montréal, Québec, Canada, H3T 1N8 http://arelbundock.com
I still want an informative error like we had before
With this PR, the error for a call like countrycode(c('USA', 'DZA'), origin = 'iso3cc', destination = 'cown')
is e.g.:
Error: `origin` must be one of "cctld", "country.name", "country.name.de", "cowc", "cown", "dhs", "ecb", "eurostat", "fao", "fips", "gaul", "genc2c", "genc3c", "genc3n", "gwc", "gwn", "imf", "ioc", "iso2c", "iso3c", "iso3n", "p4c", "p4n", "un", "un_m49", "unicode.symbol", "unpd", "vdem", "wb", "wb_api2c", "wb_api3c", "wvs", "country.name.en.regex", or "country.name.de.regex".
Did you mean "iso3c"?
Run `rlang::last_error()` to see where the error occurred.
With base::match.arg()
it would be:
Error in match.arg(origin, formals()$origin) :
'arg' should be one of âcâ, âcctldâ, âcountry.nameâ, âcountry.name.deâ, âcowcâ, âcownâ, âdhsâ, âecbâ, âeurostatâ, âfaoâ, âfipsâ, âgaulâ, âgenc2câ, âgenc3câ, âgenc3nâ, âgwcâ, âgwnâ, âimfâ, âiocâ, âiso2câ, âiso3câ, âiso3nâ, âp4câ, âp4nâ, âunâ, âun_m49â, âunicode.symbolâ, âunpdâ, âvdemâ, âwbâ, âwb_api2câ, âwb_api3câ, âwvsâ, âcountry.name.en.regexâ, âcountry.name.de.regexâ
I think the latter is clearly less elegant but would still serve the main purpose of this PR. Let me know what you think.
Got it, thanks for the clarification!
In that case, wouldn't it be better to just improve our stop
call with something like this?
if (!is.character(origin) ||
length(origin) != 1 ||
!origin %in% valid_origin) {
stop(sprintf("The `origin` argument must be a character of length 1 equal to one of these values: %s.",
paste(valid_origin, collapse = ", ")))
}
That would give us both the explicit assertion for input checking and a pretty/informative error message.
Also, FYI, all of this may change very soon because of this work in progress PR that is going to affect a lot of things:
https://github.com/vincentarelbundock/countrycode/pull/267/files
Ok, then let's wait for the merge of #267 first?
I just merged the branch, but I'm not planning to release it to CRAN any time soon. I need to play with the design of countrycode_factory
quite a bit more before I do that.
Feel free to take a look a the current state of affairs. But maybe open an issue with a proposal before diving too deep. I hate to see it when people put in efforts like checkmate
implementation and we end up turning it down. Such a waste!
Thanks again for your interest and efforts!
I had to jump through hoops to get around the merge conflicts, but in my quick test of this PR, countrycode(c('USA', 'DZA'))
returns a vector with NAs and warnings, which I think is undesirable...
> countrycode(c('USA', 'DZA'))
[1] NA NA
Warning messages:
1: In if (origin %in% c("country.name.en", "country.name.de")) { :
the condition has length > 1 and only the first element will be used
2: In if (destination == "country.name") { :
the condition has length > 1 and only the first element will be used
3: In countrycode(c("USA", "DZA")) :
Some values were not matched unambiguously: DZA, USA
current release and dev versions return an error, which I believe is the correct behavior...
> countrycode(c('USA', 'DZA'))
Error in countrycode(c("USA", "DZA")) :
argument "origin" is missing, with no default
Thanks a lot for looking into this @cjyetman . I pushed some changes last night to master that I think make much of this PR obsolete, because it reinforces argument checking and gives an informative message for valid origins.
Note that the current main branch also includes a new countrycode_factory
function with which I have not played much, so I don't intend to publish to CRAN until I (and others) have had some time to test things out and improve on the current bare-bones functionality.
This gives way better error messages, cf.
rlang::arg_match()
.Plus some cosmetic changes.