Closed. cecileherr closed this issue 3 years ago.
I think it might be more interesting to work directly towards a more structured implementation of version control (with md5, ...).
Agreed @cecileherr. If I'm to implement that (current plan) I need to postpone it until later this year, and in the meantime quick fixes are needed. The new function behaviour as you describe in your comment sounds very good (will review soon). But yes, we definitely need an overarching system which can tackle all this version/file checking (I have an idea), especially for extra versions in the future. Also involves #113.
If you're up to doing the coding yourself sooner: welcome; then we'd best discuss the ideas in person first. Still, we will always need version-specific code inside the functions (file contents differ per version), at least for all versions which we continue supporting (which I'd like to keep up at least as long as the approach to handle versions is evolving).
What do you think about this alternative:

- setting the gpkg path as default in `function(...)` (hence, shown in Usage)
- then setting the v1.0 folder only `if (version == "watersurfaces_v1.0" && missing(file))`
- relating to "the user can specify version 1.0 as argument without needing to change the file argument": add this in the Details.
@cecileherr I think the following will be better and more future-proof:

- no explicit default for `file`, while still showing in Usage that it is optional, with `function(file = NULL, )`. Cf. https://adv-r.hadley.nz/functions.html#missing-arguments
- This may be a more robust way for the future (less maintenance needed), considering that we'll use metadata tables already inside the package that will tell where files are by default (for each data source, per version). (And which can be returned as well.)
- `if (missing(file))`, i.e. the user did not actively specify `file`, set the file for each version
- change the `file` argument documentation to state more clearly that the function will use defaults based on `version`. I.e. override the default with an explicit `@param file ...`
What do you think?
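As a minimal sketch of that pattern (function name and paths are hypothetical illustrations, not the actual n2khab code):

```r
# Hypothetical sketch: Usage shows `file = NULL`, but the effective default
# is derived from `version` when the caller did not actively supply `file`.
read_watersurfaces_sketch <- function(file = NULL,
                                      version = c("watersurfaces_v1.1",
                                                  "watersurfaces_v1.0")) {
  version <- match.arg(version)
  if (missing(file)) {
    # user did not specify `file` in the call: pick a per-version default
    file <- switch(version,
                   "watersurfaces_v1.1" = "n2khab_data/10_raw/watersurfaces",
                   "watersurfaces_v1.0" = "n2khab_data/10_raw/watersurfaces_v1.0")
  }
  file
}
```

Note that with this signature `missing(file)` is `TRUE` only when the caller omits `file` entirely; a caller explicitly passing `file = NULL` would skip the default, which is why the `is.null(file)` vs `missing(file)` distinction comes up again later in this thread.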
@cecileherr This idea comes a bit late to the party, but for the `é` and `ï` cases, can you check whether the following trick, building on the results of inbo/n2khab-preprocessing#53, works as well? It would make translating the encoding from UTF-8 to latin1 (and actively defining it as latin1) unnecessary, which may not be an ideal solution IMO. If the following works, it would be a better cross-platform solution.

```r
Encoding(x) # I think this returns "unknown"
Encoding(x) <- "UTF-8"
print(x) # to check the effect: are characters printed correctly?
```
If I'm correct you will see that, originally, R does not know about the encoding of the data (`Encoding(x)` returns `"unknown"`). Hence it assumes the native encoding of the operating system, which, in the case of GeoPackages (= UTF-8 or UTF-16), may always be wrong in Windows (unless GDAL attempts to take care of it, I don't know for sure; and maybe `sf` plays a role in it).
The above sets the correct encoding for the characters at hand, and maybe that already suffices. But I'm not sure; I think I've tried it before, maybe not (always?) successfully.
UPDATE: If the above trick doesn't work, I'd advise to apply your `iconv(from = "UTF-8", to = "UTF-8")` approach. It seems better to keep data which are already in UTF-8 as UTF-8. If that needs translating from and to the same encoding, so be it.
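A minimal, self-contained illustration of both options (the example string is an assumption for demonstration, not data from the package):

```r
# Raw UTF-8 bytes for "café"; R has not been told the encoding yet,
# so the string is treated as being in the native encoding ("unknown").
x <- "caf\xc3\xa9"
Encoding(x)              # "unknown"

# Option 1: declare the encoding (no bytes are changed).
y <- x
Encoding(y) <- "UTF-8"
Encoding(y)              # "UTF-8"

# Option 2 (fallback): translate from UTF-8 to UTF-8 with iconv();
# the bytes stay the same, but the result is marked as UTF-8.
z <- iconv(x, from = "UTF-8", to = "UTF-8")
Encoding(z)              # "UTF-8"
```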
<< For the character encoding, some ideas along the same line can be found in r-spatial/sf#5. A good workflow might also be achieved by using `rgdal` to set `encoding = "UTF-8"` and then converting the object to `sf`:

> Note that `rgdal::readOGR()` has `encoding=` and `use_iconv=` arguments

> you could either use `rgdal::readOGR()` as suggested by rsbivand and convert that to an sf object with `st_as_sf()`.

If you'd follow such a route, then don't Import `rgdal` but add it to Suggests (file `DESCRIPTION`), and look at how suggested packages are used in `n2khab` functions. However this route will slow down the usage of the function. >>
EDIT: from the documentation of the named `encoding` argument, it seems restricted to shapefiles only. Indeed it seems GDAL does not provide a multi-encoding option for most drivers, and otherwise you could pass it with `sf::read_sf(options = )`. So I think the `rgdal` idea will not help.
argument | description
---|---
`encoding` | default NULL; if set to a character string, and the driver is "ESRI Shapefile", and `use_iconv` is FALSE, it is passed to the CPL Option "SHAPE_ENCODING" immediately before reading the DBF of a shapefile. If `use_iconv` is TRUE, and `encoding` is not NULL, it will be used to convert input strings from the given value to the native encoding for the system/platform.
`use_iconv` | default FALSE; if TRUE and `encoding` is not NULL, it will be used to convert input strings from the given value to the native encoding for the system/platform.
About the encoding problems:

> `Encoding(x) <- "UTF-8"`

Implemented in 04e0d83 for the non-geometrical layers (for an unknown reason the characters were correctly recognized in the geometrical layer of the GeoPackage).
The encoding for the non-geometrical layers was indeed unknown. If I set it with `Encoding() <-`, it becomes UTF-8, but only for the elements with special characters; the encoding of the other elements stays "unknown".
```r
connectivitytransl <- read_sf(file, layer = "LktCONNECT") %>%
  # mutate(across(where(is.character),
  #               .fns = function(x) {return(`Encoding<-`(x, "UTF-8"))})) %>%
  mutate_if(., is.character,
            .funs = function(x) {return(`Encoding<-`(x, "UTF-8"))})

Encoding(connectivitytransl$Code)
#> [1] "UTF-8"   "unknown" "unknown"
Encoding(connectivitytransl$Omschr)
#> [1] "unknown" "UTF-8"   "UTF-8"
```
I guess it should be ok.
Side note about the commented part: I know `mutate_if` is deprecated, but I decided against using the `across` alternative because of this (`where` not exported): https://github.com/r-lib/tidyselect/issues/201
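For reference, `where()` has since been exported by tidyselect (>= 1.1.0), so the commented `across()` form now works; a small self-contained comparison on toy data (not the package's data):

```r
library(dplyr)

# Helper: declare character data as UTF-8 (same idea as in the snippet above).
set_utf8 <- function(x) {
  Encoding(x) <- "UTF-8"
  x
}

d <- data.frame(code = "caf\xc3\xa9", n = 1L)

d1 <- mutate_if(d, is.character, set_utf8)              # superseded scoped verb
d2 <- mutate(d, across(where(is.character), set_utf8))  # current tidyverse form

identical(d1, d2)
```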
> What do you think about this alternative:
>
> - setting the gpkg path as default in `function(...)` (hence, shown in Usage)
> - then setting the v1.0 folder only `if (version == "watersurfaces_v1.0" && missing(file))`
> - relating to "the user can specify version 1.0 as argument without needing to change the file argument": add this in the Details.

> @cecileherr I think the following will be better and more future-proof:
>
> - no explicit default for `file`, while still showing in Usage that it is optional, with `function(file = NULL, )`. Cf. https://adv-r.hadley.nz/functions.html#missing-arguments
> - This may be a more robust way for the future (less maintenance needed), considering that we'll use metadata tables already inside the package that will tell where files are by default (for each data source, per version). (And which can be returned as well)
> - `if (missing(file))`, i.e. the user did not actively specify `file`, set the file for each version
> - change the `file` argument documentation to state more clearly that the function will use defaults based on `version`. I.e. override the default with an explicit `@param file ...`
>
> What do you think?
See 6e12e73 (and documentation in d26d5b0 and 4491ef7):

- `file = NULL`
- `if`s to retrieve an error when the `file` and the `version` are conflicting

Many thanks for the detailed review (as usual ;-) ) and tips!
Pushed a few commits to cover your suggestions:

- `.data$`: 4e60448
- `is.null(file)` to `missing(file)`, and adapted the documentation accordingly: ba8c91f & dbbc07a
- `str_replace` in the `@importFrom` (I thought I had checked this a 100 times :D): 4e60448

About `.data$`:
Best check this note disappears locally by running the check in the RStudio Build pane (just pressing ctrl+shift+E should do the job).
Unfortunately it does not work on my computer yet (seems I have to install extra software and I need admin rights, so no immediate fix for this). I hope you don't mind if I keep polluting the online repository with my failing checks while I try to solve the issue locally.
About NAMESPACE:
I am a bit perplexed about this one. As far as I can remember I only ran `roxygen2::roxygenise()` locally; I don't think I changed the file directly (or it was not on purpose). Anyway, it is probably better practice to run `devtools::document()`.
About whether to include full names for usage and depth class?
I follow https://r-pkgs.org/man.html#man-workflow and https://r-pkgs.org/namespace.html#imports, but clearly the latter advice has evolved since I learned it: rather, the default advice is now to always do `package::function()` instead of only doing that for packages in Suggests:. Hm, that's not as handy for the developer IMHO.
Ok, just let me know if we should start using `package::function()` systematically from then on! (On the other hand, the article states that it is a bit slower to use `package::function()` than `@importFrom pkg fun` + `function()`.)
About the local CMD check: for now I need to install qpdf and change the system path in Win.
If you like to experiment more or feel limited to do that in the PR context, you can always do it in a separate branch (outside the PR): on pushing, it also triggers the R-CMD-check workflow, on the branch tip. Outside a PR, it's also more convenient to update/overwrite your git history (invoking new commit hashes) and force-push your branch until all is fine. After that you can merge your branch locally (with the main feature branch checked out: `git merge experimentalbranch`), push it, and delete `experimentalbranch` on the remote (`git push --delete origin experimentalbranch`). I could demo that if you like.
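Spelled out, the suggested flow looks roughly like this (branch names are placeholders):

```shell
# experiment on a throwaway branch; every push triggers the R-CMD-check workflow
git checkout -b experimentalbranch
git push -u origin experimentalbranch

# ...rewrite history / amend commits as needed, then overwrite the remote...
git push --force origin experimentalbranch

# once checks pass: fold it back into the feature branch and clean up
git checkout my-feature-branch              # placeholder name
git merge experimentalbranch
git push
git push --delete origin experimentalbranch
```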
Yes, might be the way to go. Maybe next time, when I get many many reactions to a PR ;-)
As far as I can see, the failed check for macOS is not related to n2khab (?) so I suppose I can merge this PR.
> the failed check for macOS is not related to n2khab (?) so I suppose I can merge this PR.

Yes, see #120. I'll merge your PR into `dev-0.5.0`!
> let me know if we should start using `package::function()` systematically

I'd rather not change current approaches. `@importFrom` is not going to disappear; it's a matter of philosophy. I prefer one coding strategy within the same package; sometimes it can mean to migrate everything.
@hansvancalster @ElsLommelen does this sound familiar; any recommendations about how @cecileherr can solve the below problem with locally running `R CMD check`? Double quoted is from me, single quoted from her.

> "Best check this note disappears locally by running the check in the RStudio Build pane (just pressing Ctrl+Shift+E should do the job)."
>
> 'Unfortunately it does not work on my computer yet (seems I have to install extra software and I need admin rights, so no immediate fix for this).'

I believe all Windows users should have Rtools installed by default (if not: better let it happen); I don't know about other needs?

> 'About the local CMD check: for now I need to install qpdf and change the system path in Win.'
> Unfortunately it does not work

@cecileherr can you show us the actual error? Does it say which extra software is needed?
qpdf is missing (the error message was pretty clear, I can add it here later if it can be useful); see also https://stackoverflow.com/questions/15035150/r-cmd-check-as-cran-warning. It is probably not hard to solve, but I need IT to modify the Win system path (admin rights required).
@cecileherr yes indeed, you need IT to solve that. We should probably also document this in the admin installation instructions (it affects all Windows 10 users who want to use `devtools::check()`).
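If changing the system `PATH` requires admin rights, a possible per-user workaround (an assumption on my side, untested here) is pointing `R CMD check` at the qpdf executable through the `R_QPDF` environment variable in `~/.Renviron`:

```r
# In ~/.Renviron (example path, adjust to wherever qpdf was unpacked):
# R_QPDF="C:/Users/yourname/apps/qpdf/bin/qpdf.exe"

# Then verify from a fresh R session whether qpdf can be found:
Sys.getenv("R_QPDF")  # the value set above, or "" if unset
Sys.which("qpdf")     # non-empty if qpdf is on the PATH
```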
Changes:

- `connectivity_name` with a description of the connectivity to streams

Note about the different versions:

- the `mutate` statements are in an `if-else statement` (to make it easy to modify the code in the future), but actually these `mutate` statements for v1.0 would be perfectly compatible with v1.1
- the user has to specify the `version` argument to be able to read version 1.0 (if not: it generates a meaningful error)
- if a v1.0 file is given (via the `file` argument), the file will be treated as version 1.1 by default without any warning (unless the user specifies the `version` argument)

There are several aspects that could be better managed (e.g. possible conflicts between the `file` and `version` arguments), but I think it might be more interesting to work directly towards a more structured implementation of version control (with md5, ...). Any thoughts on this?
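A rough sketch of what md5-based version checking could look like (the checksums below are placeholders for illustration, not the real ones):

```r
# Lookup table mapping known file md5 checksums to data source versions
# (placeholder values, not actual checksums of the watersurfaces files).
known_versions <- c(
  "watersurfaces_v1.0" = "d41d8cd98f00b204e9800998ecf8427e",
  "watersurfaces_v1.1" = "0cc175b9c0f1b6a831c399e269772661"
)

# Detect the data source version of a file by its md5 checksum;
# returns NA (with a warning) for unrecognized files.
detect_version <- function(file, versions = known_versions) {
  md5 <- unname(tools::md5sum(file))
  hit <- names(versions)[versions == md5]
  if (length(hit) == 0) {
    warning("File checksum ", md5, " does not match any known version.")
    return(NA_character_)
  }
  hit
}
```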