inbo / n2khab

R package with preprocessing functions and standard reference data for Flemish Natura 2000 (N2K) habitat (HAB) analyses
https://inbo.github.io/n2khab
GNU General Public License v3.0
2 stars 1 forks source link

Update read_watersurfaces for use with v 1.1 #118

Closed cecileherr closed 3 years ago

cecileherr commented 3 years ago

Changes:

Note about the different versions:

There are several aspects that could be better managed (e.g. possible conflicts between file and version arguments) but I think it might be more interesting to work directly towards a more structured implementation of version control (with md5, ...). Any thoughts on this?

florisvdh commented 3 years ago

I think it might be more interesting to work directly towards a more structured implementation of version control (with md5, ...).

Agreed @cecileherr. If I'm to implement that (current plan) I need to postpone that till later this year and in the meantime quick-fixes are needed. The new function behaviour as you describe in your comment sounds very good (will review soon). But yes, we definitely need an overarching system which can tackle all this version/file checking (I have an idea), especially for extra versions in the future. Also involves #113.

If you're up to do the coding yourself sooner: welcome - then we best discuss on the ideas in person first. Still, we will always need version-specific code inside the functions (file contents differ per version), at least for all versions which we continue supporting (which I'd like to keep up at least as long as the approach to handle versions is evolving).

florisvdh commented 3 years ago

What do you think about this alternative:

  • setting the gpkg path as default in function(...) (hence, show in Usage)
  • then setting the v1.0 folder only if (version == "watersurfaces_v1.0" && missing(file))
  • relating to "the user can specify version 1.0 as argument without needing to change the file argument", add this in the Details.

@cecileherr I think the following will be better and more future-proof:

What do you think?

florisvdh commented 3 years ago

@cecileherr This idea comes a bit late to the party, but for the é and ï cases, can you check whether the following trick, building on the results of inbo/n2khab-preprocessing#53, works as well? It would make translating the encoding from UTF-8 to latin1 (and actively defining as latin1) unnecessary, which may not be an ideal solution IMO. If the following works, it would be a better cross-platform solution.

Encoding(x) # I think this returns "unknown"
Encoding(x) <- "UTF-8"
print(x) # to check the effect: are characters printed correctly?

If I'm correct you will see that, originally, R does not know about the encoding of the data (Encoding(x) returns "unknown"). Hence it assumes the native encoding of the operating system. Which, in the case of GeoPackages (= UTF-8 or UTF-16), may always be wrong in Windows (unless GDAL attempts to take care of it, don't know for sure - and maybe sf plays a role in it).

The above sets the correct encoding for the characters at hand, and maybe that already suffices. But I'm not sure; I think I've tried it before, maybe not (always?) successfully.

UPDATE: If the above trick doesn't work, I'd advise to apply your iconv(from = "UTF-8", to = "UTF-8") approach. It seems better to keep data which are already in UTF-8, as UTF-8. If that needs translating from and to the same, so be it.

florisvdh commented 3 years ago

<< For the character encoding, some ideas along the same line can be found in r-spatial/sf#5. A good workflow might also be achieved by using rgdal to set encoding = "UTF-8" and then converting the object to sf:

If you'd follow such route, then don't Import rgdal but add it to Suggests (file DESCRIPTION), and look at how suggested packages are used in n2khab functions. However this route will slow down the usage of the function. >>

EDIT: from the documentation of the named encoding argument, it seems restricted to shapefiles only. Indeed it seems GDAL does not provide a multi-encoding option for most drivers, and otherwise you could pass it with sf::read_sf(options=). So I think the rgdal idea will not help.

a b
encoding default NULL, if set to a character string, and the driver is “ESRI Shapefile”, and use_iconv is FALSE, it is passed to the CPL Option “SHAPE_ENCODING” immediately before reading the DBF of a shapefile. If use_iconv is TRUE, and encoding is not NULL, it will be used to convert input strings from the given value to the native encoding for the system/platform.
use_iconv default FALSE; if TRUE and encoding is not NULL, it will be used to convert input strings from the given value to the native encoding for the system/platform.
cecileherr commented 3 years ago

About the encoding problems:

Encoding(x) <- "UTF-8"

Implemented in 04e0d83 for the non geometrical layers (for an unknown reason the characters were correctly recognized in the geometrical layer of the Geopackage). The encoding for the non geometrical layers was indeed unknown. If I set it with Encoding() <-, it becomes UTF-8 but only for the elements with special characters, the encoding of the other elements stays "unknown".

 connectivitytransl <- read_sf(file, layer = "LktCONNECT") %>%
                    # mutate(across(where(is.character),
                    #               .fns = function(x){return(`Encoding<-`(x, "UTF-8"))})) %>%
                    mutate_if(., is.character,
                              .funs = function(x){return(`Encoding<-`(x, "UTF-8"))}) 

Encoding(connectivitytransl$Code)
#[1] "UTF-8"   "unknown" "unknown"
Encoding(connectivitytransl$Omschr)
#[1] "unknown" "UTF-8"   "UTF-8" 

I guess it should be ok.

Side note about the commented part: I know mutate_if is deprecated, but I decided against using the across alternative because of this: (where not exported) https://github.com/r-lib/tidyselect/issues/201

cecileherr commented 3 years ago

What do you think about this alternative:

  • setting the gpkg path as default in function(...) (hence, show in Usage)
  • then setting the v1.0 folder only if (version == "watersurfaces_v1.0" && missing(file))
  • relating to "the user can specify version 1.0 as argument without needing to change the file argument", add this in the Details.

@cecileherr I think the following will be better and more future-proof:

  • no explicit default for file while still showing in Usage that it is optional, with function(file = NULL, ). Cf. https://adv-r.hadley.nz/functions.html#missing-arguments

    • This may be a more robust way for the future (less maintenance needed), considering that we'll use metadata tables already inside the package that will tell where files are by default (for each data source, per version). (And which can be returned as well)
  • if (missing(file)), i.e. user did not actively specify file, set the file for each version
  • change the file argument documentation to state more clearly that function will use defaults based on version. I.e. override the default with an explicit @param file ...

What do you think?

See 6e12e73 (and documentation in d26d5b0 and 4491ef7)

cecileherr commented 3 years ago

Many thanks for the detailed review (as usual ;-) ) and tips!

Pushed a few commits to cover your suggestions:

About .data$

Best check this note disappears locally by running the check in the RStudio Build pane (just pressing ctrl+shift+E should do the job).

Unfortunately it does not work on my computer yet (seems I have to install extra software and I need admin rights, so no immediate fix for this. I hope you don't mind if I keep polluting the online repository with my failing checks while I try to solve the issue locally)

About NAMESPACE

I am a bit perplexed about this one. As far as I can remember I only ran roxygen2::roxygenise() locally, I don't think I changed the file directly (or it was not on purpose). Anyway probably better practice to run devtools::document()

About whether to include full names for usage and depth class?

cecileherr commented 3 years ago

I follow https://r-pkgs.org/man.html#man-workflow and https://r-pkgs.org/namespace.html#imports, but clearly the latter advice has evolved since I learned it - rather the default advice is now to always do package::function() instead of only doing that for packages in Suggests:. Hm, that's not as handy for the developer IMHO.

Ok, just let me known if we should start using package::function() systematically from then on! (on the other hand the article states that it is a bit slower to use package::function() than @importFrom pkg fun +function() )

About the local CMD check: for now I need to install qpdf and change the system path in Win.

If you like to experiment more or feel limited to do that in the PR-context, you can always do it in a separate branch (outside PR): on pushing, it also triggers the R-CMD-check workflow, on the branch tip. Outside a PR, it's also more convenient to update/overwrite your git history (invoking new commit hashes) and force-push your branch untill all is fine. After that you can merge your branch locally (with main feature branch checked out: git merge experimentalbranch), push it, and delete experimentalbranch on the remote (git push --delete origin experimentalbranch). I could demo that if you like.

Yes, might be the way to go. Maybe next time, when I get many many reactions to a PR ;-)

As far as I can see, the failed check for macOS is not related to n2khab (?) so I suppose I can merge this PR.

florisvdh commented 3 years ago

the failed check for macOS is not related to n2khab (?) so I suppose I can merge this PR.

Yes, see #120. I'll merge your PR into dev-0.5.0!

florisvdh commented 3 years ago

let me known if we should start using package::function() systematically

I'd rather not change current approaches. @importFrom is not going to disappear; it's a matter of philosophy. I prefer one coding strategy within the same package - sometimes it can mean to migrate everything.

florisvdh commented 3 years ago

@hansvancalster @ElsLommelen does this sound familiar; any recommendations about how @cecileherr can solve below problem with locally running R CMD check? Double quoted is from me, single quoted from her.

Best check this note disappears locally by running the check in the RStudio Build pane (just pressing ctrl+shift+E should do the job).

Unfortunately it does not work on my computer yet (seems I have to install extra software and I need admin rights, so no immediate fix for this.

I believe all Windows users should have RTools installed by default (if not: better let it happen), I don't know about other needs?

About the local CMD check: for now I need to install qpdf and change the system path in Win.

hansvancalster commented 3 years ago

Unfortunately it does not work

@cecileherr can you show us the actual error? Does it say which extra software is needed?

cecileherr commented 3 years ago

Unfortunately it does not work

@cecileherr can you show us the actual error? Does it say which extra software is needed?

qpdf is missing (the error message was pretty clear, I can add it here later if it can be useful), see also https://stackoverflow.com/questions/15035150/r-cmd-check-as-cran-warning It is probably not hard to solve, but I need IT to modify the Win system path (admin rights required)

hansvancalster commented 3 years ago

@cecileherr yes indeed, you need IT to solve that. We should probably also document this in admin installation instructions (it affects all Windows 10 users who want to use devtools::check()).