kadyb / rgugik

Download datasets from Polish Head Office of Geodesy and Cartography
https://kadyb.github.io/rgugik/
Other
33 stars 4 forks source link

geocodePL_get returns `sf data.frame` #43

Closed BERENZ closed 4 years ago

BERENZ commented 4 years ago

Proposal for a changed output of geocodePL_get. Now it returns object of a sf data.frame class instead of a list.

kadyb commented 4 years ago
output = jsonlite::fromJSON(prepared_URL)[["results"]]
if (!is.null(output)) {

output can be NULL?

BERENZ commented 4 years ago

Yes, it can be. For example when address was not found

> address = "Poznań, Aleja Niepodległości"
> base_URL = "https://services.gugik.gov.pl/uug/?request=GetAddress&address="
> prepared_URL = paste0(base_URL, address)
> prepared_URL = gsub(" ", "%20", prepared_URL)
> output = jsonlite::fromJSON(prepared_URL)[["results"]]
> output
NULL
kadyb commented 4 years ago

Good catch!

BERENZ commented 4 years ago

I noticed that it is very sensitive to the address specification. See for instance the following examples:

> address = "Poznań, Ząbkowicka 12H"
> base_URL = "https://services.gugik.gov.pl/uug/?request=GetAddress&address="
> prepared_URL = paste0(base_URL, address)
> prepared_URL = gsub(" ", "%20", prepared_URL)
> output = jsonlite::fromJSON(prepared_URL)[["results"]]
> str(output,1)
List of 1
 $ 1:List of 16

> address = "Poznań, Zabkowicka 12H"
> base_URL = "https://services.gugik.gov.pl/uug/?request=GetAddress&address="
> prepared_URL = paste0(base_URL, address)
> prepared_URL = gsub(" ", "%20", prepared_URL)
> output = jsonlite::fromJSON(prepared_URL)[["results"]]
> output
> str(output,1)
List of 1
 $ 1:List of 16

> address = "Poznań, Ząbkowicka 12 H"
> base_URL = "https://services.gugik.gov.pl/uug/?request=GetAddress&address="
> prepared_URL = paste0(base_URL, address)
> prepared_URL = gsub(" ", "%20", prepared_URL)
> output = jsonlite::fromJSON(prepared_URL)[["results"]]
> output
NULL

> address = "Poznań Ząbkowicka 12H"
> base_URL = "https://services.gugik.gov.pl/uug/?request=GetAddress&address="
> prepared_URL = paste0(base_URL, address)
> prepared_URL = gsub(" ", "%20", prepared_URL)
> output = jsonlite::fromJSON(prepared_URL)[["results"]]
> output
NULL
kadyb commented 4 years ago

Suppose

output = jsonlite::fromJSON(prepared_URL)[["results"]]

returns NULL. In that case, stop("all inputs are empty") will be executed?

BERENZ commented 4 years ago

Yes, see below

> geocodePL_get(address = "Poznań Ząbkowicka 12H")
Error in geocodePL_get(address = "Poznań Ząbkowicka 12H") : 
  all inputs are empty
kadyb commented 4 years ago

So that's not right. This error should appear after the user doesn't enter the input. If jsonlite::fromJSON returns NULL, then NULL should also be returned at the end.

BERENZ commented 4 years ago

Ok, so I should include the stop function somewhere inside, like here:

if (!nchar(rail_crossing) == 11) {
      stop("rail crossing ID must be 11 characters long")
 }

or maybe rearrange the whole function so the df_output will be created at the very end? This may help to avoid to many stops inside.

kadyb commented 4 years ago

Maybe just before stop("all inputs are empty") use:

if (is.null(output)) {
  return(NULL)
}
kadyb commented 4 years ago

Oh, sorry. I thought after you commit. It is probably better for the user to be informed that the name is possibly wrong than NULL. So return(NULL) change to return("object not found") or something similar.

kadyb commented 4 years ago

I haven't checked if geocodePL_get() works after the changes yet. I will do it tomorrow.

BERENZ commented 4 years ago

Ok, please check if my proposal fits your requirements.

How should the unit test for wrong or no address look like? Should I use expect_true to check if output is equal to object not found ?

kadyb commented 4 years ago

How should the unit test for wrong or no address look like? Should I use expect_true to check if output is equal to object not found ?

In testthat there is expect_output() function for this, so expect_output(t6, "object not found").

kadyb commented 4 years ago
test = geocodePL_get(geoname = "Jeziorak")

doesn't work because of NULL in lists.

BERENZ commented 4 years ago

The error is because of different number of elements in each list

List of 5
 $ 1:List of 13
 $ 2:List of 12
 $ 3:List of 12
 $ 4:List of 13
 $ 5:List of 12

So using the standard do.call and rbind fails. Maybe we could use dplyr::bind_rows ?

kadyb commented 4 years ago
if (length(output) == 1) {
   output[[1]][sapply(output[[1]], is.null)] = NA
}

Maybe the length condition is too restrictive / unfounded? What if the output consists of multiple lists (like in https://github.com/kadyb/rgugik/pull/43#issuecomment-719909917)? Maybe we should test first if there is NULL in the lists. If so, then replace it in each list with NA.

kadyb commented 4 years ago

So using the standard do.call and rbind fails. Maybe we could use dplyr::bind_rows ?

I would not like to introduce a new dependency. The 13th extra column is "notes", maybe we'll just remove it in each list and then merge the lists?

BERENZ commented 4 years ago

Done. Why the checks are not passed?

kadyb commented 4 years ago

ERROR (test-emuia_download.R:19:1): (code run outside of test_that()) FAILURE (test-geocodePL_get.R:26:3): check if object was not found Warning (test-minmaxDTM_get.R:23:1): (code run outside of test_that())

1 and 3 I don't know why. It worked yesterday. 2 is related to:

test_that("check if object was not found", {
  expect_output(t6, "object not found")
})
kadyb commented 4 years ago

Maybe try change to expect_equal(t6, "object not found") because we didn't print object not found exactly (we use return("object not found")). This could be the reason.

BERENZ commented 4 years ago

Maybe expect_output is not the right choice?

> expect_output(t6, "object not found")
Error: `t6` produced no output

May I use expect_true(t6 == "object not found")?

EDIT: what is the difference between expect_true and expect_equal in this case? Which one is preferred?

kadyb commented 4 years ago

Probably expect_equal is preferable for comparing values. Indeed, maybe expect_true will be better.

kadyb commented 4 years ago

I think I can merge this PR now. Please also add your pearson to DESCRIPTION as ctb before "GUGiK". Thanks for the help.

If you still have time and you would like to, you can address:

  1. Remove unnecessary columns from the output (https://github.com/kadyb/rgugik/issues/11).
  2. Creating a helper (private) function to avoid duplicating the code.
BERENZ commented 4 years ago

Thank you! I will try to help with the development of this package.