Closed BERENZ closed 4 years ago
Also, I have not found such information anywhere. There are probably no such restrictions. However, in my experience, GUGiK's servers and services are problematic. I think the safe solution will be to set some interval (maybe 1 s?) between requests.
BTW: At this moment, the geocodePL_get() function needs some output improvements (#11).
Ok, I understand. Maybe you could contact GUGIK's staff to ask about the limitations?
BTW, is it possible for geocodePL_get() to return an sf object instead of a list? That would be super useful for speeding up the processing and merging with other data.
OK, I will write a message asking if there are limits on the number of requests and the time between them.
Yes. This is a very good idea. There is definitely room for improvement. Currently we don't have time to do it, but I will definitely keep it in mind in the future.
Edit: I sent the email.
OK, so here is a small proposal that combines the result of geocodePL_get() into a single sf object.
library(rgugik)
library(sf)

output <- geocodePL_get(address = "Marki")
if (sapply(output, length)[1] == 1) {
  # single match: `output` is a flat list of attributes
  df <- as.data.frame(do.call(cbind, output), stringsAsFactors = FALSE)
  df$geometry_wkt <- NULL
  df <- st_as_sf(x = df, coords = c("x", "y"), crs = 2180)
} else {
  # several matches: `output` is a list of such lists, one per match
  df <- lapply(output, FUN = function(x) as.data.frame(do.call(cbind, x), stringsAsFactors = FALSE))
  df <- do.call("rbind", df)
  df$geometry_wkt <- NULL
  df <- st_as_sf(x = df, coords = c("x", "y"), crs = 2180)
}
Here's how it works
> output <- geocodePL_get(address = "Marki") ## list of 10
> df[,1:5]
Simple feature collection with 10 features and 5 fields
geometry type: POINT
dimension: XY
bbox: xmin: 469003.1 ymin: 193553.4 xmax: 710402 ymax: 631605
CRS: EPSG:2180
    city  teryt    simc  voivodeship            county                  geometry
 1 Marki 100103 0538774      łódzkie      bełchatowski POINT (523435.6 398347.3)
 2 Marki 120702 0960993  małopolskie powiat limanowski POINT (576498.3 199686.8)
 3 Marki 120709 0453724  małopolskie powiat limanowski POINT (583279.3 195401.8)
 4 Marki 120711 0467212  małopolskie powiat limanowski POINT (597554.8 204842.4)
 5 Marki 121508 0994934  małopolskie             suski   POINT (537196 193553.4)
 6 Marki 143402 0920901  mazowieckie        wołomiński POINT (644467.9 498763.1)
 7 Marki 160804 0143432     opolskie     powiat oleski POINT (469003.1 358536.7)
 8 Marki 160804 0143366     opolskie            oleski POINT (469243.6 358790.1)
 9 Marki 182001 0787721 podkarpackie      tarnobrzeski     POINT (685860 289770)
10 Marki 200602 0397167    podlaskie         kolneński     POINT (710402 631605)
> output <- geocodePL_get(address = "Marki, Andersa")
> df[,1:5]
Simple feature collection with 1 feature and 5 fields
geometry type: POINT
dimension: XY
bbox: xmin: 643949.4 ymin: 499656.9 xmax: 643949.4 ymax: 499656.9
CRS: EPSG:2180
street teryt simc ulic city geometry
1 Andersa 143402 0920901 00285 Marki POINT (643949.4 499656.9)
> output <- geocodePL_get(rail_crossing = "001 018 478")
> df[,1:5]
Simple feature collection with 1 feature and 4 fields
geometry type: POINT
dimension: XY
bbox: xmin: 620704.5 ymin: 478258.4 xmax: 620704.5 ymax: 478258.4
CRS: EPSG:2180
operator category phone mobile phone geometry
1 PKP PLK WARSZAWA A +48 22 473 37 34 +48 600 084 183 POINT (620704.5 478258.4)
EDIT: If you like this proposal, I can prepare a PR for geocodePL_get.R and test-geocodePL_get.R.
EDIT2: I don't know how to use the geometry_wkt element, which holds the geometry as WKT; using it may be a better idea than coords = c("x", "y").
I looked at your code (but I didn't test it). Maybe we can simplify it?
output = geocodePL_get(address = "Marki")
df_output = do.call(rbind.data.frame, output)
# use "geometry_wkt"
df_output = sf::st_as_sf(df_output, wkt = "geometry_wkt", crs = 2180)
Also, in geocodePL_get.R, we can remove
if (length(output) == 1) {
output = output[[1]]
}
so a nested list will always be returned; then we can drop the length condition (in your code) or just use rbind.data.frame.
The question: what if any column (attribute) is empty (NULL)? Will the function even work? The next point is that we should only choose the relevant columns at the end (#11). One more thing, there will probably be some duplicate code, so we should create some helper function.
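To illustrate the NULL question, here is a minimal sketch in base R. It assumes each result is a named list of scalar fields; `records_to_df` is a hypothetical helper name, not something that exists in rgugik. It fills NULL or missing fields with NA before binding, so every row ends up with the same columns.

```r
# Hypothetical helper (not part of rgugik): turn a list of records into a
# data frame, replacing NULL or zero-length fields with NA.
# Assumption: every non-empty field is a scalar.
records_to_df <- function(records) {
  # union of all field names seen in any record
  cols <- unique(unlist(lapply(records, names)))
  rows <- lapply(records, function(rec) {
    vals <- lapply(cols, function(nm) {
      v <- rec[[nm]]  # NULL if the field is absent or empty
      if (is.null(v) || length(v) == 0) NA else v
    })
    names(vals) <- cols
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

records <- list(
  list(city = "Marki", county = NULL),
  list(city = "Marki", county = "suski")
)
df <- records_to_df(records)
```

A helper of this kind could also be the place to select only the relevant columns (#11), which would avoid duplicating that logic.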
If you simplify it this way, queries that return only one result give incorrect output, see below:
> output <- geocodePL_get(address = "Marki, Andersa")
> df_output <- do.call(rbind.data.frame, output)
> df_output
1 | Andersa |
---|---|
2 | 143402 |
3 | 0920901 |
4 | 00285 |
5 | Marki |
6 | 643949.3987 |
7 | 499656.945800001 |
8 | LINESTRING(643691.7537 499759.7709,643714.492 499753.1515,643768.427 499731.363399999,643801.4207 499717.6074,643827.3306 499706.843599999,643949.3987 499656.945800001,644044.1973 499614.359099999,644077.5194 499600.2992,644169.6761 499559.555500001,644200.1808 499546.196699999,644271.0002 499515.1812,644276.6037 499513.287) |
9 | 1 |
10 | 1 |
11 | {Marki,143402} |
> df_output = sf::st_as_sf(df_output, wkt = "geometry_wkt", crs = 2180)
Error in `[[<-.data.frame`(`*tmp*`, wkt, value = list()) :
replacement has 0 rows, data has 11
As for the NULL results, maybe they could be checked before applying these lines?
EDIT: I noticed that geocodePL_get(rail_crossing = "001 018 478") gives results without geometry_wkt, so we cannot use wkt = "geometry_wkt" in sf::st_as_sf.
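One possible way to handle results with and without geometry_wkt is sketched below. `geometry_source` and `to_sf` are hypothetical names, and the column names (geometry_wkt, x, y) are taken from the outputs shown in this thread. The sketch prefers the WKT column when it is present and falls back to the x/y coordinates otherwise.

```r
# Hypothetical helpers (not part of rgugik).

# Decide which columns can be used to build the geometry.
geometry_source <- function(df) {
  has_wkt <- "geometry_wkt" %in% names(df) && !anyNA(df$geometry_wkt)
  if (has_wkt) {
    "wkt"
  } else if (all(c("x", "y") %in% names(df))) {
    "coords"
  } else {
    "none"
  }
}

# Convert to sf using whichever source is available (requires the sf package).
to_sf <- function(df, crs = 2180) {
  switch(geometry_source(df),
    wkt = sf::st_as_sf(df, wkt = "geometry_wkt", crs = crs),
    coords = {
      df$x <- as.numeric(df$x)
      df$y <- as.numeric(df$y)
      sf::st_as_sf(df, coords = c("x", "y"), crs = crs)
    },
    stop("no geometry columns found")
  )
}

# Toy inputs built from values shown in this thread
crossing <- data.frame(operator = "PKP PLK WARSZAWA", x = 620704.5, y = 478258.4)
street <- data.frame(
  street = "Andersa",
  geometry_wkt = "LINESTRING(643691.7537 499759.7709,643714.492 499753.1515)"
)
```

Note that the two branches are not equivalent: for a street, the WKT is a LINESTRING, while x/y gives a single POINT, so the choice changes the geometry type of the result.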
I think we should remove
if (length(output) == 1) {
output = output[[1]]
}
in the source code and then use rbind.data.frame, because it will always be a nested list.
But I could be wrong.
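The idea of always working with a nested list can be sketched as follows. This is only an illustration with toy data: `as_record_list` and `to_df` are hypothetical helpers, and the sketch uses rbind over per-record data frames rather than rbind.data.frame directly.

```r
# Hypothetical helper: make sure the result is always a list of records,
# even when a single flat record was returned.
as_record_list <- function(output) {
  # a flat record has atomic fields; a nested result has lists as elements
  if (length(output) > 0 && !is.list(output[[1]])) list(output) else output
}

# Bind the records into one data frame, one row per record.
to_df <- function(output) {
  rows <- lapply(as_record_list(output), as.data.frame, stringsAsFactors = FALSE)
  do.call(rbind, rows)
}

# Toy data mimicking a single match and several matches
single <- list(street = "Andersa", teryt = "143402", city = "Marki")
many <- list(
  list(city = "Marki", teryt = "100103"),
  list(city = "Marki", teryt = "120702")
)

df_single <- to_df(single)  # one row, three columns
df_many <- to_df(many)      # two rows, two columns
```

With the unwrapping step removed from geocodePL_get.R, the `as_record_list` step would no longer be needed, because the single-match case would already arrive as a list of length one.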
You can check for NULLs after
output = jsonlite::fromJSON(prepared_URL)[["results"]]
EDIT: I noticed that geocodePL_get(rail_crossing = "001 018 478") will give results without geometry_wkt so we cannot use wkt = "geometry_wkt" in sf::st_as_sf.
There is probably a geometry_wkt attribute; we just aren't returning it in the output currently.
https://github.com/kadyb/rgugik/blob/5e01945990da277cea72772194d9d5397faa6a36/R/geocodePL_get.R#L69-L71
OK, I will come back with some improvements by the end of this week.
Response from GUGiK:
In response to your question, I would like to inform you that the service has an IP address blocking mechanism, which is triggered when a mass of queries is sent to the service's source server. This restriction is intended to protect the service at the application level against an excessive number of queries sent by a user, in particular DDoS attacks.
If such a block is imposed on your IP address, you should follow the instructions in the displayed message.
So we don't know what the limit is, but I think we can assume that there should be a 1 second delay between requests. If the limit is exceeded, the function will stop working (there will be an error in fromJSON()).
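A minimal sketch of such a delay, assuming the 1-second interval is adequate (the actual limit is unknown). `query_with_delay` is a hypothetical helper, not part of rgugik. It pauses between consecutive calls and turns a failed request into NULL instead of stopping the whole loop.

```r
# Hypothetical helper (not part of rgugik): call `fun` on each query,
# pausing `delay` seconds between consecutive calls.
query_with_delay <- function(queries, fun, delay = 1) {
  results <- vector("list", length(queries))
  for (i in seq_along(queries)) {
    if (i > 1) Sys.sleep(delay)  # wait before every call except the first
    # a failed (e.g. blocked) request yields NULL instead of aborting;
    # `results[i] <- list(...)` keeps the slot even when the value is NULL
    results[i] <- list(tryCatch(fun(queries[[i]]), error = function(e) NULL))
  }
  names(results) <- queries
  results
}

# Possible usage with rgugik (requires network access, so it is commented out):
# res <- query_with_delay(
#   c("Marki", "Marki, Andersa"),
#   function(a) rgugik::geocodePL_get(address = a)
# )
```

NULL entries in the result mark the queries that failed, so they can be retried later with a longer delay.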
Fixed in https://github.com/kadyb/rgugik/pull/43.
Is there a limit on the number of queries, or on the time between them, when geocoding with geocodePL_get? I tried to find this information on the GUGiK website but failed.