ropensci / parzer

Parse geographic coordinates
https://docs.ropensci.org/parzer

degrees symbols not processed #40

Closed ncotie closed 2 years ago

ncotie commented 2 years ago

Hi, It would be a nice addition if parzer could handle degree symbols, e.g. ° or º; it appears not to right now. Cheers Neil

AlbanSagouis commented 2 years ago

Hi Neil, The scrub function is supposed to replace all characters resembling the degree symbol with a single fixed character. Would you have an example of a string where it does not, please? Thanks, Alban

ncotie commented 2 years ago

Hi Alban, A copy/paste of part of what I was trying to use parzer on is 3°42'43.91"O 40°25'25.98"N, for example. I was looking around for packages to convert this into decimal degrees, but have since deleted the exact code I was using, as I ended up using another package. However, as far as I remember, I was trying to use it like in the example below from your README, feeding df columns to parse_lat and parse_lon.
parse_lat(c("45N54.2356", "-45.98739874", "40.123°"))

> [1] 45.90393 -45.98740 40.12300

When you refer to the scrub function, I didn't see any information about that in what I read about parzer. If I should have been using that in my code, I wasn't aware of it, sorry.

In case you want the full input file I was using, it's from https://datos.madrid.es/egob/catalogo/212629-0-estaciones-control-aire.xls These data do have two different degree symbols in the location seen in the copy/paste above, both ° and º. (They may look the same, but one is a bit bigger than the other. I didn't know there were two versions of the character...) FWIW... Regards, Neil
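The two look-alike characters can be told apart by their Unicode code points. A minimal base R sketch (the variable names are only illustrative): the degree sign is U+00B0 and the masculine ordinal indicator is U+00BA.

```r
# Two characters that look alike but have different code points
degree_sign  <- "\u00b0"  # DEGREE SIGN
ordinal_masc <- "\u00ba"  # MASCULINE ORDINAL INDICATOR

utf8ToInt(degree_sign)   # 176 (0xB0)
utf8ToInt(ordinal_masc)  # 186 (0xBA)

# They are distinct strings, so exact matching on one misses the other
identical(degree_sign, ordinal_masc)  # FALSE
```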

AlbanSagouis commented 2 years ago

Hi Neil,

Thanks a lot for the detailed answer! No worries about scrub: it is not directly accessible to the user, but it is used by all other functions before parsing. To call scrub yourself, you would need to write parzer:::scrub() with three colons. The job of scrub is to turn every special character into ', which includes all the special characters that look like the degree symbol, and there are a few. So if you observe that it is not doing that job, there is a problem!
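To illustrate the idea of mapping degree-like characters to one separator, here is a rough base R sketch. It is not parzer's internal scrub(): the helper name normalize_degree_like and the choice of three characters are assumptions made only for this example.

```r
# Hypothetical helper, NOT parzer's internal scrub(): map a few
# degree-like characters to a single apostrophe separator.
normalize_degree_like <- function(x) {
  # U+00B0 degree sign, U+00BA masculine ordinal indicator, U+02DA ring above
  gsub("[\u00b0\u00ba\u02da]", "'", x)
}

cleaned <- normalize_degree_like(c("40.123\u00b0", "3\u00ba42'43.91\"O"))
cleaned
```

Both inputs end up with the same apostrophe in place of the symbol, so a later parsing step only has to deal with one separator.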

Thanks for providing the data you used it on. I had a look; I may have found the problem you had, and I propose a solution:

# Download the spreadsheet to a temporary file
xls_path <- file.path(tempdir(), "Spanish_coordinates.xls")
download.file(
    url = "https://datos.madrid.es/egob/catalogo/212629-0-estaciones-control-aire.xls",
    destfile = xls_path, mode = "wb"
)
# Read only the two coordinate columns
dt <- readxl::read_xls(xls_path, range = "E1:F24")

parzer::parse_lat(dt$LATITUD_ETRS89)
parzer::parse_lon(dt$LONGITUD_ETRS89)
# does not work because parzer only recognizes English codes (NSEW); I had that issue with French data before
parzer::parse_lon(gsub("O", "W", dt$LONGITUD_ETRS89)) # a solution is to replace the O with a W

Do you think that was your problem? I opened an issue about parzer not recognising these codes, but I am not sure whether adding exceptions like "O" would make the program a little more error prone. At least there should be an error message!
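A slightly stricter version of the O-to-W replacement could look like the sketch below. The helper translate_hemisphere is hypothetical and not part of parzer; it assumes the hemisphere letter is the last character of the string, replaces only that trailing O, and stops with an error on any other unexpected letter.

```r
# Hypothetical pre-processing helper (not part of parzer): translate the
# Spanish/French west code "O" to "W", and fail loudly on unknown letters.
translate_hemisphere <- function(x) {
  x <- trimws(x)
  # Only touch a hemisphere letter at the very end of the string
  x <- sub("O$", "W", x)
  # Collect the trailing letters (if any) and check them against NSEW
  letters_found <- regmatches(x, regexpr("[A-Za-z]$", x))
  unknown <- setdiff(toupper(letters_found), c("N", "S", "E", "W"))
  if (length(unknown) > 0) {
    stop("Unrecognised hemisphere code(s): ", paste(unknown, collapse = ", "))
  }
  x
}

lon <- translate_hemisphere("3\u00b042'43.91\"O")
lon
# With parzer installed, the result could then be passed on:
# parzer::parse_lon(lon)
```

Anchoring the pattern to the end of the string avoids rewriting an O that appears elsewhere in a value, and the explicit stop() gives the kind of error message asked for above.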

ncotie commented 2 years ago

Hi Alban, Yes, I was aware of the O needing changing, but right now I do not remember if I had done that before attempting parzer on the data. Possibly not. But if that was the problem it could make some sense.
In my own view, I wouldn't expect something like parzer to understand the "O" and switch it to "W".

OK, I went and retested it in my code, after the point where I had gsub'ed the O to W, and indeed parzer::parse_lat and parse_lon work fine. So I guess I must have been using it before I had done the gsub, without considering that it would get blocked on the O. Indeed, I think I was expecting it to just handle the numeric portion.

So it appears I've wasted your time.
Apologies for that. Regards Neil