ropensci / parzer

Parse geographic coordinates
https://docs.ropensci.org/parzer
Other
63 stars 6 forks source link

Split string with lat and lon and sparse them. #24

Closed AlbanSagouis closed 4 years ago

AlbanSagouis commented 4 years ago

Description

pz_split_llstr_string splits one string where latitude and longitude are separated by , ; . or a space character. pz_split_llstr takes a vector of strings and calls pz_split_llstr_string to split them.

R function parse_llstr (that existed already) calls the c++ function pz_split_llstr, applies parsing to latitude and longitude values with parse_lat and parse_lon, and returns a data.frame with lat and lon in decimal degrees.

Related Issue

This fix addresses issue #3

Usage

parse_llstr(c("N4'55'33', E112'13'40'",
                    "N4'55'33'; E112'13'40'",
                    "N4'55'33' E112'13'40'"))

       lat      lon
1 4.925833 112.2278
2 4.925833 112.2278
3 4.925833 112.2278
sckott commented 4 years ago

thanks @AlbanSagouis - having a look

sckott commented 4 years ago

For pull requests, I prefer if they are concentrated on a single topic. scrub is a separate issue, and I think requires a some discussion. Can you remove the scrub related changes, and put on a separate branch and PR?

sckott commented 4 years ago

Thanks @AlbanSagouis - two items:

AlbanSagouis commented 4 years ago

Hi @sckott , My previous commit addresses the issues arising when several spaces were present. Now the function deletes any extra spaces and the c++ line giving the error you showed (see below) was deleted.

pz_split_llstr.cpp:19:54: warning: multi-character character constant [-Wmultichar]
    boost::split(splitstr, x, [](char c){return c == ', ';});

I'll look into replacing boost functions with std functions. I'm thinking of using str.find to look for the separator and str.substr to select coordinate elements OR std::regex_replace

AlbanSagouis commented 4 years ago

Hi Scott, My last commit made the code not dependent on boost anymore. I used std::regex_replace to alternatively delete latitude or longitude and pick up coordinates separately.

sckott commented 4 years ago

thanks, i'll have another look at the code locally

sckott commented 4 years ago

Thanks for fixing that warning, and removing Boost, that should make installation easier when users are installing development versions

It looks like parse_llstr() is sensitive to order of inputs. Do you think it could work if the user gives longitude first in the input string? For example: E 101.5823, N 04.1683

AlbanSagouis commented 4 years ago

It looks like parse_llstr() is sensitive to order of inputs. Do you think it could work if the user gives longitude first in the input string? For example: E 101.5823, N 04.1683

Hi, It is definitely sensitive to the order. If there are letters in the coordinate string, we could add the feature but if not, it gets tricky to make sure latitude is first.

In my original implementation, order was checked based on the letters. When no letter was given, there was an argument for that: if assume_good_order argument was TRUE, latitude was considered first. When assume_good_order was FALSE, the function would return NA whenever there was no letter to properly indicate the order.

We could also add an argument to let the user set the order. order = 'latlon' or 'lonlat'

sckott commented 4 years ago

Thanks for the fixes.

Let's move on now wrt the order of inputs - we can return to that later if there's users that want to be able to control order of lat and lon in strings.