joelgombin / banR

R client for the BAN API
http://joelgombin.github.io/banR/
GNU General Public License v3.0

Inconsistency in calculating geocoding scores #30

Sade154 closed 3 years ago

Sade154 commented 3 years ago

The score differs too much when geocoding an address containing "bis" versus "b". See the example below, which reveals inconsistent scoring behavior:


library(banR)
library(dplyr)

# score : 0.708 
geocode(query = "3 bis rue La Bruyère, 75009 Paris") %>%
  glimpse()

Rows: 5
Columns: 18
$ label       <chr> "3 b Rue la Bruyère 75009 Paris", "Square la Bruyère 75009 Paris", "3 b Rue Bleue 75009 Paris", ...
$ score       <dbl> 0.7080726, 0.5166771, 0.5044495, 0.4707309, 0.4683309
$ housenumber <chr> "3 b", NA, "3 b", NA, NA
$ id          <chr> "75109_5211_00003_b", "75109_5212", "75109_1017_00003_b", "75109_1345", "75109_1434"
$ name        <chr> "3 b Rue la Bruyère", "Square la Bruyère", "3 b Rue Bleue", "Rue de Bruxelles", "Rue de Calais"
$ postcode    <chr> "75009", "75009", "75009", "75009", "75009"
$ citycode    <chr> "75109", "75109", "75109", "75109", "75109"
$ x           <dbl> 651344.9, 651065.6, 652137.1, 650901.9, 650924.1
$ y           <dbl> 6864521, 6864555, 6864170, 6865026, 6864926
$ city        <chr> "Paris", "Paris", "Paris", "Paris", "Paris"
$ district    <chr> "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arron...
$ context     <chr> "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, ...
$ type        <chr> "housenumber", "street", "housenumber", "street", "street"
$ importance  <dbl> 0.69789, 0.57534, 0.66323, 0.67804, 0.65164
$ street      <chr> "Rue la Bruyère", NA, "Rue Bleue", NA, NA
$ type_geo    <chr> "Point", "Point", "Point", "Point", "Point"
$ longitude   <dbl> 2.336596, 2.332783, 2.347436, 2.330497, 2.330811
$ latitude    <dbl> 48.87887, 48.87915, 48.87577, 48.88338, 48.88248

# score : 0.799
geocode(query = "3 b rue La Bruyère, 75009 Paris") %>%
  glimpse()

Rows: 5
Columns: 18
$ label       <chr> "3 b Rue la Bruyère 75009 Paris", "3 b Rue Bleue 75009 Paris", "Square la Bruyère 75009 Paris", ...
$ score       <dbl> 0.7998082, 0.5716573, 0.5432127, 0.4940360, 0.4896602
$ housenumber <chr> "3 b", "3 b", NA, "3 b", "3 b"
$ id          <chr> "75109_5211_00003_b", "75109_1017_00003_b", "75109_5212", "75109_1407_00003_b", "75109_1363_0000...
$ name        <chr> "3 b Rue la Bruyère", "3 b Rue Bleue", "Square la Bruyère", "3 b Rue Cadet", "3 b Rue de Budapest"
$ postcode    <chr> "75009", "75009", "75009", "75009", "75009"
$ citycode    <chr> "75109", "75109", "75109", "75109", "75109"
$ x           <dbl> 651344.9, 652137.1, 651065.6, 651777.3, 650678.1
$ y           <dbl> 6864521, 6864170, 6864555, 6864028, 6864230
$ city        <chr> "Paris", "Paris", "Paris", "Paris", "Paris"
$ district    <chr> "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arron...
$ context     <chr> "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, ...
$ type        <chr> "housenumber", "housenumber", "street", "housenumber", "housenumber"
$ importance  <dbl> 0.69789, 0.66323, 0.57534, 0.66969, 0.64942
$ street      <chr> "Rue la Bruyère", "Rue Bleue", NA, "Rue Cadet", "Rue de Budapest"
$ type_geo    <chr> "Point", "Point", "Point", "Point", "Point"
$ longitude   <dbl> 2.336596, 2.347436, 2.332783, 2.342547, 2.327538
$ latitude    <dbl> 48.87887, 48.87577, 48.87915, 48.87447, 48.87620

joelgombin commented 3 years ago

Thanks for the issue. This question should be asked upstream, as banR is just a thin wrapper for the https://geo.api.gouv.fr/adresse API. Please open an issue here: https://github.com/etalab/api-geo/issues
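
To check that the score gap comes from the API rather than from banR, one option is to query the API directly. The following is only a sketch: the endpoint `https://api-adresse.data.gouv.fr/search/`, its `q` and `limit` parameters, and the GeoJSON `properties$score` field are assumptions not stated in this thread, and `ban_search_url()` and `ban_top_score()` are hypothetical helper names, not part of banR.

```r
# Hypothetical helpers. The endpoint below is an assumption, not taken from this thread.
ban_search_url <- function(query, limit = 1) {
  paste0(
    "https://api-adresse.data.gouv.fr/search/",
    "?q=", utils::URLencode(query, reserved = TRUE),
    "&limit=", limit
  )
}

# Fetch the top hit's score straight from the API.
# Needs network access and the jsonlite package.
ban_top_score <- function(query) {
  res <- jsonlite::fromJSON(ban_search_url(query), simplifyVector = FALSE)
  if (length(res$features) == 0) return(NA_real_)
  # Assumes a GeoJSON FeatureCollection with a score under properties.
  res$features[[1]]$properties$score
}

# Usage (not run here):
# ban_top_score("3 bis rue La Bruyère, 75009 Paris")
# ban_top_score("3 b rue La Bruyère, 75009 Paris")
```

If the two calls return the same scores as `geocode()` above, the difference originates upstream and can be reported there with these raw requests as a reproducible example.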