phacochr / phacochr

Geocoder for Belgium

regression in result quality #7

Open: joostschouppe opened this issue 2 months ago

joostschouppe commented 2 months ago

While re-geocoding data that I had geocoded before, I see a regression in quality. It seems to depend on the language of the street name. In Brussels with a Dutch street name, or in the German-speaking area with a German street name, the results are much worse than with a French street name. This was not an issue a year or so ago. In one dataset, I have Brussels addresses with both the Dutch and the French street name: they have an 8% success rate in Dutch and a 96% success rate in French.

hugoperilleux commented 1 month ago

Thank you for your feedback, and sorry for the inconvenience! We've restored phacochr to its previous version (0.9.14). If you run:

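# install.packages("devtools")  # only needed if devtools is not installed yet
devtools::install_github("phacochr/phacochr")  # reinstall phacochr from GitHub
library(phacochr)
phaco_setup_data()  # set up the reference data used by the geocoder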

and optionally:

phaco_best_data_update()

it should work as before.

Also, would you be willing to share your address set with me so that I can test with it (hugo.perilleux@ulb.be)?

joostschouppe commented 1 month ago

Yes, this solves the issue. Here are some test results:

Some Dutch in Brussels, some German in Ostbelgien:

Region     n    Valid rue(%)  Rue detect.(%valid)  Approx.(n)  Elarg.(n)  Mid.(n)  Geocode(%valid)  Geocode(%tot)
Bruxelles  26   100           100.0                1           0          0        100.0            100.0
Flandre    70   100            97.1                3           0          2         97.1             97.1
Wallonie   54   100            90.7                6           1          0         88.9             88.9
NA          2   100             0.0                0           0          0          0.0              0.0
Total     152   100            94.1               10           1          2         93.4             93.4

(Before the rollback, the Brussels success rate on this set was about 50%.)

French where available:

Region     n    Valid rue(%)  Rue detect.(%valid)  Approx.(n)  Elarg.(n)  Mid.(n)  Geocode(%valid)  Geocode(%tot)
Bruxelles  26   100            96.2                1           0          0         96.2             96.2
Flandre    70   100            97.1                3           0          2         97.1             97.1
Wallonie   54   100            90.7                6           1          0         88.9             88.9
NA          2   100             0.0                0           0          0          0.0              0.0
Total     152   100            93.4               10           1          2         92.8             92.8