rinigus / geocoder-nlp

Geocoder library based on libpostal normalization of libosmscout generated database
MIT License

Key did not exist error on rebuilding libpostal model #63

Closed TurboCar closed 2 years ago

TurboCar commented 2 years ago

I ran into a "Key did not exist" error when using the script under the postal folder to rebuild a new libpostal model that includes multiple countries. Has anyone run into the same problem? Thanks!

[Screenshot: Screen Shot 2022-03-22 at 13 12 26]

rinigus commented 2 years ago

Please describe in detail what you did and how to reproduce it.

TurboCar commented 2 years ago

Thanks for your reply. The goal of retraining is to improve the accuracy of the full-size model on UK addresses. Here are my steps:

  1. Appended two bad UK addresses to uk_openaddresses_formatted_addresses_tagged.random.tsv, for example:

en gb Santoga/house Auto/house Ellis/road Ashton/road Street/road Huyton/suburb Liverpool/city L55/postcode 6BK/postcode United/country Kingdom/country

en gb 1500/house_number Eureka/road Park/road Lower/suburb Pemberton/suburb Ashford/city Kent/state_district TN25/postcode 4BF/postcode United/country Kingdom/country

  2. Modified build_country_db.sh to retrain the full-size model on the original datasets. The code changes are shown in this screenshot:

[Screenshot: Screen Shot 2022-03-23 at 07 53 00]

  3. Ran the bash script on a VM with 144 GB of memory and 700 GB of free disk space. The 'Key did not exist' error appeared every time the number added to the trie reached 5,400,000. When I ran the script for a single country, i.e., GB only, no issue was found.
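
Before appending rows like the ones in step 1, it may help to check that each one is well formed. The following is only a minimal sketch: it assumes each record is a language code, a country code, and then token/label pairs separated by whitespace, as the examples in this thread appear. The names `TaggedAddress` and `parse_tagged_line` are hypothetical and not part of libpostal or geocoder-nlp, and this check does not diagnose the 'Key did not exist' error itself.

```python
from typing import List, NamedTuple, Tuple


class TaggedAddress(NamedTuple):
    """One training record (hypothetical helper type, not a libpostal API)."""
    language: str
    country: str
    tokens: List[Tuple[str, str]]  # (token, label) pairs in original order


def parse_tagged_line(line: str) -> TaggedAddress:
    """Parse 'lang country token/label token/label ...' into a record.

    Assumption: fields are separated by whitespace (spaces or tabs) and
    the label is everything after the last '/' in each tagged token.
    """
    parts = line.split()
    if len(parts) < 3:
        raise ValueError("expected language, country and at least one token")

    language, country, *tagged = parts
    tokens = []
    for item in tagged:
        # rpartition splits on the last '/', so a token containing '/'
        # still keeps its label intact.
        token, sep, label = item.rpartition("/")
        if not sep or not token or not label:
            raise ValueError(f"malformed token/label pair: {item!r}")
        tokens.append((token, label))
    return TaggedAddress(language, country, tokens)


if __name__ == "__main__":
    sample = (
        "en gb 1500/house_number Eureka/road Park/road Lower/suburb "
        "Pemberton/suburb Ashford/city Kent/state_district "
        "TN25/postcode 4BF/postcode United/country Kingdom/country"
    )
    record = parse_tagged_line(sample)
    print(record.language, record.country, len(record.tokens))
```

Running every appended line through such a check would at least rule out a malformed row as the cause before a long retraining run is started.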

rinigus commented 2 years ago

But that sounds like a libpostal bug, if anything, assuming that I understood it correctly.

rinigus commented 2 years ago

Closing this here, assuming that it is not an issue with the geocoder.