stuartemiddleton / geoparsepy

geoparsepy is a Python geoparsing library that will extract and disambiguate locations from text. It uses a local OpenStreetMap database which allows very high and unlimited geoparsing throughput, unlike approaches that use a third-party geocoding service (e.g. Google Geocoding API). this repository holds Python examples to use the PyPI library.
Other
54 stars 4 forks source link

error in reading postgresql database #1

Closed flppgg closed 4 years ago

flppgg commented 4 years ago

I am trying to replicate the example as explained in the docs.

However, I am having issues in extracting all files from geoparsepy_preprocessed_tables.tar.gz (using 7zip on windows): I can only extract global_cities.sql and europe_places.sql.

I then create the PostgreSQL database following the example, and launch the python example file. The script fails here:

cached_locations = geoparsepy.geo_preprocess_lib.cache_preprocessed_locations( databaseHandle, dictLocationIDs, 'public', dictGeospatialConfig )

This is the traceback:

logging started
loading stoplist from C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-geo-stoplist-en.txt
loading whitelist from C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-geo-whitelist.txt
loading blacklist from C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-geo-blacklist.txt
loading building types from C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-buildingtype-en.txt
loading location type corpus C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-buildingtype-en.txt
- 3 unique titles
- 76 unique types
loading street types from C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-streettype-en.txt
loading location type corpus C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-streettype-en.txt
- 15 unique titles
- 32 unique types
loading admin types from C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-admintype-en.txt
loading location type corpus C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\corpus-admintype-en.txt
- 14 unique titles
- 0 unique types
loading gazeteer from C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\gazeteer-en.txt
caching locations : {'europe_places_admin': [-1, -1], 'europe_places_poly': [-1, -1], 'europe_places_line': [-1, -1], 'europe_places_point': [-1, -1], 'global_cities_admin': [-1, -1], 'global_cities_poly': [-1, -1], 'global_cities_line': [-1, -1], 'global_cities_point': [-1, -1]}
Traceback (most recent call last):
  File "C:\Users\**\20200819_geoparsepy_example.py", line 29, in <module>
    cached_locations = geoparsepy.geo_preprocess_lib.cache_preprocessed_locations( databaseHandle, dictLocationIDs, 'public', dictGeospatialConfig )
  File "C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\geoparsepy\geo_preprocess_lib.py", line 1649, in cache_preprocessed_locations
    listRows = database_handle.execute_sql_query_batch( listSQL, timeout_statement, timeout_overall )
  File "C:\Users\**\AppData\Local\Programs\Python\Python37\lib\site-packages\soton_corenlppy\PostgresqlHandler.py", line 348, in execute_sql_query_batch
    raise Exception( 'SQL query failed (timeout retrying) : ' + strLastError + ' : ' + tupleStatement[0] )
Exception: SQL query failed (timeout retrying) : ['42P01'] UndefinedTable('relation "public.europe_places_admin" does not exist\nLINE 1: ...gions,ST_AsText(geom),hstore_to_matrix(tags) FROM public.eur...\n                                                             ^\n') : SELECT concat('europe_places_admin_',loc_id),name,osm_id_set,admin_regions,ST_AsText(geom),hstore_to_matrix(tags) FROM public.europe_places_admin
stuartemiddleton commented 4 years ago

Thanks for reporting this issue. It looks like there was a new 1 Gbyte limit on downloads from the Southampton webspace I used to share the zipped SQL dump file, so downloads were truncated.

I have uploaded the zip file to Google drive and added a new download link to the readme instructions.

Can you try and download again and let me know if it works please?

flppgg commented 4 years ago

thank you for your reply, I have just tried it, and I now get this error:

postgres=#` \i 'C:/Users/**/uk_places.sql'
psql:C:/Users/**/uk_places.sql:2: ERROR:  syntax error at or near "ÿ_"
LINE 1: ÿ_-=:-ON"5_L "AsxMLYNgNF:E".e-=bBæ]%»ch%æarK"s-i=e=aEixVExEM…

Unfortunately I do not know much about postgreSQL, but I suspect there could be an error with encoding? It looks similar to this:

In any case, I am unzipping with 7zip on windows

stuartemiddleton commented 4 years ago

Your problem is that the SQL dump files I uploaded were unicode (UTF-16) not UTF-8 encoded.

To avoid this confuson for others I have convered the SQL dump to UTF-8 encoded text and uploaded a new zip file (7zip using its tar.zip format) to my google drive. I have also added the Win10 powershell SQL input commands as well as the Ubuntu version to the readme markdown file.

Can you download the new zip file and try again please.

flppgg commented 4 years ago

thank you, it works now!

johannessweater commented 1 year ago

Hi -- I'm hoping to use geoparsepy for a paper, but I've run into the same error message as the OP describes when I run the example script. The error also occurs at this line:

cached_locations = geoparsepy.geo_preprocess_lib.cache_preprocessed_locations( databaseHandle, dictLocationIDs, 'public', dictGeospatialConfig )

Here is the traceback:

logging started
loading stoplist from /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-geo-stoplist-en.txt
loading whitelist from /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-geo-whitelist.txt
loading blacklist from /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-geo-blacklist.txt
loading building types from /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-buildingtype-en.txt
loading location type corpus /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-buildingtype-en.txt
- 3 unique titles
- 76 unique types
loading street types from /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-streettype-en.txt
loading location type corpus /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-streettype-en.txt
- 15 unique titles
- 32 unique types
loading admin types from /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-admintype-en.txt
loading location type corpus /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/corpus-admintype-en.txt
- 14 unique titles
- 0 unique types
loading gazeteer from /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/gazeteer-en.txt
caching locations : {'global_cities_admin': [-1, -1], 'global_cities_poly': [-1, -1], 'global_cities_line': [-1, -1], 'global_cities_point': [-1, -1], 'europe_places_admin': [-1, -1], 'europe_places_poly': [-1, -1], 'europe_places_line': [-1, -1], 'europe_places_point': [-1, -1], 'north_america_places_admin': [-1, -1], 'north_america_places_poly': [-1, -1], 'north_america_places_line': [-1, -1], 'north_america_places_point': [-1, -1], 'uk_places_admin': [-1, -1], 'uk_places_poly': [-1, -1], 'uk_places_line': [-1, -1], 'uk_places_point': [-1, -1]}
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Input In [7], in <cell line: 25>()
     22     dictLocationIDs[strFocusArea + '_line'] = [-1,-1]
     23     dictLocationIDs[strFocusArea + '_point'] = [-1,-1]
---> 25 cached_locations = geoparsepy.geo_preprocess_lib.cache_preprocessed_locations( databaseHandle, dictLocationIDs, 'public', dictGeospatialConfig )
     26 logger.info( 'number of cached locations = ' + str(len(cached_locations)) )
     28 databaseHandle.close()

File /opt/anaconda3/lib/python3.9/site-packages/geoparsepy/geo_preprocess_lib.py:1649, in cache_preprocessed_locations(database_handle, location_ids, schema, geospatial_config, timeout_statement, timeout_overall, spatial_filter)
   1646     listSQL.append( (strSQLQuery, tupleSQLParams) )
   1648 # note: SQL returns UTF8 encoded <str> objects. to get <unicode> use unicode( strText, 'utf8' )
-> 1649 listRows = database_handle.execute_sql_query_batch( listSQL, timeout_statement, timeout_overall )
   1651 listResult = []
   1652 if len(listRows) > 0 :

File /opt/anaconda3/lib/python3.9/site-packages/soton_corenlppy/PostgresqlHandler.py:348, in PostgresqlHandler.execute_sql_query_batch(self, query_list, timeout_statement, timeout_overall)
    345     time.sleep( 1 )
    347 # failure
--> 348 raise Exception( 'SQL query failed (timeout retrying) : ' + strLastError + ' : ' + tupleStatement[0] )

Exception: SQL query failed (timeout retrying) : ['42P01'] UndefinedTable('relation "public.global_cities_admin" does not exist\nLINE 1: ...gions,ST_AsText(geom),hstore_to_matrix(tags) FROM public.glo...\n                                                             ^\n') : SELECT concat('global_cities_admin_',loc_id),name,osm_id_set,admin_regions,ST_AsText(geom),hstore_to_matrix(tags) FROM public.global_cities_admin
stuartemiddleton commented 1 year ago

@johannessweater if you look at the error its an undefined table. this means SQL database for global_cities_admin was not installed correctly. the SQL database install commands are provided in the readme, and these have worked for others (so maybe look at your local Postgres setup if it does not work for you).

johannessweater commented 1 year ago

@stuartemiddleton Ah! Thank you for your reply!