podaac / SOTO

State of the Ocean (SOTO)
3 stars 0 forks source link

Error while loading SWORD v16 data #28

Closed frankinspace closed 7 months ago

frankinspace commented 7 months ago

Summary

An error occurred when trying to load the SWORD v16 data. The error log is appended to the end of this file. The error indicated a problem trying to persist ‘inf’ values into the MySQL database because ‘inf’ values are not a supported data type. Upon inspection, there was only one attribute (sinuosity) in about 20 of the node files that contained ‘inf’ values. After checking with applied science team, it was determined that replacing these ‘inf’ values with ’NaN’ would not be a misrepresentation of the meaning behind the data. Since MySQL supports ‘NaN’ values, a script was written to convert the ‘inf’ values in SWORD v16 to ’NaN’ values. The resulting data was then loaded into the FTS database.

Bucket Contents

Description of the objects present in the s3://sword-v16-shp bucket (services UAT)

SWORD_v16_shp.zip

The original SWORD file downloaded from gaia.geosci.unc.edu/SWORD/SWORD_v16_shp.zip.

shp/

This is the unzipped SWORD_v16_shp.zip with the ‘inf’ -> ’NaN’ conversion already applied.

sword_convert_inf.py

This is the script used to convert ‘inf’ values to ‘NaN’ values.

Error Log

Error log from original load attempt has been copied from cloud watch below:

INFO:podaac.sworddb.setup_sword:Sword S3 data pulled successfully
INFO:podaac.sworddb.setup_sword:Loading nodes
INFO:podaac.sworddb.setup_sword:Dropping node table
INFO:podaac.sworddb.setup_sword:Node table dropped
/home/dockeruser/.local/lib/python3.8/site-packages/geopandas/geodataframe.py:1321: UserWarning: Geometry column does not contain geometry.
warnings.warn("Geometry column does not contain geometry.")
INFO:podaac.sworddb.setup_sword:Current node file: internal/SWORD/Reaches_Nodes/shp/EU/eu_sword_nodes_hb21_v16.shp
INFO:podaac.sworddb.setup_sword:0%| | 0/75141 [00:00<?, ?it/s]
INFO:podaac.sworddb.setup_sword:10%|9 | 7514/75141 [00:01<00:11, 6035.22it/s]
INFO:podaac.sworddb.setup_sword:20%|#9 | 15028/75141 [00:02<00:07, 7682.55it/s]
INFO:podaac.sworddb.setup_sword:30%|##9 | 22542/75141 [00:02<00:06, 8373.63it/s]
INFO:podaac.sworddb.setup_sword:40%|###9 | 30056/75141 [00:03<00:05, 8839.30it/s]
INFO:podaac.sworddb.setup_sword:50%|####9 | 37570/75141 [00:04<00:04, 8960.80it/s]
INFO:podaac.sworddb.setup_sword:60%|#####9 | 45084/75141 [00:05<00:03, 8923.45it/s]
INFO:podaac.sworddb.setup_sword:70%|######9 | 52598/75141 [00:06<00:02, 9046.01it/s]
INFO:podaac.sworddb.setup_sword:80%|#######9 | 60112/75141 [00:06<00:01, 9174.87it/s]
INFO:podaac.sworddb.setup_sword:90%|########9 | 67626/75141 [00:07<00:00, 9129.89it/s]
INFO:podaac.sworddb.setup_sword:100%|#########9| 75140/75141 [00:08<00:00, 9125.74it/s]
INFO:podaac.sworddb.setup_sword:82654it [00:08, 9662.71it/s]
/home/dockeruser/.local/lib/python3.8/site-packages/geopandas/geodataframe.py:1321: UserWarning: Geometry column does not contain geometry.
warnings.warn("Geometry column does not contain geometry.")
INFO:podaac.sworddb.setup_sword:Current node file: internal/SWORD/Reaches_Nodes/shp/EU/eu_sword_nodes_hb22_v16.shp
INFO:podaac.sworddb.setup_sword:0%| | 0/234411 [00:00<?, ?it/s]
INFO:podaac.sworddb.setup_sword:10%|9 | 23441/234411 [00:02<00:22, 9240.93it/s]
INFO:podaac.sworddb.setup_sword:20%|#9 | 46882/234411 [00:04<00:19, 9416.96it/s]
INFO:podaac.sworddb.setup_sword:30%|##9 | 70323/234411 [00:07<00:17, 9410.85it/s]
INFO:podaac.sworddb.setup_sword:40%|###9 | 93764/234411 [00:09<00:14, 9377.79it/s]
INFO:podaac.sworddb.setup_sword:40%|###9 | 93764/234411 [00:11<00:17, 8019.29it/s]
Traceback (most recent call last):
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1782, in _execute_context
self.dialect.do_executemany(
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 193, in do_executemany
rowcount = cursor.executemany(statement, parameters)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 173, in executemany
return self._do_execute_many(
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 202, in _do_execute_many
v = values % escape(arg, conn)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 109, in _escape_args
return {key: conn.literal(val) for (key, val) in args.items()}
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 109, in <dictcomp>
return {key: conn.literal(val) for (key, val) in args.items()}
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in literal
return self.escape(obj, self.encoders)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/connections.py", line 510, in escape
return converters.escape_item(obj, self.charset, mapping=mapping)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/converters.py", line 25, in escape_item
val = encoder(val, mapping)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/converters.py", line 60, in escape_float
raise ProgrammingError("%s can not be used with MySQL" % s)
pymysql.err.ProgrammingError: inf can not be used with MySQL
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/dockeruser/.local/bin/run_sword", line 8, in <module>
sys.exit(main())
File "/home/dockeruser/.local/lib/python3.8/site-packages/podaac/sworddb/setup_sword.py", line 660, in main
load_nodes(engine, local_sword_path)
File "/home/dockeruser/.local/lib/python3.8/site-packages/podaac/sworddb/setup_sword.py", line 474, in load_nodes
insert_into_db(
File "/home/dockeruser/.local/lib/python3.8/site-packages/podaac/sworddb/setup_sword.py", line 119, in insert_into_db
insert_with_progress(data_frame, tbl_name, engine, datatype)
File "/home/dockeruser/.local/lib/python3.8/site-packages/podaac/sworddb/setup_sword.py", line 110, in insert_with_progress
cdf.to_sql(
File "/home/dockeruser/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 2605, in to_sql
sql.to_sql(
File "/home/dockeruser/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 589, in to_sql
pandas_sql.to_sql(
File "/home/dockeruser/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 1406, in to_sql
raise err
File "/home/dockeruser/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 1398, in to_sql
table.insert(chunksize, method=method)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 830, in insert
exec_insert(conn, keys, chunk_iter)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 747, in _execute_insert
conn.execute(self.table.insert(), data)
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1289, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
return connection._execute_clauseelement(
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
ret = self._execute_context(
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
self._handle_dbapi_exception(
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
util.raise_(
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
raise exception
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1782, in _execute_context
self.dialect.do_executemany(
File "/home/dockeruser/.local/lib/python3.8/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 193, in do_executemany
rowcount = cursor.executemany(statement, parameters)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 173, in executemany
return self._do_execute_many(
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 202, in _do_execute_many
v = values % escape(arg, conn)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 109, in _escape_args
return {key: conn.literal(val) for (key, val) in args.items()}
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/cursors.py", line 109, in <dictcomp>
return {key: conn.literal(val) for (key, val) in args.items()}
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/connections.py", line 517, in literal
return self.escape(obj, self.encoders)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/connections.py", line 510, in escape
return converters.escape_item(obj, self.charset, mapping=mapping)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/converters.py", line 25, in escape_item
val = encoder(val, mapping)
File "/home/dockeruser/.local/lib/python3.8/site-packages/pymysql/converters.py", line 60, in escape_float
raise ProgrammingError("%s can not be used with MySQL" % s)
sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) inf can not be used with MySQL
[SQL: INSERT INTO nodes (x, y, node_id, node_len, reach_id, wse, wse_var, width, width_var, facc, n_chan_max, n_chan_mod, obstr_type, grod_id, hfalls_id, dist_out, lakeflag, max_width, manual_add, meand_len, sinuosity, type, river_name, edit_flag, trib_flag, geometry, geojson, shp_origin, netcdf_origin) VALUES (%(x)s, %(y)s, %(node_id)s, %(node_len)s, %(reach_id)s, %(wse)s, %(wse_var)s, %(width)s, %(width_var)s, %(facc)s, %(n_chan_max)s, %(n_chan_mod)s, %(obstr_type)s, %(grod_id)s, %(hfalls_id)s, %(dist_out)s, %(lakeflag)s, %(max_width)s, %(manual_add)s, %(meand_len)s, %(sinuosity)s, %(type)s, %(river_name)s, %(edit_flag)s, %(trib_flag)s, %(geometry)s, %(geojson)s, %(shp_origin)s, %(netcdf_origin)s)]
[parameters: ({'x': 35.54928024096754, 'y': 48.973816972402396, 'node_id': 22513000310571, 'node_len': 182.37663400711062, 'reach_id': 22513000311, 'wse': 86.70000457763672, 'wse_var': 0.0, 'width': 48.0, 'width_var': 63.88888888888889, 'facc': 31062.407001929063, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 767350.0, 'lakeflag': 0, 'max_width': 48.0, 'manual_add': 0, 'meand_len': 926.6898868376883, 'sinuosity': 1.00460961813753, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.54928024096754 48.9738169724024)', 'geojson': '{"type": "Point", "coordinates": [35.54928024096754, 48.9738169724024]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 35.5519247860988, 'y': 48.973416584529545, 'node_id': 22513000310581, 'node_len': 221.02649269694666, 'reach_id': 22513000311, 'wse': 86.70000457763672, 'wse_var': 0.0, 'width': 48.0, 'width_var': 130.53061224489795, 'facc': 30911.348018808712, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 767571.0, 'lakeflag': 0, 'max_width': 48.0, 'manual_add': 0, 'meand_len': 821.4079780260256, 'sinuosity': 1.002259282897906, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.5519247860988 48.97341658452955)', 'geojson': '{"type": "Point", "coordinates": [35.5519247860988, 48.97341658452955]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 35.554561939095144, 'y': 48.9728723507765, 'node_id': 22513000310591, 'node_len': 182.97075114771178, 'reach_id': 22513000311, 'wse': 86.70000457763672, 'wse_var': 0.0, 'width': 60.0, 'width_var': 21.555555555555557, 'facc': 30735.11253850163, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 767754.0, 'lakeflag': 0, 'max_width': 60.0, 'manual_add': 0, 'meand_len': 803.8609932240819, 'sinuosity': 1.001867560357969, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.55456193909514 48.9728723507765)', 'geojson': '{"type": "Point", "coordinates": [35.55456193909514, 48.9728723507765]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 35.5572033652898, 'y': 48.97241278782153, 'node_id': 22513000310601, 'node_len': 219.30962324593565, 'reach_id': 22513000311, 'wse': 86.70000457763672, 'wse_var': 0.0, 'width': 48.0, 'width_var': 103.9591836734694, 'facc': 30584.05355538128, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 767973.0, 'lakeflag': 0, 'max_width': 48.0, 'manual_add': 0, 'meand_len': 1850.1714527568824, 'sinuosity': 1.002052191530238, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.5572033652898 48.97241278782153)', 'geojson': '{"type": "Point", "coordinates": [35.5572033652898, 48.97241278782153]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 35.559843889151956, 'y': 48.971936477036564, 'node_id': 22513000310611, 'node_len': 184.6876220486938, 'reach_id': 22513000311, 'wse': 86.70000457763672, 'wse_var': 0.0, 'width': 65.0, 'width_var': 131.13888888888889, 'facc': 30407.818075074203, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 768158.0, 'lakeflag': 0, 'max_width': 65.0, 'manual_add': 0, 'meand_len': 2024.5565293456827, 'sinuosity': 1.002082963392283, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.55984388915196 48.97193647703656)', 'geojson': '{"type": "Point", "coordinates": [35.55984388915196, 48.97193647703656]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 35.562483699477916, 'y': 48.9714472670169, 'node_id': 22513000310621, 'node_len': 217.59275381207772, 'reach_id': 22513000311, 'wse': 86.70000457763672, 'wse_var': 0.0, 'width': 48.0, 'width_var': 141.26530612244898, 'facc': 30256.75909195385, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 768376.0, 'lakeflag': 0, 'max_width': 48.0, 'manual_add': 0, 'meand_len': 2024.5565293456825, 'sinuosity': 1.002082963392283, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.56248369947792 48.9714472670169)', 'geojson': '{"type": "Point", "coordinates": [35.56248369947792, 48.9714472670169]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 35.5651302471697, 'y': 48.97109023078187, 'node_id': 22513000310631, 'node_len': 182.90527341880443, 'reach_id': 22513000311, 'wse': 86.55000305175781, 'wse_var': 0.022500457766, 'width': 54.0, 'width_var': 80.22222222222221, 'facc': 30080.52361164677, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 768559.0, 'lakeflag': 0, 'max_width': 54.0, 'manual_add': 0, 'meand_len': 2024.5565293456827, 'sinuosity': 1.002082963392283, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.5651302471697 48.97109023078187)', 'geojson': '{"type": "Point", "coordinates": [35.5651302471697, 48.97109023078187]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 35.567776091067664, 'y': 48.970720294945444, 'node_id': 22513000310641, 'node_len': 215.87588439057072, 'reach_id': 22513000311, 'wse': 86.0999984741211, 'wse_var': 0.0, 'width': 48.0, 'width_var': 176.20408163265307, 'facc': 29929.464628526417, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 768774.0, 'lakeflag': 0, 'max_width': 48.0, 'manual_add': 0, 'meand_len': 2024.5565293456825, 'sinuosity': 1.002082963392283, 'type': 1, 'river_name': 'Oril', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (35.56777609106766 48.97072029494544)', 'geojson': '{"type": "Point", "coordinates": [35.56777609106766, 48.97072029494544]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'} ... displaying 10 of 23441 total bound parameter sets ... {'x': 26.897974680058578, 'y': 51.90992157070326, 'node_id': 22547000040551, 'node_len': 230.39504611191114, 'reach_id': 22547000041, 'wse': 130.0, 'wse_var': 0.0, 'width': 42.0, 'width_var': 57.67346938775511, 'facc': 26814.25636950096, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 1634021.0, 'lakeflag': 0, 'max_width': 42.0, 'manual_add': 0, 'meand_len': 1254.8351291761303, 'sinuosity': 1.311497870025763, 'type': 1, 'river_name': 'Horyn', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (26.89797468005858 51.90992157070326)', 'geojson': '{"type": "Point", "coordinates": [26.89797468005858, 51.90992157070326]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'}, {'x': 26.89728513331836, 'y': 51.908487534480784, 'node_id': 22547000040561, 'node_len': 208.55913026018607, 'reach_id': 22547000041, 'wse': 130.0, 'wse_var': 0.0, 'width': 48.0, 'width_var': 41.04, 'facc': 26812.037506324563, 'n_chan_max': 1, 'n_chan_mod': 1, 'obstr_type': 0, 'grod_id': 0, 'hfalls_id': 0, 'dist_out': 1634230.0, 'lakeflag': 0, 'max_width': 48.0, 'manual_add': 0, 'meand_len': 1254.8351291761305, 'sinuosity': 1.311497870025764, 'type': 1, 'river_name': 'Horyn', 'edit_flag': 'NaN', 'trib_flag': 0, 'geometry': 'POINT (26.89728513331836 51.90848753448078)', 'geojson': '{"type": "Point", "coordinates": [26.89728513331836, 51.90848753448078]}', 'shp_origin': 'eu_sword_nodes_hb22_v16.shp', 'netcdf_origin': 'eu_apriori_rivers_v16.nc'})]
(Background on this error at: https://sqlalche.me/e/14/f405)
frankinspace commented 7 months ago

Data was able to load in UAT account after removing the 'inf' values

2024-04-15T02:29:02.462-07:00INFO:podaac.sworddb.setup_sword:Indexing tables
2024-04-15T02:39:19.935-07:00INFO:podaac.sworddb.setup_sword:Performing row count
2024-04-15T02:39:20.653-07:00INFO:podaac.sworddb.setup_sword:Reach row count: (241852,)
2024-04-15T02:39:47.746-07:00INFO:podaac.sworddb.setup_sword:Node row count: (10865963,)