openego / eGon-data

electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings is failing in Oldenburg #838

Open AmeliaNadal opened 2 years ago

AmeliaNadal commented 2 years ago

The task electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings is failing in Oldenburg with the following error message:

*** Reading local file: /home/nada_am/AN_eGon/eGon_code/eGon-data-workflow/egondata_1/wd_1/airflow/logs/egon-data-processing-pipeline/electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings/2022-07-12T09:37:28.191106+00:00/1.log
[2022-07-12 14:01:42,855] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: egon-data-processing-pipeline.electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings 2022-07-12T09:37:28.191106+00:00 [queued]>
[2022-07-12 14:01:43,003] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: egon-data-processing-pipeline.electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings 2022-07-12T09:37:28.191106+00:00 [queued]>
[2022-07-12 14:01:43,003] {taskinstance.py:880} INFO - 
--------------------------------------------------------------------------------
[2022-07-12 14:01:43,003] {taskinstance.py:881} INFO - Starting attempt 1 of 1
[2022-07-12 14:01:43,003] {taskinstance.py:882} INFO - 
--------------------------------------------------------------------------------
[2022-07-12 14:01:43,059] {taskinstance.py:901} INFO - Executing <Task(Demand_Building_Assignment (versioned)): electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings> on 2022-07-12T09:37:28.191106+00:00
[2022-07-12 14:01:43,072] {standard_task_runner.py:54} INFO - Started process 6594 to run task
[2022-07-12 14:01:43,145] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'egon-data-processing-pipeline', 'electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings', '2022-07-12T09:37:28.191106+00:00', '--job_id', '5781', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/dags/pipeline.py', '--cfg_path', '/tmp/tmpr177cmyx']
[2022-07-12 14:01:43,145] {standard_task_runner.py:78} INFO - Job 5781: Subtask electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings
[2022-07-12 14:01:43,338] {logging_mixin.py:120} INFO - Running <TaskInstance: egon-data-processing-pipeline.electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings 2022-07-12T09:37:28.191106+00:00 [running]> on host torch.esy.ve.dlr
[2022-07-12 14:10:58,035] {taskinstance.py:1150} ERROR - Geometry SRID (0) does not match column SRID (3035)
CONTEXT:  COPY osm_buildings_synthetic, line 1, column geom_point: "POINT (4359438.08131129 3362427.089453413)"
Traceback (most recent call last):
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/nada_am/AN_eGon/eGon_code/eGon-data-workflow/egondata_1/eGon-data/src/egon/data/datasets/__init__.py", line 194, in skip_task
    result = super(type(task), task).execute(*xs, **ks)
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/nada_am/AN_eGon/eGon_code/eGon-data-workflow/egondata_1/eGon-data/src/egon/data/datasets/electricity_demand_timeseries/hh_buildings.py", line 788, in map_houseprofiles_to_buildings
    synthetic_buildings.to_postgis(
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/geopandas/geodataframe.py", line 1808, in to_postgis
    geopandas.io.sql._write_postgis(
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/geopandas/io/sql.py", line 431, in _write_postgis
    gdf.to_sql(
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/pandas/core/generic.py", line 2872, in to_sql
    sql.to_sql(
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/pandas/io/sql.py", line 717, in to_sql
    pandas_sql.to_sql(
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/pandas/io/sql.py", line 1761, in to_sql
    sql_engine.insert_records(
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/pandas/io/sql.py", line 1340, in insert_records
    table.insert(chunksize=chunksize, method=method)
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/pandas/io/sql.py", line 967, in insert
    exec_insert(conn, keys, chunk_iter)
  File "/home/nada_am/anaconda3/envs/egondata_env1/lib/python3.8/site-packages/geopandas/io/sql.py", line 312, in _psql_insert_copy
    cur.copy_expert(sql=sql, file=s_buf)
psycopg2.errors.InvalidParameterValue: Geometry SRID (0) does not match column SRID (3035)
CONTEXT:  COPY osm_buildings_synthetic, line 1, column geom_point: "POINT (4359438.08131129 3362427.089453413)"

[2022-07-12 14:10:58,090] {taskinstance.py:1187} INFO - Marking task as FAILED. dag_id=egon-data-processing-pipeline, task_id=electricity_demand_timeseries.hh_buildings.map-houseprofiles-to-buildings, execution_date=20220712T093728, start_date=20220712T120142, end_date=20220712T121058
[2022-07-12 14:11:01,860] {local_task_job.py:102} INFO - Task exited with return code 1

There is probably a problem with the default SRID setting when the code is run without a Docker container, which could also be the cause of issue https://github.com/openego/eGon-data/issues/488.

AmeliaNadal commented 2 years ago

Because of the two new dependencies of the task PypsaEurSec:

https://github.com/openego/eGon-data/blob/8c1355bf03079b1471a9270823e2141342dc08d2/src/egon/data/airflow/dags/pipeline.py#L328-L329

we cannot test our (gas-related) tasks, which makes this issue fairly urgent.

nailend commented 2 years ago

Sorry, just saw this now. You probably tried to clear the task already? Sometimes the DB just bugs with the SRID.

The error message doesn't really make sense, as the table has already been created at this point and the SRID is defined as 3035 for both the DB table and the GeoDataFrame's geom_point. I can't see the problem in the code.

Found a similar problem here, caused by different PostGIS versions.

AmeliaNadal commented 2 years ago

The issue occurs even after clearing the task, and also on clean runs.

This is probably not a PostGIS version problem, because the versions we use (PostgreSQL 12.3 and POSTGIS=3.0.1) are newer than the one mentioned in the issue you found. @gnn rather suspects a problem in the code: data that does not have an SRID. However, @nailend, it will be difficult for you to solve this if you cannot reproduce the error.
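A note on the traceback: the COPY context shows geom_point arriving as plain WKT (`POINT (...)`), which carries no SRID, so PostGIS reads it as SRID 0. Why the column is written that way is not confirmed in this thread. One possible workaround, sketched below under that caveat, is to send EWKT, which PostGIS also accepts as geometry input. `to_ewkt` is a hypothetical helper for illustration and is not part of eGon-data or GeoPandas.

```python
def to_ewkt(wkt: str, srid: int = 3035) -> str:
    """Prefix a WKT string with its SRID so PostGIS reads it as EWKT.

    Hypothetical helper: 3035 is the column SRID from the error message.
    """
    # Leave strings alone that already carry an SRID prefix.
    if wkt.upper().startswith("SRID="):
        return wkt
    return f"SRID={srid};{wkt}"


# The value reported in the COPY context of the traceback.
point = "POINT (4359438.08131129 3362427.089453413)"
ewkt = to_ewkt(point)
print(ewkt)
```

Applied to the geom_point column before writing, this would at least show whether the missing SRID on that column is what the COPY is rejecting.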

AmeliaNadal commented 2 years ago

To prevent the same error from occurring, we use the following SQL function

https://github.com/openego/eGon-data/blob/08962a73f84504c9d104752e7e8fe46d62fbcd5d/src/egon/data/datasets/gas_grid.py#L666

in order to integrate some gas-related links.