openego / powerd-data

GNU Affero General Public License v3.0

pypsaeursec.neighbor-reduction is failing #140

Closed CarlosEpia closed 11 months ago

CarlosEpia commented 11 months ago

The task "pypsaeursec.neighbor-reduction" is failing with the following error message:

Found local files:
  /home/powerd/clean-run-powerd-data/airflow/logs/dag_id=powerd-status-quo-processing-pipeline/run_id=manual__2023-10-20T13:04:09.096605+00:00/task_id=pypsaeursec.neighbor-reduction/attempt=1.log
[2023-10-24, 16:46:45 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: powerd-status-quo-processing-pipeline.pypsaeursec.neighbor-reduction manual__2023-10-20T13:04:09.096605+00:00 [queued]>
[2023-10-24, 16:46:45 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: powerd-status-quo-processing-pipeline.pypsaeursec.neighbor-reduction manual__2023-10-20T13:04:09.096605+00:00 [queued]>
[2023-10-24, 16:46:45 UTC] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
[2023-10-24, 16:46:46 UTC] {taskinstance.py:1327} INFO - Executing <Task(PypsaEurSec (versioned)): pypsaeursec.neighbor-reduction> on 2023-10-20 13:04:09.096605+00:00
[2023-10-24, 16:46:46 UTC] {standard_task_runner.py:57} INFO - Started process 2066651 to run task
[2023-10-24, 16:46:46 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'powerd-status-quo-processing-pipeline', 'pypsaeursec.neighbor-reduction', 'manual__2023-10-20T13:04:09.096605+00:00', '--job-id', '160', '--raw', '--subdir', 'DAGS_FOLDER/dags/pipeline_status_quo.py', '--cfg-path', '/tmp/tmpco3c787k']
[2023-10-24, 16:46:46 UTC] {standard_task_runner.py:85} INFO - Job 160: Subtask pypsaeursec.neighbor-reduction
[2023-10-24, 16:46:46 UTC] {task_command.py:410} INFO - Running <TaskInstance: powerd-status-quo-processing-pipeline.pypsaeursec.neighbor-reduction manual__2023-10-20T13:04:09.096605+00:00 [running]> on host at32
[2023-10-24, 16:46:47 UTC] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_EMAIL='clara.buettner@hs-flensburg.de' AIRFLOW_CTX_DAG_OWNER='airflowstatsd_on = False' AIRFLOW_CTX_DAG_ID='powerd-status-quo-processing-pipeline' AIRFLOW_CTX_TASK_ID='pypsaeursec.neighbor-reduction' AIRFLOW_CTX_EXECUTION_DATE='2023-10-20T13:04:09.096605+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-10-20T13:04:09.096605+00:00'
[2023-10-24, 16:46:47 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/powerd/clean-run-powerd-data/powerd-data/src/egon/data/datasets/__init__.py", line 204, in skip_task
    result = super(type(task), task).execute(*xs, **ks)
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/airflow/operators/python.py", line 181, in execute
    return_value = self.execute_callable()
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/airflow/operators/python.py", line 198, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/powerd/clean-run-powerd-data/powerd-data/src/egon/data/datasets/pypsaeursec/__init__.py", line 267, in neighbor_reduction
    network = read_network()
  File "/home/powerd/clean-run-powerd-data/powerd-data/src/egon/data/datasets/pypsaeursec/__init__.py", line 179, in read_network
    return pypsa.Network(str(target_file))
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pypsa/components.py", line 276, in __init__
    self._build_dataframes()
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pypsa/components.py", line 302, in _build_dataframes
    df = pd.DataFrame({k: pd.Series(dtype=d) for k, d in static_dtypes.iteritems()},
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'iteritems'
[2023-10-24, 16:46:47 UTC] {taskinstance.py:1345} INFO - Marking task as FAILED. dag_id=powerd-status-quo-processing-pipeline, task_id=pypsaeursec.neighbor-reduction, execution_date=20231020T130409, start_date=20231024T164645, end_date=20231024T164647
[2023-10-24, 16:46:47 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 160 for task pypsaeursec.neighbor-reduction ('Series' object has no attribute 'iteritems'; 2066651)
[2023-10-24, 16:46:48 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 1
[2023-10-24, 16:46:48 UTC] {taskinstance.py:2653} INFO - 0 downstream tasks scheduled from follow-on schedule check

CarlosEpia commented 11 months ago

We have this error because the function read_network() does not work with pandas 2.0.3: Series.iteritems() was removed in pandas 2.0, and the pypsa version we use still calls it.
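
For context, Series.iteritems() was deprecated in pandas 1.5 and removed in 2.0; Series.items() is the long-standing replacement. A minimal sketch of the incompatibility (toy data, not project code):

```python
import pandas as pd

# Toy stand-in for pypsa's static_dtypes Series; not project data.
static_dtypes = pd.Series({"bus": "object", "p_nom": "float64"})

# pandas < 2.0 (works):   static_dtypes.iteritems()
# pandas >= 2.0 (raises): AttributeError: 'Series' object has no attribute 'iteritems'
# Series.items() works on both old and new pandas:
df = pd.DataFrame({k: pd.Series(dtype=d) for k, d in static_dtypes.items()})
print(df.dtypes)  # bus -> object, p_nom -> float64
```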

CarlosEpia commented 11 months ago

After downgrading pandas to 1.3.5, it works (with some deprecation warnings), but the task then fails because of some column names. All the column names were changed in https://github.com/openego/powerd-data/commit/d6ad5a1706fd355525aefcdb784c02bbc52ea061. @ClaraBuettner, do you remember why these names were changed?

CarlosEpia commented 11 months ago

After provisionally adapting the column names in 1990353973407fa926762d76ad02a2545a468ca1 and f7a3db1855b1867f704f8ced41927ecde2259ccc, the task electrical_neighbours.grid failed with this error message:

[2023-10-29, 12:57:14 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/powerd/clean-run-powerd-data/powerd-data/src/egon/data/datasets/__init__.py", line 204, in skip_task
    result = super(type(task), task).execute(*xs, **ks)
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/airflow/operators/python.py", line 181, in execute
    return_value = self.execute_callable()
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/airflow/operators/python.py", line 198, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/powerd/clean-run-powerd-data/powerd-data/src/egon/data/datasets/electrical_neighbours.py", line 693, in grid
    foreign_lines = cross_border_lines(
  File "/home/powerd/clean-run-powerd-data/powerd-data/src/egon/data/datasets/electrical_neighbours.py", line 402, in cross_border_lines
    new_lines.to_postgis(
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/geopandas/geodataframe.py", line 2052, in to_postgis
    geopandas.io.sql._write_postgis(
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/geopandas/io/sql.py", line 461, in _write_postgis
    gdf.to_sql(
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pandas/core/generic.py", line 2878, in to_sql
    return sql.to_sql(
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pandas/io/sql.py", line 769, in to_sql
    return pandas_sql.to_sql(
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pandas/io/sql.py", line 1920, in to_sql
    total_inserted = sql_engine.insert_records(
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pandas/io/sql.py", line 1461, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/pandas/io/sql.py", line 1023, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/home/powerd/clean-run-powerd-data/venv262/lib/python3.8/site-packages/geopandas/io/sql.py", line 348, in _psql_insert_copy
    cur.copy_expert(sql=sql, file=s_buf)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "inf"
CONTEXT: COPY egon_etrago_line, line 64, column b: "inf"
[2023-10-29, 12:57:14 UTC] {taskinstance.py:1345} INFO - Marking task as FAILED. dag_id=powerd-status-quo-processing-pipeline, task_id=electrical_neighbours.grid, execution_date=20231020T130409, start_date=20231029T125709, end_date=20231029T125714
[2023-10-29, 12:57:14 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 182 for task electrical_neighbours.grid (invalid input syntax for type numeric: "inf" CONTEXT: COPY egon_etrago_line, line 64, column b: "inf" ; 3464456)
[2023-10-29, 12:57:14 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 1
[2023-10-29, 12:57:14 UTC] {taskinstance.py:2653} INFO - 0 downstream tasks scheduled from follow-on schedule check
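
The COPY failure means that column b (line susceptance) of at least one new line is inf, which PostgreSQL's numeric type rejects. One possible workaround, sketched below with toy data and not necessarily the fix adopted in the pipeline, is to replace infinite values before calling to_postgis:

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import LineString

# Toy stand-in for the new_lines GeoDataFrame built in cross_border_lines().
new_lines = gpd.GeoDataFrame(
    {
        "line_id": [1, 2],
        "b": [0.02, np.inf],  # susceptance; the second value would break COPY
        "geometry": [LineString([(0, 0), (1, 1)]), LineString([(1, 1), (2, 2)])],
    },
    crs="EPSG:4326",
)

# Replace +/-inf with NaN so the value is written as NULL instead of the
# literal "inf" that PostgreSQL's numeric type cannot parse.
new_lines["b"] = new_lines["b"].replace([np.inf, -np.inf], np.nan)

# new_lines.to_postgis("egon_etrago_line", engine, if_exists="append")
```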

CarlosEpia commented 11 months ago

The neighbor-reduction script does not include a command to delete previous results. Since this function was partially executed several times, some elements are now duplicated.

ClaraBuettner commented 11 months ago

> After downgrading pandas to 1.3.5, it works (with some deprecation warnings), but the task then fails because of some column names. All the column names were changed in d6ad5a1. @ClaraBuettner, do you remember why these names were changed?

I can't remember exactly why I made that commit; it was too long ago. The commit message says it was needed because of a new PyPSA version, but the version is not changed in setup.py, which is strange. In my local environment I also use PyPSA 0.17.1. Since Airflow needed a pandas version > 2, I would not downgrade it.

ClaraBuettner commented 11 months ago

> The neighbor-reduction script does not include a command to delete previous results. Since this function was partially executed several times, some elements are now duplicated.

There is another task called clean_database that deletes the previous data.
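
If a rerun ever bypasses clean_database, the insert step itself could be made idempotent by deleting its own previous output first. A rough sketch with SQLAlchemy; the connection string, schema, table, and scn_name filter are only examples, not the pipeline's actual configuration:

```python
from sqlalchemy import create_engine, text

# Placeholder connection string; the pipeline reads its own DB settings.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/egon")

def clear_previous_results(scn_name: str) -> None:
    """Delete rows left over from an earlier, partial run before re-inserting."""
    with engine.begin() as connection:
        connection.execute(
            text("DELETE FROM grid.egon_etrago_line WHERE scn_name = :scn"),
            {"scn": scn_name},
        )

clear_previous_results("status2019")  # example scenario name
```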

ClaraBuettner commented 11 months ago

> After downgrading pandas to 1.3.5, it works (with some deprecation warnings), but the task then fails because of some column names. All the column names were changed in d6ad5a1. @ClaraBuettner, do you remember why these names were changed?
>
> I can't remember exactly why I made that commit; it was too long ago. The commit message says it was needed because of a new PyPSA version, but the version is not changed in setup.py, which is strange. In my local environment I also use PyPSA 0.17.1. Since Airflow needed a pandas version > 2, I would not downgrade it.

Okay, I did some local checks. Upgrading to PyPSA 0.20.1 solved the problem. The update to pandas > 2 was needed and should not be reverted; since pandas > 2 only works with PyPSA > 0.20, this package was also updated. Could you try it out on the server?

CarlosEpia commented 11 months ago

Using PyPSA 0.20.1 solves the problem. Should I change the version in setup.py as a hotfix?

ClaraBuettner commented 11 months ago

I can also review your PR

CarlosEpia commented 11 months ago

Ok. I also included a new lower bound for the pandas version.
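
For reference, the hotfix boils down to adjusting the dependency pins in setup.py. A sketch of the kind of change, with the exact bounds left to the PR; the package metadata shown here is illustrative, not copied from the repository:

```python
from setuptools import find_packages, setup

setup(
    name="egon.data",  # illustrative; see the repository's setup.py
    packages=find_packages("src"),
    package_dir={"": "src"},
    install_requires=[
        "pandas > 2",       # the Airflow setup needs pandas > 2, so no downgrade
        "pypsa >= 0.20.1",  # pandas > 2 only works with pypsa > 0.20
        # ... remaining requirements unchanged
    ],
)
```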