openego / eGon-data

GNU Affero General Public License v3.0

Coordination of server use #377

Open IlkaCu opened 3 years ago

IlkaCu commented 3 years ago

This issue is meant to coordinate the use of the egondata user/instance on our server in FL. We already agreed to start a clean run of the dev branch every Friday. This will (most likely) make some debugging necessary on Mondays. To avoid conflicts while debugging, please comment in this issue before you start debugging and briefly note which datasets/parts of the workflow you will be working on.

fwitte commented 2 years ago

@AmeliaNadal: Any idea why that task would suddenly fail?

IlkaCu commented 2 years ago

Unfortunately up to now three tasks in the current CI branch run failed:

* demandregio.insert-household-demand @ClaraBuettner

It seems that the Open Data Portal, or rather the DemandRegio database, is not accessible at the moment. I contacted the developers. I hope it will be available soon; if not, we need to find an interim solution to bypass the demandregio.insert-household-demand task.

ClaraBuettner commented 2 years ago

* electrical_neighbours.grid @ClaraBuettner

This problem was fixed, I cleared the task and it was running successfully.

ClaraBuettner commented 2 years ago

* demandregio.insert-household-demand @ClaraBuettner

I manually inserted the data from demandregio and cleared the downstream tasks. But since gas_areas.create-voronoi failed, many other tasks could not start. @AmeliaNadal and @fwitte Do you already have an idea how to solve this?

fwitte commented 2 years ago

Some services currently seem to be unavailable, so re_potential_areas.download-datasets and mastr.download-mastr-data are not running.

[2021-12-17 14:32:02,538] {taskinstance.py:1150} ERROR - HTTP Error 503: Service Temporarily Unavailable
Traceback (most recent call last):
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/witt_fa/eGon/eGon-data/src/egon/data/datasets/__init__.py", line 195, in skip_task
    result = super(type(task), task).execute(*xs, **ks)
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/witt_fa/eGon/eGon-data/src/egon/data/datasets/mastr.py", line 47, in download_mastr_data
    urlretrieve(zenodo_files_url + filename, filename)
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/witt_fa/anaconda3/envs/egon_env/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Temporarily Unavailable

Seems like https://sandbox.zenodo.org/ is not running.
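
Since the failure in the traceback above is a temporary 503 raised from urlretrieve, one possible mitigation is to retry the download a few times before giving up. The following is only a sketch and not code from the repository: the helper name download_with_retry and its parameters (retries, delay, fetch, sleep) are hypothetical; only urlretrieve and HTTPError are taken from the traceback.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlretrieve


def download_with_retry(url, target, retries=3, delay=60.0,
                        fetch=urlretrieve, sleep=time.sleep):
    """Hypothetical helper: retry a download when the server answers 5xx."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url, target)
        except HTTPError as error:
            # Retry only on server-side errors (e.g. 503) and only while
            # attempts remain; everything else is re-raised unchanged.
            if error.code < 500 or attempt == retries:
                raise
            # Wait a bit longer after each failed attempt.
            sleep(delay * attempt)
```

With the defaults this would wait 60 s and then 120 s before the third and last attempt. If the service stays down for hours, the task would still fail, so copying the files manually remains the fallback.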

nesnoj commented 2 years ago

Seems like https://sandbox.zenodo.org/ is not running.

Thanks for reporting! Yea, that happens quite often :-/ I restarted both tasks..

ClaraBuettner commented 2 years ago

Seems like https://sandbox.zenodo.org/ is not running.

Thanks for reporting! Yea, that happens quite often :-/ I restarted both tasks..

Both tasks failed again. I could copy the files from another directory to test at least the upcoming tasks if you agree.

nesnoj commented 2 years ago

Both tasks failed again. I could copy the files from another directory to test at least the upcoming tasks if you agree.

Yes, that's fine as they didn't change. Thank you.

nesnoj commented 2 years ago

The task gas_grid.insert-gas-data failed in the CI run @AmeliaNadal @fwitte

AmeliaNadal commented 2 years ago

This is solved, sorry for blocking the workflow: a test couldn't be performed in normal mode and some cross-border pipes had invalid country buses.

fwitte commented 2 years ago

The tasks heat_etrago.supply and power_plants.assign_weather_data.weatherId-and-busId failed @CarlosEpia @ClaraBuettner. Both of the tasks have not been modified in about a month, so there might be some dependency issues.

ClaraBuettner commented 2 years ago

The tasks heat_etrago.supply and power_plants.assign_weather_data.weatherId-and-busId failed @CarlosEpia @ClaraBuettner. Both of the tasks have not been modified in about a month, so there might be some dependency issues.

Thanks for reporting this! Both problems are caused by new data for eGon100RE. I was able to fix heat_etrago.supply easily. But power_plants.assign_weather_data.weatherId-and-busId is a bit more complicated, so I created a new issue #592.

IlkaCu commented 2 years ago

As a quick reminder: Task power_plants.assign_weather_data.weatherId-and-busId still fails on the CI-branch

fwitte commented 2 years ago

The run on the new CI branch went smoothly :), except

As a quick reminder: Task power_plants.assign_weather_data.weatherId-and-busId still fails on the CI-branch

is still an issue.

fwitte commented 2 years ago

The task

* electrical_neighbours.grid @ClaraBuettner

failed while the link geometry/topology is created: https://github.com/openego/eGon-data/blob/d59ae6ecf08689092953470b1d39a0a56532c8d5/src/egon/data/datasets/electrical_neighbours.py#L650.

ClaraBuettner commented 2 years ago

The task

* electrical_neighbours.grid @ClaraBuettner

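failed while the link geometry/topology is created: https://github.com/openego/eGon-data/blob/d59ae6ecf08689092953470b1d39a0a56532c8d5/src/egon/data/datasets/electrical_neighbours.py#L650.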

Thank you, I fixed it and cleared the task. But now gas_grid.insert-gas-data failed:

[2022-01-24 08:44:03,623] {taskinstance.py:1150} ERROR - "['v_mag_pu_set_fixed'] not found in axis"
Traceback (most recent call last):
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/egondata/git-repos/friday-evening-weekend-run/code/src/egon/data/datasets/__init__.py", line 195, in skip_task
    result = super(type(task), task).execute(*xs, **ks)
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/egondata/git-repos/friday-evening-weekend-run/code/src/egon/data/datasets/gas_grid.py", line 676, in insert_gas_data
    abroad_gas_nodes_list = insert_gas_buses_abroad()
  File "/home/egondata/git-repos/friday-evening-weekend-run/code/src/egon/data/datasets/gas_grid.py", line 254, in insert_gas_buses_abroad
    gdf_abroad_buses = gdf_abroad_buses.drop(
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/pandas/core/frame.py", line 4308, in drop
    return super().drop(
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/pandas/core/generic.py", line 4153, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/pandas/core/generic.py", line 4188, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5592, in drop
    raise KeyError(f"{labels[mask]} not found in axis")
KeyError: "['v_mag_pu_set_fixed'] not found in axis"

It looks like there is a conflict between #637 and #634. I would like to merge #634 soon, to avoid other issues like that. Could you solve this in your branch @AmeliaNadal?

fwitte commented 2 years ago

@ClaraBuettner, thank you for letting us know. I fixed it on the server and I already pushed it to #637. No need for your action @AmeliaNadal. Task was successful.

nailend commented 2 years ago

No run this weekend?

nesnoj commented 2 years ago

Dunno, I wrote a mail to @gnn, will see..

gnn commented 2 years ago

Sorry, totally forgot to start the run during the project meeting. Thanks for your mail, @nesnoj. Should be running now.

fwitte commented 2 years ago

The run currently stops at the mastr download; the service is unavailable.

fwitte commented 2 years ago

Some instances (in electrical neighbours and h2_to_ch4) of "..._fixed" attributes were resurrected. For h2_to_ch4, it is fixed in https://github.com/openego/eGon-data/pull/654/commits/f13d958056d328eeae6a21e581684bfb2f69493c. For the electrical neighbours, I could not find where it originated... Any ideas? On top of that the opendata.ffe server was not available (still is not).

ClaraBuettner commented 2 years ago

Some instances (in electrical neighbours and h2_to_ch4) of "..._fixed" attributes were resurrected. For h2_to_ch4, it is fixed in f13d958. For the electrical neighbours, I could not find where it originated... Any ideas?

The "..._fixed" attributes were updated for the electrical_neighbours dataset in 40d275f2992b0bf26a466dc59d61160331d0b9a8, but this commit was not on the CI-branch. I fixed this.

ClaraBuettner commented 2 years ago

On top of that the opendata.ffe server was not available (still is not).

I added the tables from demandregio manually and cleared all downstream tasks.

CarlosEpia commented 2 years ago

The next tasks failed:

ClaraBuettner commented 2 years ago

The task electricity_demand.get-annual-household-el-demand-cells failed:

  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/egondata/git-repos/friday-evening-weekend-run/environment/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DiskFull) could not resize shared memory segment "/PostgreSQL.784125929" to 16777216 bytes: No space left on device

[SQL: SELECT demand.egon_household_electricity_profile_of_buildings.id, demand.egon_household_electricity_profile_of_buildings.building_id, demand.egon_household_electricity_profile_of_buildings.cell_id, demand.egon_household_electricity_profile_of_buildings.profile_id, demand.egon_household_electricity_profile_in_census_cell.nuts3, demand.egon_household_electricity_profile_in_census_cell.factor_2035, demand.egon_household_electricity_profile_in_census_cell.factor_2050 
FROM demand.egon_household_electricity_profile_of_buildings, demand.egon_household_electricity_profile_in_census_cell 
WHERE demand.egon_household_electricity_profile_of_buildings.cell_id = demand.egon_household_electricity_profile_in_census_cell.cell_id ORDER BY demand.egon_household_electricity_profile_of_buildings.id]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

There is plenty of disk space left, so I don't understand why this happened.

nesnoj commented 2 years ago

I had this error as well on our server (despite 16 TB of free space); clearing the task did solve the issue. I thought it might be related to our machine but now I think this should be investigated. I can check what's wrong here. I restarted the task on the Flensburg server, and now it's not failing immediately..

gnn commented 2 years ago

I had to do some checks before running the current weekend run. As a result I only managed to start it on Sunday morning at around half past midnight. Just a heads up so that there's no confusion.

fwitte commented 2 years ago

In the current CI there are two entries for the HeatDemandEurope dataset. On dev only one entry exists. Which one is correct?

dev + CI:

https://github.com/openego/eGon-data/blob/9c7f2393cce5ef8290d5766ef6a3624a6a954620/src/egon/data/airflow/dags/pipeline.py#L157-L159

only CI: https://github.com/openego/eGon-data/blob/91fdc7dd8d7854c4b58292d0dda15e857df08bbf/src/egon/data/airflow/dags/pipeline.py#L243-L247

There is one dependant (pypsa-eur-sec run), which only uses the first entry (dev + CI).

Edit: Removed the second (CI only) appearance for this weekend's run: 848d577bb2283d962019ea05087735ee0f04d953

fwitte commented 2 years ago

Apparently storage is full:

psycopg2.errors.DiskFull: could not resize shared memory segment "/PostgreSQL.2100397311" to 8388608 bytes: No space left on device
CONTEXT:  parallel worker

nesnoj commented 2 years ago

Apparently storage is full:

psycopg2.errors.DiskFull: could not resize shared memory segment "/PostgreSQL.2100397311" to 8388608 bytes: No space left on device
CONTEXT:  parallel worker

Checking df, there is sufficient disk space available. In the past I encountered this error a couple of times as well and dunno why this happens. I got the impression that it's about temporary space in the Docker container, maybe @gnn has an idea?

I restarted the 2 tasks...
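
One possible explanation, stated here as an assumption that has not been verified on the server: the error talks about a shared memory segment, and on Linux PostgreSQL usually keeps those in /dev/shm, which is a separate file system and is only 64 MB by default inside a Docker container. In that case a full /dev/shm would produce "No space left on device" even though the regular disk is far from full. The sketch below checks the free space there; the function names are hypothetical and it has to be run inside the database container to be meaningful.

```python
import shutil


def free_mebibytes(path="/dev/shm"):
    """Return the free space of the file system containing `path` in MiB."""
    return shutil.disk_usage(path).free / 2**20


def has_room_for(segment_bytes, path="/dev/shm"):
    """Check whether a segment of `segment_bytes` bytes would still fit."""
    return shutil.disk_usage(path).free >= segment_bytes
```

For the failure above one would call has_room_for(8388608) while the query is running. If this turns out to be the cause, a larger shared memory size for the container would be the place to look.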

nesnoj commented 2 years ago

There's a major bug in the household demand #704. As we need the results for the paper we would like to restart the weekend run asap or after the run finished. Would that be ok for you @AmeliaNadal @IlkaCu @ClaraBuettner @fwitte @gnn @ulfmueller ?

PS: If you have objections, I'd prepare a full run on the RLI's workstation..

ClaraBuettner commented 2 years ago

There's a major bug in the household demand #704. As we need the results for the paper we would like to restart the weekend run asap or after the run finished. Would that be ok for you @AmeliaNadal @IlkaCu @ClaraBuettner @fwitte @gnn @ulfmueller ?

PS: If you have objections, I'd prepare a full run on the RLI's workstation..

The last weekend run had some problems downloading zensus data, so many tasks didn't run yet. It would be fine for me to stop this run and restart it with your changes.

AmeliaNadal commented 2 years ago

I just checked with @fwitte and it's ok for both of us to restart the weekend run.

nailend commented 2 years ago

It's not yet in the CI. Ran into some merge problems.

nesnoj commented 2 years ago

Thanks for your quick replies!! I'll take care of the restart..

nailend commented 2 years ago

Merge done, ready to run

nesnoj commented 2 years ago

Run has been started.. btw, industrial_gas_demand.download-industrial-gas-demand is immediately failing

fwitte commented 2 years ago

I might have overseen a dependency to the scenario_parameters.insert-scenarios task. I'll restart the download after that task was successful, and adjust the pipeline if necessary. Thank you for reporting!

Edit: It also seems fishy that the pypsa-technology-data download takes more than a couple of minutes...

gnn commented 2 years ago

Run has been started.. btw, industrial_gas_demand.download-industrial-gas-demand is immediately failing

Now that this task gets mentioned, I'd also like to point out that "pipeline.py" tries to add it to the workflow twice.

[..] overseen [..]

Overlooked, I presume. ;)
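
Regarding the task that "pipeline.py" tries to add twice: such double entries could be caught with a small check before a run is started. This is only a sketch; how the list of task ids is collected from the pipeline is left open here, and the function name duplicated_task_ids is hypothetical.

```python
from collections import Counter


def duplicated_task_ids(task_ids):
    """Return the ids that occur more than once, in order of first appearance."""
    counts = Counter(task_ids)
    return [task_id for task_id in counts if counts[task_id] > 1]
```

Called with a list that contains industrial_gas_demand.download-industrial-gas-demand twice, it would return just that id; an empty result means every task is registered once.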

fwitte commented 2 years ago

Thanks for noticing @gnn!

Yesterday, I cleared the task scenario_parameters.download-pypsa-technology-data because it had been running for > 5 hours. It then finished within 30 sec, as it should, and industrial_gas_demand.download-industrial-gas-demand was successful after that. However, many other tasks are now in the orange state and do not start, even though there is no failed task.

Edit: I just now restarted all tasks which were on hold due to "upstream failed".

nesnoj commented 2 years ago

Now that this task gets mentioned, I'd also like to point out that "pipeline.py" tries to add it to the workflow twice.

Hmm, @nailend experienced similar oddities yesterday - we really should think about a new weekend v3 branch..

nesnoj commented 2 years ago

Several CHP extension tasks (BW, HH, HE) failed @ClaraBuettner ("sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: sorry, too many clients already"), is that a resources' problem? I cleared them..

nesnoj commented 2 years ago

All done

gnn commented 2 years ago

Several CHP extension tasks (BW, HH, HE) failed @ClaraBuettner ("sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: sorry, too many clients already"), is that a resources' problem? I cleared them..

@AmeliaNadal had a similar problem on the DLR's server. That usually means that PostgreSQL's connection limit has been reached. Maybe due to parallelism? I'll look into what the default connection limit is and whether it's possible to increase it.
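
Besides raising the limit on the server (PostgreSQL's max_connections setting, typically 100 by default), the number of connections opened at the same time could also be bounded on the client side. The sketch below shows the idea with a semaphore; the class ConnectionLimiter is hypothetical and not part of eGon-data, and it only bounds threads within one process, so it would not limit connections across separate Airflow tasks.

```python
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    """Hypothetical guard that allows at most `limit` holders at once."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def slot(self):
        # Block until one of the `limit` slots is free ...
        self._slots.acquire()
        try:
            yield
        finally:
            # ... and always hand the slot back, even if the body raised.
            self._slots.release()
```

A worker would then open its database connection inside "with limiter.slot():", so that no more than `limit` connections exist at the same time.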

Now that this task gets mentioned, I'd also like to point out that "pipeline.py" tries to add it to the workflow twice.

Hmm, @nailend experienced similar oddities yesterday - we really should think about a new weekend v3 branch..

That's a nice suggestion. Since I didn't rename the weekend v2 branch yet, we could create a new fork from dev under the original name. I asked in the last AP1 conference and the attendees were OK with this but noted that some PRs which are currently part of the v2 branch should be merged into dev before creating the new fork. In order to give everyone the chance to merge the stuff that's ready for dev, I'll wait with the fork until everyone gives his OK via the list below. So when you're done merging everything that's ready, please tick the box next to your name:

Once everyone gave his approval, I'll create a new fork from dev that will act as the new weekend-run branch. Please add anyone whom I forgot to mention (my apologies).

fwitte commented 2 years ago

@gnn FYI: Amélia was out of office yesterday and today.

gnn commented 2 years ago

@gnn FYI: Amélia was out of office yesterday and today.

Thanks. Amélia told us during the AP1 conference on Wednesday so I didn't expect the fork to be done before the weekend anyway. :) Also, I noted that I forgot Ulf so I added him to the list.

IlkaCu commented 2 years ago

Hello eGon-data people,

@gnn and I agreed on a time plan to create a new CI branch and do some clean-up. Here it is:

* Until Wednesday evening: Merge tested and approved features into dev (all)
* Thursday morning: Clean up pipeline.py on dev (@IlkaCu)
* Thursday noon: Review clean-up and create new CI-branch from dev (@gnn)
* Thursday afternoon to Friday afternoon: Merge new features into new CI-branch (all)
* Friday afternoon: Start weekend run (@gnn)

In case you have any questions concerning this time plan or remarks, please let us know.

gnn commented 2 years ago

The tasks

failed. I've linked their failure logs and restarted the tasks. The first one is a known failure which also occurs during the SH run and is probably due to a missing dependency.

fwitte commented 2 years ago

Hmm, I have never seen that specific issue with industrial_gas_demand.insert-industrial-gas-demand-egon2035, thank you for noticing it. After restarting it was successful. I will have a look if the data look fine before next Friday's run.

Edit: might be related to #514!