This project is not currently maintained by WRI, and no updates are planned at this time (early 2022). The last version of this database is version 1.3.0. If we learn of active forks or maintained versions of the code and database, we will attempt to provide links in the future.
This project aims to build an open database of all the power plants in the world. It is the result of a large collaboration involving many partners, coordinated by the World Resources Institute and Google Earth Outreach. If you would like to get involved, please email the team or fork the repo and code! To learn more about how to contribute to this repository, read the CONTRIBUTING document.
The latest database release (v1.3.0) is available in CSV format here under a Creative Commons Attribution 4.0 (CC BY 4.0) license. A bleeding-edge version is in the `output_database` directory of this repo.
All Python source code is available under an MIT license.
This work is made possible and supported by Google, among other organizations.
The Global Power Plant Database is built in several steps, using the scripts in the `build_databases` directory. Data collected by WRI is stored in `raw_source_files/WRI` and processed with the `build_database_WRI.py` script in the `build_database` directory. Throughout the processing, we represent power plants as instances of the `PowerPlant` class, defined in `powerplant_database.py`. The final database is in a flat-file CSV format.
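As a rough illustration, a `PowerPlant`-style record might look like the sketch below. The field names and serialization here are illustrative assumptions; the actual class in `powerplant_database.py` has more fields and validation logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PowerPlant:
    """Hypothetical sketch of a plant record, not the project's real class."""
    plant_id: str                  # unique reference ID, e.g. "USA0000123"
    name: str
    country: str                   # ISO 3166-1 alpha-3 code
    capacity_mw: float
    fuel_types: List[str] = field(default_factory=list)
    latitude: Optional[float] = None
    longitude: Optional[float] = None

    def to_csv_row(self):
        """Serialize to a flat CSV-style row."""
        return [self.plant_id, self.name, self.country,
                f"{self.capacity_mw:.1f}", ";".join(self.fuel_types)]

plant = PowerPlant("USA0000123", "Example Station", "USA", 450.0, ["Gas"])
print(plant.to_csv_row())
```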
The database includes the following indicators:
We will expand this list in the future as we extend the database.
We define the "Fuel Type" attribute of our database based on common fuel categories. In order to parse the different fuel types used in our various data sources, we map fuel name synonyms to our fuel categories here. We plan to expand the database in the future to report more disaggregated fuel types.
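The synonym mapping can be pictured as an inverted lookup table, as in the sketch below. The category names and synonyms are examples for illustration, not entries from the project's actual thesaurus files.

```python
# Illustrative fuel-synonym mapping (hypothetical entries).
FUEL_THESAURUS = {
    "Coal": ["coal", "anthracite", "lignite", "hard coal"],
    "Gas": ["gas", "natural gas", "lng", "ccgt"],
    "Hydro": ["hydro", "hydroelectric", "run-of-river"],
    "Solar": ["solar", "photovoltaic", "pv"],
}

# Invert to a lookup table: raw fuel string -> standardized category.
FUEL_LOOKUP = {syn: cat for cat, syns in FUEL_THESAURUS.items() for syn in syns}

def standardize_fuel(raw_name):
    """Map a raw fuel string from a source file to a standard category."""
    return FUEL_LOOKUP.get(raw_name.strip().lower())

print(standardize_fuel("Natural Gas"))   # -> "Gas"
print(standardize_fuel("Lignite"))       # -> "Coal"
```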
A major challenge for this project is that data come from a variety of sources, including government ministries, utility companies, equipment manufacturers, crowd-sourced databases, financial reports, and more. The reliability of the data varies, and in many cases there are conflicting values for the same attribute of the same power plant from different data sources. To handle this, we match and de-duplicate records and then develop rules for which data sources to report for each indicator. We provide a clear data lineage for each datum in the database. We plan to ultimately allow users to choose alternative rules for which data sources to draw on.
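One way to picture such per-indicator rules is a source-priority table, as in the sketch below. The source names and indicators are hypothetical; the point is that each value is resolved per indicator while retaining its lineage.

```python
# Hypothetical per-indicator priority rules: earlier sources win.
SOURCE_PRIORITY = {
    "capacity_mw": ["national_ministry", "utility_report", "crowd_sourced"],
    "location": ["national_ministry", "GEO", "CARMA"],
}

def resolve(indicator, candidates):
    """Pick the value for `indicator` from the most-trusted source that
    supplies one; `candidates` maps source name -> reported value.
    Returns (value, source) so the data lineage is kept with the value."""
    for source in SOURCE_PRIORITY[indicator]:
        if source in candidates:
            return candidates[source], source
    return None, None

value, source = resolve("capacity_mw",
                        {"crowd_sourced": 455.0, "utility_report": 450.0})
print(value, source)  # 450.0 utility_report
```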
To the maximum extent possible, we read data automatically from trusted sources, and integrate it into the database. Our current strategy involves these steps:
A table describing the data source(s) for each country is listed below.
Finally, we are examining ways to automatically incorporate data from the following supra-national data sources:
We assign a unique ID to each line of data that we read from each source. In some cases, these represent plant-level data, while in other cases they represent unit-level data. In the case of unit-level data, we commonly perform an aggregation step and assign a new, unique plant-level ID to the result. For plants drawn from machine-readable national data sources, the reference ID is formed by a three-letter ISO 3166-1 alpha-3 country code and a seven-digit number. For plants drawn from other databases (including the manually-maintained dataset by WRI), the reference ID is formed by a variable-size prefix code and a seven-digit number.
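The ID scheme described above can be sketched as follows; the prefixes and serial numbers are examples.

```python
def make_plant_id(prefix, serial):
    """Form a reference ID from a prefix (an ISO 3166-1 alpha-3 country
    code for national sources, or a variable-size source code otherwise)
    and a seven-digit, zero-padded serial number."""
    return f"{prefix}{serial:07d}"

print(make_plant_id("USA", 123))   # national source -> "USA0000123"
print(make_plant_id("WRI", 45))    # WRI-maintained dataset -> "WRI0000045"
```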
In many cases our data sources do not include power plant geolocation information. To address this, we attempt to match these plants with the GEO and CARMA databases, in order to use their geolocation data. We use an elastic search matching technique developed by Enipedia to perform the matching based on plant name, country, capacity, and location, with confirmed matches stored in a concordance file. This matching procedure is complex, and the algorithm we employ can sometimes wrongly match two power plants or fail to match two entries for the same power plant. We are investigating using the Duke framework for matching, which would allow us to do the matching offline.
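A toy stand-in for this kind of scoring is sketched below. This is not the Enipedia elastic-search algorithm, just an illustration of combining name, country, and capacity similarity; the threshold and weights are arbitrary assumptions.

```python
from difflib import SequenceMatcher

def match_score(plant, candidate, capacity_tolerance=0.1):
    """Toy similarity score between two plant records (dicts with 'name',
    'country', and 'capacity_mw'); illustrative only."""
    if plant["country"] != candidate["country"]:
        return 0.0  # never match across countries
    name_sim = SequenceMatcher(None, plant["name"].lower(),
                               candidate["name"].lower()).ratio()
    cap_a, cap_b = plant["capacity_mw"], candidate["capacity_mw"]
    cap_ok = abs(cap_a - cap_b) <= capacity_tolerance * max(cap_a, cap_b)
    # Penalize, but do not exclude, matches with divergent capacities.
    return name_sim if cap_ok else name_sim * 0.5

a = {"name": "Riverside Power Station", "country": "USA", "capacity_mw": 500}
b = {"name": "Riverside Station", "country": "USA", "capacity_mw": 510}
print(match_score(a, b) > 0.7)  # True: similar names, close capacities
```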
The build system is as follows:

1. Install the dependencies listed in `requirements.txt`.
2. `cd` into `build_databases/`.
3. Run the `build_database_*.py` file for each data source or processing method that changed (when making a database update).
4. Run `build_global_power_plant_database.py`, which reads from the pickled store/sub-databases.
5. `cd` into `../utils`.
6. Run `database_country_summary.py` to produce a summary table.
7. `cd` into `../output_database`.
8. Copy `global_power_plant_database.csv` to the `gppd-ai4earth-api` repository. Look at the `Makefile` in that repo to understand where it should be located.
9. Run the `make_gppd.py` script in `gppd-ai4earth-api` to construct a new version of the database with the full estimation data.
10. Update the `DATABASE_VERSION` file, commit, etc.