Open andyfurniss4 opened 3 years ago
Sorry, incorrectly labelled as a bug.
Hello -
Short answer is "kind of"? This started as a research project and had goals to be something that was very dynamic and always being updated. As you have found, there aren't really many updates being performed, and this is mostly because of the limited time I have to do the desk research, maintain information worldwide, and incorporate small facts into the knowledge system.
The project has been primarily supported by grants with some cases of people sharing sources or updates. At the end of May we will be totally out of funding for anything related to this work (we spend time and energy on things that aren't just data updates). We will be putting out another data update at that time, but it's going to be fairly inadequate given all the changes to the energy system that are happening every day. There are some irons in the fire, but the earliest we could commit resources through our organization will probably be at the end of the year or early 2022. Any progress or work starting in June will be me volunteering or working in a hobby capacity. I'm not saying I won't be doing that, but it's likely to be even more sporadic.
Unfortunately the architecture that was developed early in the project has proven to be a bad choice and is really limiting the ability to get information into the database. The choice to not use a relational database or some managed database has resulted in some major fragility and loads of technical debt. Any change or update can actually be quite burdensome and is usually just some hacky patch that keeps the status quo but adds on some new misshapen subprocess. In many cases (for the 'automatic data sources') the way the unique plant IDs were defined permits them to change at the whim of the underlying dataset. This was fine during the initial development of the database when everything was in flux, but it's now a nagging burden to ensure that the plants keep the same ID over time.
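To illustrate the ID problem described above, here is a minimal sketch of one possible way to keep IDs stable, assuming each source record carries some reasonably stable native key. This is not how the project assigns IDs; `IdRegistry`, the `PLANT` prefix, and the JSON file are all hypothetical names chosen for the example. The idea is simply that an ID is issued once, stored, and looked up on every later run instead of being re-derived from the current shape of the source dataset.

```python
import json
from pathlib import Path


class IdRegistry:
    """Hypothetical registry mapping (source, native_key) to a permanent ID."""

    def __init__(self, path, prefix="PLANT"):
        self.path = Path(path)
        self.prefix = prefix
        # Load previously issued IDs so they survive between runs.
        if self.path.exists():
            self.ids = json.loads(self.path.read_text())
        else:
            self.ids = {}

    def get_id(self, source, native_key):
        key = f"{source}::{native_key}"
        if key not in self.ids:
            # Issue the next number. Entries are never deleted or renumbered,
            # so the count of issued IDs is a safe counter.
            self.ids[key] = f"{self.prefix}{len(self.ids) + 1:07d}"
        return self.ids[key]

    def save(self):
        # Persist the mapping so the next run reuses the same IDs.
        self.path.write_text(json.dumps(self.ids, indent=2, sort_keys=True))
```

With something like this, reordering or re-downloading a source does not change existing IDs; only records with a previously unseen key get a new one. It does not solve the harder case where the source itself renames its keys.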
There will be another update coming this month (end of May 2021), which will likely be the last for the foreseeable future. At that time we will update the readme with the status of the project. I plan on writing and sharing some sort of postmortem or lessons learned document - that's unlikely to be ready by end of May, but maybe June or July. I still completely believe in the mission and goals of this work, the database is widely used and appreciated, but our transition from essentially a prototype to something that was supposed to be production ready failed. We didn't make the jump successfully, we've just continued with the decisions from the prototype and are suffering the consequences.
Hi Logan,
Firstly, thank you very much for taking the time to write such a detailed response - I very much appreciate it.
Whilst it is a shame to hear that the project is effectively coming to a close (for now at least), I do understand the position you're in. I can also understand that the amount of work to maintain such a database must be huge with all the different sources, formats and languages involved. I can see how not using a relational database for this kind of project may have become a major problem as more and more data and sources are introduced. Perhaps that will need a rework with any future work that happens on the project. I will keep my fingers crossed for future funding/dedicated resource on this.
I do think that this is an incredibly valuable project as I'm not aware of such an extensive, centralised source of this data. You've done a great job to get to the stage you have with it, even if you may have done things differently if you were given the opportunity. With the world in the state it's in, I don't think it's possible to overstate the importance of being able to take a global view of how well we are/aren't doing in terms of our sustainability. Am I right in assuming that there isn't anything comparable to this project that you might recommend as an alternative for now?
I will keep my eyes open for the update at the end of May and the postmortem details, and if you need volunteers in future then I'd be keen to help out in any way I can. I am primarily a .NET developer but I have a bit of experience with Python and I've used various relational database technologies if you decide to go down that route in a future iteration.
Thanks again.
Hello Logan,
I am glad to hear you intend to publish the lessons learned. As well as the decisions which turned out to be wrong there is obviously a lot which went right. Hopefully the document and your achievements here will be useful to other projects such as Wikidata climate change and Climate Trace. Thanks for all your hard work so far and good luck.
Will you update this repository with the version 1.3?
Hi Matteo - yes this will be updated with version 1.3 in the next few days. The challenge has been documenting the processes and flow between this repo and our separate generation estimation repo.
The flow of information is kind of wonky and very stateful. This repo builds the core set of information, then it gets passed over to another set of scripts/models to either estimate plant generation or pull from known values (since estimating some types of plants can be very slow). Then the core "observations" and estimated generation are merged, which constitutes the "final" database. This database then needs to be copied back into this repository. None of this is really automated and it partially breaks some of the internal replicability that we have had to date...
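As a rough illustration of the flow described above (core observations, then reported-or-estimated generation, then a merged "final" table), here is a minimal sketch. It is not the project's actual code: the field names (`id`, `capacity_mw`, `generation_gwh`), the function names, and the flat capacity-factor estimate are all assumptions made for the example, and the real estimation models live in a separate repository.

```python
def estimate_generation_gwh(plant, capacity_factor=0.5):
    # Placeholder model: capacity (MW) x hours per year x an assumed
    # capacity factor, converted from MWh to GWh.
    return plant["capacity_mw"] * 8760 * capacity_factor / 1000.0


def build_final(core, reported):
    """Merge core plant records with reported or estimated generation.

    core:     list of dicts, one per plant (hypothetical schema)
    reported: dict mapping plant id -> known annual generation in GWh
    """
    final = []
    for plant in core:
        row = dict(plant)  # copy so the core records stay untouched
        if plant["id"] in reported:
            # Prefer a known value over running a (possibly slow) estimate.
            row["generation_gwh"] = reported[plant["id"]]
            row["generation_source"] = "reported"
        else:
            row["generation_gwh"] = estimate_generation_gwh(plant)
            row["generation_source"] = "estimated"
        final.append(row)
    return final
```

Writing the steps as one function call like this is mainly a way to make the hand-offs explicit; the stateful part in practice is that the two halves live in different repositories and the result is copied back by hand.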
Yes, soon!
So...version 1.3 was indeed released: https://wri-dataportal-prod.s3.amazonaws.com/manual/global_power_plant_database_v_1_3.zip
Context: https://datasets.wri.org/dataset/globalpowerplantdatabase
However, neither the readme (which lists 1.2.0 as the latest) nor the releases page (which lists 1.1.0 as the latest) mentions 1.3.0. So...the good news is that there is a ton of great new data in the June 2021 release. It's just invisible if you've been following GitHub.
If someone wanted to try and revive this project, how would they go about doing that?
I noticed that a decent number of the source dataset links are broken.
Hi @aaronclong - I would think of this less as a software development project and more as a research project. While many of the links are dead, the content may still live somewhere online. And there should be newer versions/releases of annual data tables from certain countries that could be included with pretty low effort, but that's not really where the value is...
For the other 100+ countries it is going to require a good amount of desk and online research. That's how we constructed it originally and I don't think there are many shortcuts. You can get an overview of our methods and prioritization in our technical note, but typically it looks like parsing through web pages, PDFs, and other files of big energy companies or ministries operating within a certain geography.... and then once those are exhausted start looking through the smaller organizations... and then press releases or industry news... Maybe this is something LLMs could support: entity-relationship extraction from free-form text.
Depending on the longevity and traceability goals it might be appropriate to attempt to integrate and unify data from some second/third-party sources that are well understood within certain geographies. The JRC-OPEN-PPDB covering Europe comes to mind. As does CarbonTrace which has a larger geographic reach but I think covers fewer fuel types. I have also seen a number of "automatic" solar PV mapping methods that are producing great outputs but typically within a single country (India and China in particular). There's also the "Public Utility Data Liberation (PUDL)" project which greatly increases the temporal resolution of generation information, but only for the USA. And the group of researchers in Europe surrounding OpenMOD, Open Power Systems, Open Energy Platform seemed to be leading in the space of semantics/configuration definition last I was looking.
Blessings on whoever can bring all of this together into a common frame and do the entity resolution needed to "make it simple". And well-wishes for those who want to analyze or compare across conditions of varying data quality.
All this to say that any group seeking to pick up this torch might be well served in going back to some first principles about what data they store, how they process/prepare that data once stored, and what shape the data should take in its most presentable/usable form.
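For anyone wondering what the entity resolution mentioned above might look like at its simplest, here is a minimal sketch that decides whether two records from different sources describe the same plant, using name similarity plus geographic distance. It is an illustration only, not a method used by this project: the record fields (`name`, `lat`, `lon`), the function names, and the thresholds are assumptions, and real sources would need far more care (transliteration, multi-unit sites, missing coordinates).

```python
import math
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())


def distance_km(lat1, lon1, lat2, lon2):
    # Great-circle distance via the haversine formula (Earth radius 6371 km).
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def is_same_plant(a, b, min_name_score=0.8, max_km=5.0):
    # Both conditions must hold: similar names AND nearby coordinates.
    # The thresholds are arbitrary starting points, not tuned values.
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    close = distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    return score >= min_name_score and close
```

Requiring both signals is a deliberately conservative choice: it will miss true matches with poor coordinates or renamed plants, but it avoids merging two different plants that merely share a common name.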
Is this database still being maintained and updated? It looks like there haven't been any changes to this repo for over a year now and the last data release was over two years ago.