os-climate / osc-ingest-esg-spreadsheets

Ingest Shell-reported ESG Data (2020 GHG and Energy Data)
Apache License 2.0

Hello, world! #1

Open MichaelTiemannOSC opened 2 years ago

MichaelTiemannOSC commented 2 years ago

Just a note saying that we now have two notebooks in this one pipeline. One processes a Shell report from 2020 (over 600 rows of data) and the other a DPDHL report from 2020 (almost 300 rows of data).

The DPDHL script is very much a hacked version of the script processing Shell's data. The next step is to port functionality back from DPDHL to Shell so that Shell becomes a unified script handling two similarly-shaped reports. If that proves reasonably feasible, we should create a new generic ingestion pipeline with a name relevant to the generic shapes it's prepared to handle. With GitHub branches, new report variations of that fundamental shape can be addressed, then merged back in. With luck, we can create a generic script that can process hundreds if not thousands of spreadsheets with minimal effort (and CPU overhead).

Here is the shape of the tidy data it produces:

Variable | Notes | Category | Segmentation | Unit | Year | Value

Variable = the specific datapoint being observed
Notes = row-specific note about the observation
Category = Top-level grouping, such as "Emissions" or "Energy". In the case of DPDHL we preserve Category:Subcategory:SubSubCategory as concatenated text in the Category, when appropriate.
Segmentation = For a category that can be sliced in various ways, a description of the slicing (e.g. 'by country', 'by fuel source', 'by business')
Unit = the unit of measurement (some work is still needed for percentages and for so-called pure numbers that are really counts, such as "number of buildings")
Year = the year of the measurement
Value = the measured value
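
As a rough illustration of that shape, here is a minimal Python sketch of a single tidy row. The `TidyRow` type, the `join_category` and `rows_to_csv` helpers, and the sample values are all hypothetical and are not taken from the notebooks; only the seven column names and the colon-joined Category convention come from the description above.

```python
import csv
import io
from typing import NamedTuple, Optional

# Column order as listed above.
COLUMNS = ["Variable", "Notes", "Category", "Segmentation", "Unit", "Year", "Value"]


class TidyRow(NamedTuple):
    """One observation in the tidy layout (hypothetical helper type)."""
    Variable: str
    Notes: str
    Category: str
    Segmentation: str
    Unit: str
    Year: int
    Value: Optional[float]


def join_category(*levels: str) -> str:
    """Concatenate Category:Subcategory:SubSubCategory, skipping empty levels."""
    return ":".join(level.strip() for level in levels if level and level.strip())


def rows_to_csv(rows) -> str:
    """Serialize tidy rows to CSV text with the header in COLUMNS order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example values, for illustration only.
example = TidyRow(
    Variable="Scope 1 emissions",
    Notes="",
    Category=join_category("Emissions", "Direct", ""),
    Segmentation="by business",
    Unit="million tonnes CO2e",
    Year=2020,
    Value=1.0,
)
```

Keeping the nested category levels in one colon-joined string means every report, however deep its grouping, still fits the same seven columns.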

@erikerlandson @caldeirav @MichaelClifford @oindrillac @ChristianMeyndt @idemir-ids @HeatherAck

@hbaltzell, it would be great if you could find the next 5-10 spreadsheets shaped like these:

https://reports.shell.com/sustainability-report/2020/our-performance-data/greenhouse-gas-and-energy-data.html

https://reporting-hub.dpdhl.com/downloads/2020/4/DPDHL-ESG-Statbook-2020-en.xls

MichaelTiemannOSC commented 2 years ago

Added @Shreyanand

hbaltzell commented 2 years ago

Michael, great to see progress on this.

Do you think we could put someone on the case of hunting the web (with code) to find more spreadsheets with this type of data? Maybe Mike Platt? Is he still involved? Are you able to see whether there are these types of spreadsheets in the S&P repository?

MichaelTiemannOSC commented 2 years ago

Just for fun, I added a 3rd script to handle 10 years of Unilever's Emissions/Energy data (162 rows):

https://www.unilever.com/planet-and-society/sustainability-reporting-centre/sustainability-performance-data/

hbaltzell commented 2 years ago

Michael, I had assembled about 15 examples of corporate spreadsheets in a folder called "Corp ESG spreadsheets". I can't find it on the Google drive. I would be glad to upload again if you point me to the location. In this I also had a sample spreadsheet for CDP, GRI, and EEI-AGA. The latter is the Edison Electric Institute and the American Gas Association. They created an ESG reporting template for their members that many of them use, and it is often posted on their websites, so if you created a script for that, you could get as many as 50 utilities. This would overlap with data that we already have from RMI, but it also has other ESG data. LMK what you would like me to do with these or we can have a call.

MichaelTiemannOSC commented 2 years ago

I did find that directory, and I've been using that to guide me to the new 2020 (and soon 2021) reports. Virtually all the companies I've looked at thus far have updated and improved their reporting, making it all more regular (and thus easier to parse for my purposes).

The name of the folder to which you refer is "Corporate ESG spreadsheets".

hbaltzell commented 2 years ago

Ok, I can hunt for more, but we should create a company list and set priorities for what sectors we want. Since we already have a lot of data on utilities, maybe we should look at the other priority sectors that the ITR team will focus on. Also, maybe there will be a way to check the results against the data vault.

MichaelTiemannOSC commented 2 years ago

I have a slightly different agenda, which is to find corporate reports that are similarly shaped to Shell, DPDHL, Unilever, AEP, etc. The BHP Billiton report is an example of something that is not so similarly shaped (it mixes WIDE and LONG, making it more challenging to interpret). But if we can collect WIDE-form (dates in columns left to right) reports from major companies and build toward a single script that reads consistently from dozens of sources, we'd have something to say.

We could also start a front-end that handles LONG data. That was actually the first thing I tackled with Vale SE. If we can find 10-20 like that (but not BHP yet), that would be good, too.
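
To make the WIDE versus LONG distinction concrete, here is a small sketch of reshaping WIDE rows (one column per year) into LONG rows (one row per year). The `wide_to_long` function, its "four-digit header means year" rule, and the sample numbers are assumptions for illustration; this is not the code in the notebooks.

```python
from typing import Dict, Iterable, List


def wide_to_long(rows: Iterable[Dict[str, object]], id_columns: List[str]) -> List[Dict[str, object]]:
    """Turn each year column of a WIDE row into its own LONG row."""
    long_rows = []
    for row in rows:
        ids = {name: row[name] for name in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            # Assumption: a non-id column whose header is a 4-digit number is a year.
            header = str(column).strip()
            if not (header.isdigit() and len(header) == 4):
                continue
            if value is None or value == "":
                continue  # skip empty cells rather than inventing values
            long_rows.append({**ids, "Year": int(header), "Value": value})
    return long_rows


# Made-up numbers, for illustration only.
wide = [
    {"Variable": "Total energy use", "Unit": "GJ", "2019": 10.0, "2020": 9.0},
    {"Variable": "Scope 1 emissions", "Unit": "t CO2e", "2019": 5.0, "2020": None},
]
long = wide_to_long(wide, id_columns=["Variable", "Unit"])
```

A report that is already LONG could skip this step and go straight to column renaming, which is why a mixed layout like BHP's is harder: neither path applies cleanly to the whole sheet.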

Once we have a strong ingestion engine, we can go about developing a sector-based approach.

hbaltzell commented 2 years ago

OK, when I get some time I'll have a look around

hbaltzell commented 2 years ago

Michael, I’ve been hunting around for more spreadsheets but without much success. There are lots of data tables in PDFs, of course, and I can share more of those, but I bet they’re already in the S&P repository. I still hope someone on the team can figure out how to search corporate websites for downloadable xls or csv files (which we could then run a search routine on to see which contain sustainability data). Failing that, one suggestion is to put out a request to the ESG analysts among our members: both the asset managers and, of course, S&P and LSEG. Their analysts would probably know off the tops of their heads which companies publish data in these formats rather than PDF. You could even run a Google survey or some other method to collect names or, even better, have these analysts provide the URLs or files. This would be a simple collaborative effort if we can’t find this type of data by machine.
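
For the "by machine" option, one possible starting point is to scan a page's HTML for links to xls, xlsx, or csv files. The sketch below uses only the Python standard library; the `SpreadsheetLinkFinder` class, the `find_spreadsheet_links` function, and the example.com URLs are hypothetical, and downloading the pages themselves (and checking whether a file really holds sustainability data) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

SPREADSHEET_EXTENSIONS = (".xls", ".xlsx", ".csv")


class SpreadsheetLinkFinder(HTMLParser):
    """Collect absolute URLs of <a href> links that point at spreadsheet files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Compare only the URL path, so query strings do not hide the extension.
        path = urlparse(absolute).path.lower()
        if path.endswith(SPREADSHEET_EXTENSIONS) and absolute not in self.links:
            self.links.append(absolute)


def find_spreadsheet_links(html: str, base_url: str):
    """Return spreadsheet links found in html, resolved against base_url."""
    finder = SpreadsheetLinkFinder(base_url)
    finder.feed(html)
    return finder.links


# Hypothetical page content, for illustration only.
sample_html = """
<a href="/downloads/esg-data-2020.xlsx">ESG data</a>
<a href="report.pdf">Full report</a>
<a href="https://example.com/files/kpis.CSV?download=1">KPIs</a>
"""
found = find_spreadsheet_links(sample_html, "https://example.com/sustainability/")
```

This only finds files linked directly from a page; reports served through download widgets or scripts would need a different approach.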

MichaelTiemannOSC commented 2 years ago

Yes, I've found another half-dozen, but the pickings are slim. I have enough to keep me busy for the short term. Thanks!

HeatherAck commented 2 years ago

I found a few as well, see attached

entire_abb_csr18.xls
bp-esg-datasheet-2020.xlsx
daimler_sr_2019_kpis_environmental_protection.xls
210223-esg-datapack-2020-excel.xlsx
sap-2020-5year-summary-and-chart-generator-data.xlsx
shape_future_st_sr20.xls
Data_Library_2016_2020.xlsx.xlsx

HeatherAck commented 2 years ago

in_depth_sustainability_reporting_castellum_ar19.xls