Purpose
Fix positive longitudes and erroneous time zones of plants, add missing coordinates and locations (state, county and city) of plants, and save the plant static attributes to the open_grid_emissions_data/results/plant_data folder. Closes CAR-4339, CAR-4340, CAR-4341.
What the code is doing
Use the raw EIA-860 file to fix positive longitudes, then use the timezonefinder package to derive the time zone from the corrected coordinates
Use the geopy package to get a plant's latitude and longitude from its location (state, county and city), and vice versa
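The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in oge.helpers: the function names, the dict-based record layout, and the sign-flip fallback are all assumptions made for the example, and the timezone and geocoding lookups are passed in as callables so the sketch runs without network access.

```python
from typing import Callable, Optional, Tuple


def fix_positive_longitude(plant: dict, raw_longitude: Optional[float]) -> dict:
    """Return a copy of `plant` with a western-hemisphere longitude (hypothetical helper)."""
    fixed = dict(plant)
    lon = fixed.get("longitude")
    if lon is not None and lon > 0:
        if raw_longitude is not None and raw_longitude < 0:
            # Prefer the value read from the raw EIA-860 file.
            fixed["longitude"] = raw_longitude
        else:
            # Assumption of this sketch: otherwise just flip the sign.
            fixed["longitude"] = -lon
    return fixed


def derive_timezone(
    plant: dict, lookup: Callable[[float, float], Optional[str]]
) -> dict:
    """Set `timezone` from coordinates; `lookup` takes (longitude, latitude)."""
    fixed = dict(plant)
    lat, lon = fixed.get("latitude"), fixed.get("longitude")
    if lat is not None and lon is not None:
        tz = lookup(lon, lat)
        if tz is not None:
            fixed["timezone"] = tz
    return fixed


def fill_missing_coordinates(
    plant: dict, geocode: Callable[[str], Optional[Tuple[float, float]]]
) -> dict:
    """Fill latitude/longitude from city, county and state when either is missing."""
    fixed = dict(plant)
    if fixed.get("latitude") is None or fixed.get("longitude") is None:
        parts = [fixed.get(key) for key in ("city", "county", "state")]
        query = ", ".join(part for part in parts if part)
        result = geocode(query) if query else None
        if result is not None:
            fixed["latitude"], fixed["longitude"] = result
    return fixed


def stub_timezone_lookup(lon: float, lat: float) -> str:
    """Stand-in for a real timezone lookup, for demonstration only."""
    return "America/New_York" if lon < 0 else "Asia/Urumqi"


# Plant 50060 as it appears in the old 2005 file, with the longitude
# reported for it in the new file standing in for the raw EIA-860 value.
old = {
    "plant_id_eia": 50060,
    "latitude": 38.824863,
    "longitude": 76.772200,
    "timezone": "Asia/Urumqi",
}
fixed = derive_timezone(fix_positive_longitude(old, -76.773149), stub_timezone_lookup)
print(fixed["longitude"], fixed["timezone"])
```

In practice the `lookup` callable could wrap timezonefinder, for instance `lambda lon, lat: TimezoneFinder().timezone_at(lng=lon, lat=lat)`, and `geocode` could wrap a geopy geocoder such as `Nominatim(user_agent=...)`, whose `geocode(query)` returns an object with `latitude` and `longitude` attributes, or `None` when nothing matches. Which geocoder the PR actually uses is not stated here.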
Testing
Successfully ran the 2005 pipeline
Where to look
The oge.helpers module has most of the changes
Usage Example/Visuals
Comparison of the 2005 plant static attributes files before and after the changes.
The code snippet below shows that the erroneous longitudes and time zones are fixed:
>>> import pandas as pd
>>> cols = ["plant_id_eia", "latitude", "longitude", "state", "county", "city", "timezone"]
>>> psa_old = pd.read_csv("~/Desktop/2005_outputs/plant_static_attributes_2005.csv", usecols=cols, index_col=0)
>>> psa_new = pd.read_csv("~/open_grid_emissions_data/results/2005/plant_data/plant_static_attributes.csv", usecols=cols, index_col=0)
>>> psa_old[psa_old["longitude"] > 0]
              state          county            city   latitude   longitude             timezone
plant_id_eia
50060            MD  Prince Georges  Upper Marlboro  38.824863   76.772200          Asia/Urumqi
54537            WA         Whatcom        Ferndale  48.828996  122.686666  America/Los_Angeles
55964            MD  Prince Georges  Upper Marlboro  38.847585   76.776400          Asia/Urumqi
>>> psa_new.loc[psa_old[psa_old["longitude"] > 0].index]
              state          county            city   latitude    longitude             timezone
plant_id_eia
50060            MD  Prince Georges  Upper Marlboro  38.824863   -76.773149     America/New_York
54537            WA         Whatcom        Ferndale  48.828996  -122.685114  America/Los_Angeles
55964            MD  Prince Georges  Upper Marlboro  38.847585   -76.785912     America/New_York
The code snippet below shows that most of the missing coordinates and locations have been filled in:
>>> psa_old.isna().sum()
state         1
county       63
city         12
latitude     46
longitude    30
timezone      0
dtype: int64
>>> psa_new.isna().sum()
state         0
county        4
city         12
latitude      1
longitude     1
timezone      0
dtype: int64
The plants that still have missing information are:
>>> psa_new[psa_new["county"].isna() | psa_new["city"].isna() | psa_new["latitude"].isna() | psa_new["longitude"].isna()]
              state       county        city   latitude    longitude             timezone
plant_id_eia
414              CA     Tuolumne         NaN  38.202569  -120.077000  America/Los_Angeles
415              CA     Tuolumne         NaN  38.246656  -120.034100  America/Los_Angeles
603              DC          NaN  Washington  38.899400   -76.959200     America/New_York
3520             TX        Pecos         NaN  30.683611  -102.802800      America/Chicago
7253             SC  NOT IN FILE     unsited        NaN          NaN     America/New_York
7338             CA       Plumas         NaN  39.889287  -121.279200  America/Los_Angeles
10125            NY          NaN     Jamaica  40.702913   -73.800643     America/New_York
10159            MI    Allegheny         NaN  40.512778   -79.800830     America/New_York
10377            VA          NaN    Hopewell  37.293900   -77.269700     America/New_York
10458            CA       Lassen         NaN  40.976389  -121.255800  America/Los_Angeles
13213            MS        Union         NaN  34.541100   -88.942200      America/Chicago
50242            GA       Newton         NaN  33.570092   -83.893920     America/New_York
54088            NY     Saratoga         NaN  43.250000   -73.814400     America/New_York
54650            CA    Riverside         NaN  33.721999  -116.037247  America/Los_Angeles
54767            NY          NaN    Brooklyn  40.670556   -73.936390     America/New_York
54934            PA   Lackawanna         NaN  41.436308   -75.589700     America/New_York
55316            IL        Logan         NaN  40.079661   -89.433729      America/Chicago
Review estimate
15 min
Future work
N/A
Checklist
[x] Update the documentation to reflect changes made in this PR
[x] Format all updated python files using black
[x] Clear outputs from all notebooks modified
[x] Add docstrings and type hints to any new functions created